Extract Recorddefs

bots_airflow.devtools.extract_recorddefs is a developer utility that turns a large shared recorddefs catalog into a small Python module containing only the segments your grammars actually use.

This is useful when you are moving legacy Bots grammars into runtime modules and do not want to carry a large project-wide recorddefs file into every grammar package.

What the tool expects

The extractor works from Python modules and supports either import paths or file paths.

  • --source-recorddefs must point to a module that exposes recorddefs = {...}
  • --grammar must point to one or more grammar modules that expose structure = [...]
  • --segment can be used to add segment ids explicitly
  • --output is the Python file to generate

The tool walks the grammar structure, collects segment ids, verifies that those segments exist in the source recorddefs, and writes a reduced module that contains only the selected definitions.
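The walk described above can be sketched as a short recursive function. This is an illustration, not the tool's implementation: it assumes each structure node is a dict with an "ID" key and an optional "LEVEL" key holding child nodes, as in classic Bots grammars, and the name collect_segment_ids and the sample structure are hypothetical.

```python
from typing import Any


def collect_segment_ids(structure: list[dict[str, Any]]) -> set[str]:
    """Recursively collect every segment id in a Bots-style structure."""
    found: set[str] = set()
    for node in structure:
        found.add(node["ID"])
        # Assumption: nested segments live under the optional "LEVEL" key.
        found |= collect_segment_ids(node.get("LEVEL", []))
    return found


# Hypothetical 850-like grammar, trimmed for illustration.
structure = [
    {"ID": "ST", "MIN": 1, "MAX": 1, "LEVEL": [
        {"ID": "BEG", "MIN": 1, "MAX": 1},
        {"ID": "REF", "MIN": 0, "MAX": 99},
        {"ID": "PO1", "MIN": 1, "MAX": 100000},
        {"ID": "SE", "MIN": 1, "MAX": 1},
    ]},
]

print(sorted(collect_segment_ids(structure)))
# ['BEG', 'PO1', 'REF', 'SE', 'ST']
```

Collecting into a set means a segment that appears at several levels of the structure is still selected only once.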

Command syntax

Run the command from an environment where bots_airflow is installed. For a local source checkout, either install the package in editable mode or set PYTHONPATH=src for the command invocation.

python -m bots_airflow.devtools.extract_recorddefs \
  --source-recorddefs SOURCE \
  --grammar GRAMMAR \
  [--grammar GRAMMAR ...] \
  [--segment SEGMENT ...] \
  --output OUTPUT

Example with import paths

This is the normal case when your runtime modules are already importable in your development environment:

python -m bots_airflow.devtools.extract_recorddefs \
  --source-recorddefs my_company_edi.legacy.recorddefs_004010 \
  --grammar my_company_edi.grammars.x12.orders_850 \
  --output my_company_edi/grammars/x12/segments_850.py

If orders_850 references ST, BEG, REF, PO1, and SE in its structure, the generated file will contain only those recorddefs entries.

Example with file paths

You can also point the tool directly at Python files:

python -m bots_airflow.devtools.extract_recorddefs \
  --source-recorddefs ./legacy/recorddefs_004010.py \
  --grammar ./runtime/grammars/x12/orders_850.py \
  --output ./runtime/grammars/x12/segments_850.py

This is useful when you are still reorganizing a codebase and the target modules are not yet on PYTHONPATH.
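For readers who want the same flexibility in their own scripts, accepting either kind of reference can be approximated with the standard importlib machinery. The following is a minimal sketch, not the tool's actual loader: the function name load_module and the rule "treat anything ending in .py, or naming an existing file, as a path" are assumptions.

```python
import importlib
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module(reference: str) -> ModuleType:
    """Load a module from a dotted import path or from a .py file path."""
    path = Path(reference)
    if reference.endswith(".py") or path.is_file():
        # File path: build a spec by hand, so PYTHONPATH is not involved.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load module from {reference!r}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    # Dotted name: defer to the normal import system.
    return importlib.import_module(reference)


# A dotted name resolves through the regular import system.
print(load_module("json").__name__)
```

A module loaded this way from a bare file path has no package context, so relative imports inside it will fail. That is one reason to prefer import paths once the codebase has settled.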

Example with multiple grammars and manual segments

You can merge the segment requirements from multiple grammars into one output module, and you can add manual segments when needed:

python -m bots_airflow.devtools.extract_recorddefs \
  --source-recorddefs my_company_edi.legacy.recorddefs_004010 \
  --grammar my_company_edi.grammars.x12.orders_850 \
  --grammar my_company_edi.grammars.x12.ship_notice_856 \
  --segment ISA \
  --segment GS \
  --output my_company_edi/grammars/x12/segments_004010_core.py

Use --segment when:

  • you want to include envelope or support segments that are not in the inspected grammar module
  • you want one shared reduced catalog for several related grammars
  • you want to pin a segment in the generated output even before a grammar starts using it

Generated output

The output is a normal Python module with a header that records the source and selection inputs:

"""
Generated by bots_airflow.devtools.extract_recorddefs.
Source: tests.runtime_modules.legacy_recorddefs
Grammar selection: tests.runtime_modules.grammars.x12.sample_850
Segments: BEG, PO1, REF, SE, ST
"""

recorddefs = {
    "BEG": [["BOTSID", "M", 3, "AN"], ["BEG01", "M", 2, "AN"]],
    "PO1": [["BOTSID", "M", 3, "AN"], ["PO101", "C", 20, "AN"]],
}

The example above is abbreviated. In practice the generated module contains an entry for every segment listed in the header, with the full field definitions copied from your source catalog.
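To make the shape of the output concrete, here is a rough sketch of how a module like this could be rendered from a reduced dictionary. It is not the tool's renderer: render_module is a hypothetical name, and the use of pprint and the exact header wording are assumptions modeled on the example above.

```python
from pprint import pformat


def render_module(source: str, grammars: list[str], recorddefs: dict) -> str:
    """Render a reduced recorddefs dict as Python source with a docstring header."""
    header = "\n".join([
        '"""',
        "Generated by a hypothetical renderer (illustration only).",
        f"Source: {source}",
        f"Grammar selection: {', '.join(grammars)}",
        f"Segments: {', '.join(sorted(recorddefs))}",
        '"""',
    ])
    # pformat emits a valid Python literal for plain dicts, lists, str, and int.
    body = "recorddefs = " + pformat(recorddefs, width=100)
    return f"{header}\n\n{body}\n"


reduced = {
    "PO1": [["BOTSID", "M", 3, "AN"], ["PO101", "C", 20, "AN"]],
    "BEG": [["BOTSID", "M", 3, "AN"], ["BEG01", "M", 2, "AN"]],
}
print(render_module("my_company_edi.legacy.recorddefs_004010",
                    ["my_company_edi.grammars.x12.orders_850"], reduced))
```

Sorting the segment ids keeps the rendered file stable between runs, which makes diffs of the committed module easier to review.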

Programmatic use

If you want to build your own wrapper script or test the output in memory, use build_recorddefs_module(...) directly:

from bots_airflow.devtools.extract_recorddefs import build_recorddefs_module

segments, extracted, rendered = build_recorddefs_module(
    source_reference="my_company_edi.legacy.recorddefs_004010",
    grammar_references=["my_company_edi.grammars.x12.orders_850"],
    segments=["ISA", "GS"],
)

print(segments)
print(rendered)

Return values:

  • segments: sorted segment ids selected for output
  • extracted: the reduced recorddefs dictionary
  • rendered: the generated Python module source
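When testing in memory, one way to check the rendered source without importing or executing it is to parse it with the standard ast module. This sketch assumes the generated module assigns a plain literal to recorddefs, as in the example output. The helper name recorddefs_from_source is hypothetical, and the sample string stands in for the rendered value returned by build_recorddefs_module(...).

```python
import ast


def recorddefs_from_source(rendered: str) -> dict:
    """Extract the top-level `recorddefs` literal without executing the module."""
    tree = ast.parse(rendered)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "recorddefs":
                    # literal_eval accepts an AST node and rejects non-literals.
                    return ast.literal_eval(node.value)
    raise ValueError("no top-level `recorddefs = {...}` assignment found")


# Stand-in for the `rendered` string returned by build_recorddefs_module(...).
rendered = '''"""Generated header."""

recorddefs = {
    "BEG": [["BOTSID", "M", 3, "AN"], ["BEG01", "M", 2, "AN"]],
}
'''

parsed = recorddefs_from_source(rendered)
assert set(parsed) == {"BEG"}
```

In a real test you would compare the parsed dictionary against the extracted return value, which should describe the same segments.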

Failure modes

The command fails fast in a few important cases:

  • the source module does not expose a recorddefs dict
  • a grammar module does not expose a structure list
  • no segments are selected from --grammar or --segment
  • one or more selected segment ids are not present in the source recorddefs

That behavior is deliberate. The tool is meant to produce a Python module that you review and commit, so it stops with an error rather than silently generating a partial or ambiguous result.
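The checks listed above can be pictured as a few guard clauses. This is only a sketch of the fail-fast idea: the helper names checked_attr and select_recorddefs, the exception types, and the messages are assumptions, not the tool's actual errors.

```python
from types import SimpleNamespace


def checked_attr(module, name: str, expected_type: type):
    """Return module.<name>, or raise if it is absent or of the wrong type."""
    value = getattr(module, name, None)
    if not isinstance(value, expected_type):
        raise TypeError(f"{module!r} must expose {name} as a {expected_type.__name__}")
    return value


def select_recorddefs(source: dict, segment_ids: set[str]) -> dict:
    """Return the reduced catalog, refusing empty or unresolvable selections."""
    if not segment_ids:
        raise ValueError("no segments selected from grammars or explicit ids")
    missing = sorted(segment_ids - set(source))
    if missing:
        raise ValueError(f"not in source recorddefs: {', '.join(missing)}")
    return {seg: source[seg] for seg in sorted(segment_ids)}


# Stand-in for an imported source module (hypothetical data).
legacy = SimpleNamespace(recorddefs={"ST": [["BOTSID", "M", 3, "AN"]]})
source = checked_attr(legacy, "recorddefs", dict)

try:
    select_recorddefs(source, {"ST", "BEG"})
except ValueError as exc:
    print(exc)  # not in source recorddefs: BEG
```

Reporting every missing id in one message, rather than stopping at the first, lets you fix the source catalog or the grammar in a single pass.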