Extract Recorddefs
bots_airflow.devtools.extract_recorddefs is a developer utility for turning a
large shared recorddefs catalog into a small Python module that only contains
the segments your grammar actually uses.
This is useful when you are moving legacy Bots grammars into runtime modules and do not want to carry a large project-wide recorddefs file into every grammar package.
What the tool expects
The extractor works from Python modules and supports either import paths or file paths.
- --source-recorddefs must point to a module that exposes recorddefs = {...}
- --grammar must point to one or more grammar modules that expose structure = [...]
- --segment can be used to add segment ids explicitly
- --output is the Python file to generate
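To illustrate how a single reference can be either an import path or a file path, here is a minimal sketch built on the standard importlib module. It is not the tool's actual loader: the load_module name and the rule that anything ending in ".py" is a file path are assumptions made for this example.

```python
import importlib
import importlib.util
import tempfile
from pathlib import Path


def load_module(reference):
    """Load a module from a .py file path or a dotted import path.

    Assumption for this sketch: a reference ending in ".py" is a file path,
    anything else is treated as a dotted import path.
    """
    if reference.endswith(".py"):
        path = Path(reference)
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load module from {reference!r}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    return importlib.import_module(reference)


# Demo: load a throwaway catalog file by path, then a stdlib module by name.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "recorddefs_demo.py"
    demo_path.write_text('recorddefs = {"ST": [["BOTSID", "M", 3, "AN"]]}\n')
    from_file = load_module(str(demo_path))

from_import = load_module("json")
print(sorted(from_file.recorddefs))
print(from_import.__name__)
```

A module loaded by file path this way is not registered in sys.modules, which is usually what you want for a one-off inspection of a file that is not yet part of a package.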
The tool walks the grammar structure, collects segment ids, verifies that those
segments exist in the source recorddefs, and writes a reduced module that
contains only the selected definitions.
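The walk, collect, verify, and reduce steps described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it assumes each structure node is a dict keyed by the hypothetical ID and LEVEL names defined at the top of the block (real grammars may use different constants), and the function names are invented for this example.

```python
# Hypothetical node keys for this sketch; real grammars may use other constants.
ID = "ID"
LEVEL = "LEVEL"


def collect_segment_ids(structure):
    """Return the set of segment ids found anywhere in a nested structure."""
    found = set()
    for node in structure:
        found.add(node[ID])
        # Child segments, if any, live in a nested list under LEVEL.
        found.update(collect_segment_ids(node.get(LEVEL, [])))
    return found


def reduce_recorddefs(recorddefs, structure, extra_segments=()):
    """Keep only the definitions the structure (plus extras) refers to."""
    wanted = collect_segment_ids(structure) | set(extra_segments)
    missing = sorted(wanted - set(recorddefs))
    if missing:
        # Fail instead of writing a partial catalog.
        raise KeyError(f"segments missing from source recorddefs: {missing}")
    return {segment: recorddefs[segment] for segment in sorted(wanted)}


# A toy grammar structure and a larger toy catalog.
structure = [
    {ID: "ST", LEVEL: [
        {ID: "BEG"},
        {ID: "PO1", LEVEL: [{ID: "REF"}]},
        {ID: "SE"},
    ]},
]
catalog = {
    segment: [["BOTSID", "M", 3, "AN"]]
    for segment in ["ST", "BEG", "PO1", "REF", "SE", "N1", "ISA"]
}

reduced = reduce_recorddefs(catalog, structure)
print(sorted(reduced))
```

Passing extra_segments=["ISA"] would add the envelope segment to the result, while asking for a segment that is absent from the catalog raises instead of silently dropping it.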
Command syntax
Run the command from an environment where bots_airflow is installed. For a
local source checkout, either install the package in editable mode or set
PYTHONPATH=src for the command invocation.
python -m bots_airflow.devtools.extract_recorddefs \
--source-recorddefs SOURCE \
--grammar GRAMMAR \
[--grammar GRAMMAR ...] \
[--segment SEGMENT ...] \
--output OUTPUT
Example with import paths
This is the normal case when your runtime modules are already importable in your development environment:
python -m bots_airflow.devtools.extract_recorddefs \
--source-recorddefs my_company_edi.legacy.recorddefs_004010 \
--grammar my_company_edi.grammars.x12.orders_850 \
--output my_company_edi/grammars/x12/segments_850.py
If orders_850 references ST, BEG, REF, PO1, and SE in its
structure, the generated file will contain only those recorddefs entries.
Example with file paths
You can also point the tool directly at Python files:
python -m bots_airflow.devtools.extract_recorddefs \
--source-recorddefs ./legacy/recorddefs_004010.py \
--grammar ./runtime/grammars/x12/orders_850.py \
--output ./runtime/grammars/x12/segments_850.py
This is useful when you are still reorganizing a codebase and the target modules
are not yet on PYTHONPATH.
Example with multiple grammars and manual segments
You can merge the segment requirements from multiple grammars into one output module, and you can add manual segments when needed:
python -m bots_airflow.devtools.extract_recorddefs \
--source-recorddefs my_company_edi.legacy.recorddefs_004010 \
--grammar my_company_edi.grammars.x12.orders_850 \
--grammar my_company_edi.grammars.x12.ship_notice_856 \
--segment ISA \
--segment GS \
--output my_company_edi/grammars/x12/segments_004010_core.py
Use --segment when:
- you want to include envelope or support segments that are not in the inspected grammar module
- you want one shared reduced catalog for several related grammars
- you want to pin a segment in the generated output even before a grammar starts using it
Generated output
The output is a normal Python module with a header that records the source and selection inputs:
"""
Generated by bots_airflow.devtools.extract_recorddefs.
Source: tests.runtime_modules.legacy_recorddefs
Grammar selection: tests.runtime_modules.grammars.x12.sample_850
Segments: BEG, PO1, REF, SE, ST
"""
recorddefs = {
    "BEG": [["BOTSID", "M", 3, "AN"], ["BEG01", "M", 2, "AN"]],
    "PO1": [["BOTSID", "M", 3, "AN"], ["PO101", "C", 20, "AN"]],
}
The example above is abbreviated. In practice the generated module contains an entry for every segment listed in the header, with the full field definitions copied from your source catalog.
Programmatic use
If you want to build your own wrapper script or test the output in memory, use
build_recorddefs_module(...) directly:
from bots_airflow.devtools.extract_recorddefs import build_recorddefs_module

segments, extracted, rendered = build_recorddefs_module(
    source_reference="my_company_edi.legacy.recorddefs_004010",
    grammar_references=["my_company_edi.grammars.x12.orders_850"],
    segments=["ISA", "GS"],
)

print(segments)
print(rendered)
Return values:
- segments: sorted segment ids selected for output
- extracted: the reduced recorddefs dictionary
- rendered: the generated Python module source
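One way to test the rendered source in memory is to execute it into a scratch namespace and inspect the resulting recorddefs. The sketch below uses a hard-coded stand-in string in place of a real rendered value so that it runs without bots_airflow installed; the load_rendered_recorddefs helper is a hypothetical name, not part of the package.

```python
# Stand-in for the rendered string returned by build_recorddefs_module(...).
rendered = '''"""
Generated by bots_airflow.devtools.extract_recorddefs.
"""
recorddefs = {
    "BEG": [["BOTSID", "M", 3, "AN"], ["BEG01", "M", 2, "AN"]],
    "PO1": [["BOTSID", "M", 3, "AN"], ["PO101", "C", 20, "AN"]],
}
'''


def load_rendered_recorddefs(source):
    """Execute generated module source in a scratch namespace (hypothetical helper)."""
    namespace = {}
    exec(compile(source, "<rendered recorddefs>", "exec"), namespace)
    return namespace["recorddefs"]


loaded = load_rendered_recorddefs(rendered)

# Every definition in this sample starts with a BOTSID field.
for segment, fields in loaded.items():
    assert fields[0][0] == "BOTSID", segment

print(sorted(loaded))
```

With a real call, a natural follow-up check is that the loaded dictionary equals extracted and that its keys match segments.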
Failure modes
The command fails fast in a few important cases:
- the source module does not expose a recorddefs dict
- a grammar module does not expose a structure list
- no segments are selected from --grammar or --segment
- the selected segment ids are not present in the source recorddefs
That behavior is deliberate. The tool is meant to produce a reviewed, committed Python module rather than silently generating a partial or ambiguous result.