FZGPUModules 2.0
GPU-accelerated modular compression pipelines
FZM File Format

.fzm files store a compressed payload together with everything needed to decompress it: the full pipeline stage graph, per-stage configuration, and buffer layout. You can decompress an .fzm file using only this specification and the element data types — no pipeline source code required.

All multi-byte integers are little-endian. The library enforces a little-endian host at configure time.


Version History

Version  FZMHeaderCore size  Changes
v3.0     72 bytes            Initial versioned format
v3.1     80 bytes            Added flags, data_checksum, header_checksum fields

Version field (2 bytes): high_byte = major, low_byte = minor. A major mismatch causes the read to throw. A minor mismatch emits a warning and continues. Legacy files stored a plain integer (e.g. 3); these are read as major=3, minor=0.
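The decoding rule above can be sketched in Python (a minimal illustration; the function name is ours, not the library's):

```python
def decode_version(raw: int) -> tuple[int, int]:
    """Decode the 2-byte version field into (major, minor).

    Values <= 0xFF are legacy plain integers: major = raw, minor = 0.
    """
    if raw > 0xFF:
        return raw >> 8, raw & 0xFF
    return raw, 0

# Current format 0x0301 decodes to (3, 1); a legacy file storing 3 decodes to (3, 0).
```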


File Layout

┌─────────────────────────────────────────────────┐
│ FZMHeaderCore (80 bytes)                        │
├─────────────────────────────────────────────────┤
│ FZMStageInfo × num_stages                       │
│   (256 bytes each)                              │
├─────────────────────────────────────────────────┤
│ FZMBufferEntry × num_buffers                    │
│   (256 bytes each)                              │
├─────────────────────────────────────────────────┤
│ Compressed payload                              │
│   Starts at byte offset header_size             │
│                                                 │
│   segment 0 (FZMBufferEntry[0].data_size bytes) │
│   segment 1 (FZMBufferEntry[1].data_size bytes) │
│   ...                                           │
└─────────────────────────────────────────────────┘

The payload is a flat concatenation of buffer segments, one per FZMBufferEntry. Each entry's byte_offset field gives the segment's start position relative to header_size (i.e. relative to the start of the payload, not the start of the file).
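In absolute file coordinates, a segment therefore starts at header_size + byte_offset. A small helper (the name is illustrative) makes the relationship explicit:

```python
def segment_file_range(header_size: int, byte_offset: int, data_size: int) -> tuple[int, int]:
    """Return the (start, end) absolute file offsets of one payload segment.

    byte_offset is relative to the start of the payload, so the absolute
    position is simply shifted by header_size.
    """
    start = header_size + byte_offset
    return start, start + data_size
```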


FZMHeaderCore (80 bytes)

Offset  Size  Field                         Description
0       4     magic                         Must equal 0x464D5A32 ("FZM2" LE)
4       2     version                       (major << 8) | minor; current = 0x0301
6       2     num_buffers                   Number of FZMBufferEntry records
8       8     uncompressed_size             Total uncompressed input size (bytes)
16      8     compressed_size               Total compressed payload size (bytes)
24      8     header_size                   Byte offset where the compressed payload begins
32      4     num_stages                    Number of FZMStageInfo records
36      2     num_sources                   Number of pipeline source stages (currently always 1)
38      2     flags                         Feature flags (see below)
40      32    source_uncompressed_sizes[4]  Per-source uncompressed size (8 bytes × 4 slots); only index 0 is used
72      4     data_checksum                 CRC32 (IEEE 802.3) of payload; 0 if flag not set
76      4     header_checksum               CRC32 of full header with this field zeroed; 0 if flag not set

Flags:

Bit  Constant                      Meaning
0    FZM_FLAG_HAS_DATA_CHECKSUM    data_checksum is valid
1    FZM_FLAG_HAS_HEADER_CHECKSUM  header_checksum is valid
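Testing the flags is a simple bit mask; the numeric values below are derived from the bit positions in the table (the constant names come from the table, the hex values are our reading of bits 0 and 1):

```python
FZM_FLAG_HAS_DATA_CHECKSUM = 0x0001    # bit 0
FZM_FLAG_HAS_HEADER_CHECKSUM = 0x0002  # bit 1

def checksum_flags(flags: int) -> tuple[bool, bool]:
    """Return (has_data_checksum, has_header_checksum) for a flags word."""
    return (bool(flags & FZM_FLAG_HAS_DATA_CHECKSUM),
            bool(flags & FZM_FLAG_HAS_HEADER_CHECKSUM))
```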

Header checksum computation:

  1. Concatenate the core (80 B) + stage array + buffer array.
  2. Zero the 4 bytes at offset 76 (header_checksum field).
  3. CRC32 the entire buffer.
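The three steps can be sketched with Python's zlib.crc32, which implements the IEEE polynomial (the helper name is illustrative):

```python
import zlib

def header_crc32(core: bytes, stage_array: bytes, buffer_array: bytes) -> int:
    """CRC32 of the full header with the header_checksum field (core offset 76) zeroed."""
    buf = bytearray(core) + stage_array + buffer_array
    buf[76:80] = b"\x00\x00\x00\x00"  # step 2: zero header_checksum
    return zlib.crc32(bytes(buf)) & 0xFFFFFFFF  # step 3: CRC32 of the whole buffer
```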

FZMStageInfo (256 bytes, one per stage)

Describes one stage in the pipeline: its type, serialized configuration, and which DAG buffers feed its inputs and outputs.

Offset  Size  Field                 Description
0       2     stage_type            StageType enum value
2       2     stage_version         Stage config format version
4       1     num_inputs            Number of input ports
5       1     num_outputs           Number of output ports
6       2     reserved1             Reserved (zeroed)
8       16    input_buffer_ids[8]   DAG buffer ID for each input (2 bytes each); 0xFFFF = unused
24      16    output_buffer_ids[8]  DAG buffer ID for each output (2 bytes each); 0xFFFF = unused
40      128   stage_config          Stage-defined serialized config (see Stage::serializeHeader())
168     4     config_size           Valid bytes in stage_config
172     84    (reserved)            Reserved (zeroed)
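For example, the valid portion of a record's stage_config can be extracted using config_size at offset 168 (a sketch; the helper name is ours):

```python
import struct

def stage_config_bytes(record: bytes) -> bytes:
    """Extract the valid stage_config bytes from one 256-byte FZMStageInfo record."""
    config_size = struct.unpack_from("<I", record, 168)[0]
    # stage_config occupies offsets 40..168; clamp to the 128-byte field
    return record[40:40 + min(config_size, 128)]
```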

FZMBufferEntry (256 bytes, one per buffer)

Describes one compressed buffer segment: its producer, element type, sizes, and position within the payload.

Offset  Size  Field                Description
0       2     stage_type           Producer stage type
2       2     stage_version        Producer stage config version
4       1     data_type            DataType enum value
5       1     producer_output_idx  Which output port of the producer stage
6       2     dag_buffer_id        DAG routing ID; 0xFFFF = unassigned
8       64    name[64]             Output port name, null-terminated
72      8     data_size            Compressed bytes in this segment
80      8     allocated_size       Buffer capacity needed for decompression
88      8     uncompressed_size    Bytes after fully decompressing this stage's output
96      8     byte_offset          Offset of this segment within the payload (relative to header_size)
104     128   stage_config         Producer stage config (same content as the matching FZMStageInfo.stage_config)
232     4     config_size          Valid bytes in stage_config
236     20    (reserved)           Reserved (14 bytes declared + 6 bytes implicit struct padding)

StageType Values

Value  Constant       Stage
0      UNKNOWN        Unknown / unset
1      LORENZO_QUANT  LorenzoQuantStage — fused float predictor + quantizer; dimensionality stored in config
2      DIFFERENCE     DifferenceStage — first-order difference coder
3      SCALE          ScaleStage (test utility)
4      PASSTHROUGH    PassThroughStage (test utility)
5      RLE            RLEStage — run-length encoding
6      HUFFMAN        Reserved for future use
7      BITPACK        BitpackStage — dense N-bit integer packing
10     SPLIT          SplitStage (test utility)
11     MERGE          MergeStage (test utility)
12     LORENZO        LorenzoStage — plain integer delta predictor; dimensionality stored in config
14     QUANTIZER      QuantizerStage — direct-value quantizer
15     ZIGZAG         ZigzagStage — zigzag encode/decode
16     NEGABINARY     NegabinaryStage — negabinary encode/decode
17     BITSHUFFLE     BitshuffleStage — GPU bit-matrix transpose
18     RZE            RZEStage — recursive zero-byte elimination

Rule: never reuse or renumber an existing value — stage type IDs are baked into .fzm files on disk. New stages always take the next unused integer.
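For illustration, the table can be mirrored as a Python IntEnum (the library's own enum declaration is not part of this spec); writing the values explicitly makes the no-renumbering rule mechanical to enforce:

```python
from enum import IntEnum

class StageType(IntEnum):
    """On-disk stage type IDs. Values are permanent; never reuse or renumber."""
    UNKNOWN = 0
    LORENZO_QUANT = 1
    DIFFERENCE = 2
    SCALE = 3
    PASSTHROUGH = 4
    RLE = 5
    HUFFMAN = 6      # reserved for future use
    BITPACK = 7
    SPLIT = 10       # gaps (8, 9, 13) stay unused forever
    MERGE = 11
    LORENZO = 12
    QUANTIZER = 14
    ZIGZAG = 15
    NEGABINARY = 16
    BITSHUFFLE = 17
    RZE = 18
```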


DataType Values

Value  Constant  C type
0      UINT8     uint8_t
1      UINT16    uint16_t
2      UINT32    uint32_t
3      UINT64    uint64_t
4      INT8      int8_t
5      INT16     int16_t
6      INT32     int32_t
7      INT64     int64_t
8      FLOAT32   float
9      FLOAT64   double
255    UNKNOWN   byte-transparent (type checking skipped at pipeline finalize)
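A sketch of turning a raw payload segment into typed elements, using struct format characters matched to the C types above (the helper and mapping names are illustrative):

```python
import struct

# struct format character for each DataType value; UNKNOWN (255) has no
# element interpretation, so those segments stay raw bytes.
DTYPE_FMT = {
    0: "B", 1: "H", 2: "I", 3: "Q",
    4: "b", 5: "h", 6: "i", 7: "q",
    8: "f", 9: "d",
}

def decode_segment(data_type_id: int, segment: bytes):
    """Decode a payload segment into a list of elements (little-endian)."""
    fmt = DTYPE_FMT.get(data_type_id)
    if fmt is None:
        return segment  # UNKNOWN: byte-transparent
    elem = struct.calcsize(fmt)
    n = len(segment) // elem
    return list(struct.unpack(f"<{n}{fmt}", segment[:n * elem]))
```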

Reading a File Without the Library

Minimum steps to parse an .fzm file in Python:

import struct, zlib, sys

STAGE_TYPES = {
    0: "Unknown", 1: "LorenzoQuant", 2: "Difference", 3: "Scale",
    4: "PassThrough", 5: "RLE", 6: "Huffman", 7: "BitPack",
    10: "Split", 11: "Merge", 12: "Lorenzo", 14: "Quantizer",
    15: "Zigzag", 16: "Negabinary", 17: "Bitshuffle", 18: "RZE",
}
DATA_TYPES = {
    0: "uint8", 1: "uint16", 2: "uint32", 3: "uint64",
    4: "int8", 5: "int16", 6: "int32", 7: "int64",
    8: "float32", 9: "float64", 0xFF: "unknown",
}

filename = sys.argv[1] if len(sys.argv) > 1 else "output.fzm"
print(f"Parsing: {filename}\n")

with open(filename, "rb") as f:
    # 1. Read and validate the header core
    core = f.read(80)
    magic, version, num_buffers = struct.unpack_from("<IHH", core, 0)
    assert magic == 0x464D5A32, f"Bad magic: 0x{magic:08X}"
    major = (version >> 8) if version > 0xFF else version
    minor = (version & 0xFF) if version > 0xFF else 0
    assert major == 3, f"Unsupported major version {major}"
    uncomp_size, comp_size, header_size = struct.unpack_from("<QQQ", core, 8)
    num_stages, num_sources, flags = struct.unpack_from("<IHH", core, 32)
    data_crc, hdr_crc = struct.unpack_from("<II", core, 72)

    print("=== Header ===")
    print(f" magic: 0x{magic:08X} (FZM2)")
    print(f" version: {major}.{minor}")
    print(f" num_stages: {num_stages}")
    print(f" num_buffers: {num_buffers}")
    print(f" num_sources: {num_sources}")
    print(f" uncompressed: {uncomp_size:,} bytes")
    print(f" compressed: {comp_size:,} bytes")
    print(f" header_size: {header_size:,} bytes")
    print(f" flags: 0x{flags:04X} (data_crc={'yes' if flags&1 else 'no'}, hdr_crc={'yes' if flags&2 else 'no'})")
    if uncomp_size and comp_size:
        print(f" compression ratio: {uncomp_size/comp_size:.3f}x")

    # 2. Read stage and buffer arrays
    stage_array = f.read(num_stages * 256)
    buffer_array = f.read(num_buffers * 256)

    # 3. Verify header checksum (v3.1+)
    if flags & 0x0002:
        full_header = bytearray(core) + stage_array + buffer_array
        full_header[76:80] = b'\x00\x00\x00\x00'  # zero header_checksum before computing
        computed = zlib.crc32(full_header) & 0xFFFFFFFF
        assert computed == hdr_crc, f"Header CRC mismatch: computed 0x{computed:08X}, stored 0x{hdr_crc:08X}"
        print(f"\n header CRC: PASS (0x{hdr_crc:08X})")
    else:
        print("\n header CRC: skipped (flag not set)")

    # 4. Read the compressed payload
    f.seek(header_size)
    payload = f.read(comp_size)

    # 5. Verify data checksum (v3.1+)
    if flags & 0x0001:
        computed = zlib.crc32(payload) & 0xFFFFFFFF
        assert computed == data_crc, f"Payload CRC mismatch: computed 0x{computed:08X}, stored 0x{data_crc:08X}"
        print(f" payload CRC: PASS (0x{data_crc:08X})")
    else:
        print(" payload CRC: skipped (flag not set)")

    # 6. Decode stage info
    print(f"\n=== Stages ({num_stages}) ===")
    for i in range(num_stages):
        entry = stage_array[i*256 : (i+1)*256]
        stage_type_id, stage_ver, num_in, num_out = struct.unpack_from("<HHBB", entry, 0)
        # input/output buffer IDs: 8 x uint16 each, starting at offsets 8 and 24
        in_ids = [struct.unpack_from("<H", entry, 8 + j*2)[0] for j in range(num_in)]
        out_ids = [struct.unpack_from("<H", entry, 24 + j*2)[0] for j in range(num_out)]
        in_str = ", ".join(str(x) for x in in_ids) if in_ids else "-"
        out_str = ", ".join(str(x) for x in out_ids) if out_ids else "-"
        sname = STAGE_TYPES.get(stage_type_id, f"type#{stage_type_id}")
        print(f" stage[{i}]: {sname} v{stage_ver} inputs=[{in_str}] outputs=[{out_str}]")

    # 7. Decode buffer entries and extract segments
    print(f"\n=== Buffers ({num_buffers}) ===")
    segments = []
    for i in range(num_buffers):
        entry = buffer_array[i*256 : (i+1)*256]
        stage_type_id, stage_ver = struct.unpack_from("<HH", entry, 0)
        data_type_id, prod_out_idx = struct.unpack_from("<BB", entry, 4)
        dag_buf_id = struct.unpack_from("<H", entry, 6)[0]
        name = entry[8:72].rstrip(b'\x00').decode("utf-8", errors="replace")
        data_size = struct.unpack_from("<Q", entry, 72)[0]
        allocated_size = struct.unpack_from("<Q", entry, 80)[0]
        uncompressed_size = struct.unpack_from("<Q", entry, 88)[0]
        byte_offset = struct.unpack_from("<Q", entry, 96)[0]
        sname = STAGE_TYPES.get(stage_type_id, f"type#{stage_type_id}")
        dtype = DATA_TYPES.get(data_type_id, f"dtype#{data_type_id}")
        print(f" buf[{i}]: '{name}' stage={sname} dtype={dtype} "
              f"data_size={data_size:,} uncomp={uncompressed_size:,} offset={byte_offset:,}")
        segment = payload[byte_offset : byte_offset + data_size]
        if len(segment) != data_size:
            print(f" WARNING: expected {data_size} bytes but got {len(segment)}")
        segments.append(segment)

print(f"\nPayload size: {len(payload):,} bytes")
print(f"Total segment bytes: {sum(len(s) for s in segments):,}")
print("\nParsed OK.")

Run against a sample file, this produces output like:

Parsing: output.fzm
=== Header ===
magic: 0x464D5A32 (FZM2)
version: 3.1
num_stages: 2
num_buffers: 4
num_sources: 1
uncompressed: 262,144 bytes
compressed: 250,280 bytes
header_size: 1,616 bytes
flags: 0x0003 (data_crc=yes, hdr_crc=yes)
compression ratio: 1.047x
header CRC: PASS (0xF02D1EA0)
payload CRC: PASS (0x5ECF8D8B)
=== Stages (2) ===
stage[0]: LorenzoQuant v1 inputs=[5] outputs=[0, 1, 2, 3]
stage[1]: RLE v1 inputs=[0] outputs=[4]
=== Buffers (4) ===
buf[0]: 'output' stage=RLE dtype=uint16 data_size=250,276 uncomp=250,276 offset=0
buf[1]: 'outlier_errors' stage=LorenzoQuant dtype=float32 data_size=0 uncomp=0 offset=250,276
buf[2]: 'outlier_indices' stage=LorenzoQuant dtype=uint32 data_size=0 uncomp=0 offset=250,276
buf[3]: 'outlier_count' stage=LorenzoQuant dtype=uint32 data_size=4 uncomp=4 offset=250,276
Payload size: 250,280 bytes
Total segment bytes: 250,280
Parsed OK.

Versioning Rules

  • Major bump — incompatible layout change to any header struct. Old files are rejected at read time.
  • Minor bump — additive change only (new fields in reserved space, new flag bits). Old files load with new fields defaulting to 0; new files load on old readers with a warning.
  • stage_version — per-stage config version managed by each stage independently. The library does not interpret it; stages read it in Stage::deserializeHeader() to handle their own config migration.

Library version  FZM versions supported
v1.x             v3.0, v3.1
v2.x (current)   v3.1 (reads v3.0 with warning)
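The major/minor rules above can be sketched as a reader-side check (the function name is illustrative; the current version is taken as 3.1):

```python
import warnings

def check_version(major: int, minor: int, cur_major: int = 3, cur_minor: int = 1) -> None:
    """Apply the compatibility rules: a major mismatch is fatal, a minor mismatch warns."""
    if major != cur_major:
        raise ValueError(f"incompatible FZM major version {major}.{minor} "
                         f"(reader supports {cur_major}.x)")
    if minor != cur_minor:
        warnings.warn(f"FZM minor version {major}.{minor} differs from current "
                      f"{cur_major}.{cur_minor}; unknown fields default to 0")
```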