FZGPUModules 2.0
GPU-accelerated modular compression pipelines
.fzm files store a compressed payload together with everything needed to decompress it: the full pipeline stage graph, per-stage configuration, and buffer layout. You can decompress an .fzm file using only this specification and the element data types — no pipeline source code required.
All multi-byte integers are little-endian. The library enforces a little-endian host at configure time.
| Version | FZMHeaderCore size | Changes |
|---|---|---|
| v3.0 | 72 bytes | Initial versioned format |
| v3.1 | 80 bytes | Added flags, data_checksum, header_checksum fields |
Version field (2 bytes): high_byte = major, low_byte = minor. A major mismatch causes the read to throw. A minor mismatch emits a warning and continues. Legacy files stored a plain integer (e.g. 3); these are read as major=3, minor=0.
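The version rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the library's reader: the function names `decode_version` and `check_version` are hypothetical, and detecting a legacy plain-integer value by its empty high byte is an assumption that follows from how the integer 3 is laid out in a 16-bit field.

```python
import warnings

SUPPORTED_MAJOR = 3  # from "current = 0x0301"
SUPPORTED_MINOR = 1


def decode_version(raw: int) -> tuple[int, int]:
    """Split the 16-bit version field into (major, minor)."""
    major, minor = raw >> 8, raw & 0xFF
    if major == 0:
        # Assumption: a legacy plain integer such as 3 leaves the high byte
        # empty, so the low byte carries the major version and minor is 0.
        major, minor = minor, 0
    return major, minor


def check_version(raw: int) -> tuple[int, int]:
    """Raise on a major mismatch, warn on a minor mismatch."""
    major, minor = decode_version(raw)
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported FZM major version {major}")
    if minor != SUPPORTED_MINOR:
        warnings.warn(f"FZM minor version {minor} differs from {SUPPORTED_MINOR}")
    return major, minor
```

With these definitions, `decode_version(0x0301)` yields `(3, 1)` and a legacy value of `3` yields `(3, 0)`.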
The payload is a flat concatenation of buffer segments, one per FZMBufferEntry. Each entry's byte_offset field gives the segment's start position relative to header_size (i.e. relative to the start of the payload, not the start of the file).
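Because byte_offset is relative to the payload rather than the file, a reader has to add header_size before seeking. The sketch below shows that arithmetic; `segment_range` and `read_segment` are hypothetical helper names, and the segment length is taken from the entry's data_size field.

```python
def segment_range(header_size: int, byte_offset: int, data_size: int) -> tuple[int, int]:
    """Return the absolute [start, end) file offsets of one buffer segment."""
    start = header_size + byte_offset  # byte_offset is payload-relative
    return start, start + data_size


def read_segment(file_bytes: bytes, header_size: int, byte_offset: int, data_size: int) -> bytes:
    """Slice one segment out of a file that is already loaded in memory."""
    start, end = segment_range(header_size, byte_offset, data_size)
    if end > len(file_bytes):
        raise ValueError("segment extends past the end of the file")
    return file_bytes[start:end]
```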
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | Must equal 0x464D5A32 ("FZM2" LE) |
| 4 | 2 | version | (major << 8) \| minor; current = 0x0301 |
| 6 | 2 | num_buffers | Number of FZMBufferEntry records |
| 8 | 8 | uncompressed_size | Total uncompressed input size (bytes) |
| 16 | 8 | compressed_size | Total compressed payload size (bytes) |
| 24 | 8 | header_size | Byte offset where the compressed payload begins |
| 32 | 4 | num_stages | Number of FZMStageInfo records |
| 36 | 2 | num_sources | Number of pipeline source stages (currently always 1) |
| 38 | 2 | flags | Feature flags (see below) |
| 40 | 32 | source_uncompressed_sizes[4] | Per-source uncompressed size (8 bytes × 4 slots); only index 0 is used |
| 72 | 4 | data_checksum | CRC32 (IEEE 802.3) of payload; 0 if flag not set |
| 76 | 4 | header_checksum | CRC32 of full header with this field zeroed; 0 if flag not set |
Flags:
| Bit | Constant | Meaning |
|---|---|---|
| 0 | FZM_FLAG_HAS_DATA_CHECKSUM | data_checksum is valid |
| 1 | FZM_FLAG_HAS_HEADER_CHECKSUM | header_checksum is valid |
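A reader would typically test the flag bits before trusting either checksum field. The following sketch uses Python's `zlib.crc32`, which computes the standard IEEE 802.3 CRC32. The helper names are hypothetical, and the exact byte span meant by "full header" is an assumption: the caller passes whatever header bytes are covered (for example the first header_size bytes of the file).

```python
import zlib

FZM_FLAG_HAS_DATA_CHECKSUM = 1 << 0    # bit 0
FZM_FLAG_HAS_HEADER_CHECKSUM = 1 << 1  # bit 1
HEADER_CHECKSUM_OFFSET = 76            # offset of header_checksum in FZMHeaderCore


def data_checksum_ok(flags: int, stored: int, payload: bytes) -> bool:
    """Verify data_checksum; a file without the flag has nothing to verify."""
    if not flags & FZM_FLAG_HAS_DATA_CHECKSUM:
        return True
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


def header_crc(header: bytes) -> int:
    """CRC32 of the header bytes with the header_checksum field zeroed."""
    zeroed = bytearray(header)
    zeroed[HEADER_CHECKSUM_OFFSET:HEADER_CHECKSUM_OFFSET + 4] = b"\x00" * 4
    return zlib.crc32(bytes(zeroed)) & 0xFFFFFFFF


def header_checksum_ok(flags: int, stored: int, header: bytes) -> bool:
    """Verify header_checksum; assumes `header` is the span the writer hashed."""
    if not flags & FZM_FLAG_HAS_HEADER_CHECKSUM:
        return True
    return header_crc(header) == stored
```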
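Header checksum computation: CRC32 over the full header, with the 4 bytes of the header_checksum field zeroed before hashing.
Each FZMStageInfo record (256 bytes) describes one stage in the pipeline: its type, serialized configuration, and which DAG buffers feed its inputs and outputs.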
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 2 | stage_type | StageType enum value |
| 2 | 2 | stage_version | Stage config format version |
| 4 | 1 | num_inputs | Number of input ports |
| 5 | 1 | num_outputs | Number of output ports |
| 6 | 2 | reserved1 | Reserved (zeroed) |
| 8 | 16 | input_buffer_ids[8] | DAG buffer ID for each input (2 bytes each); 0xFFFF = unused |
| 24 | 16 | output_buffer_ids[8] | DAG buffer ID for each output (2 bytes each); 0xFFFF = unused |
| 40 | 128 | stage_config | Stage-defined serialized config (see Stage::serializeHeader()) |
| 168 | 4 | config_size | Valid bytes in stage_config |
| 172 | 84 | *(reserved)* | Reserved (zeroed) |
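The stage record table above maps directly onto a `struct` format string. This is a minimal sketch, assuming the record is exactly the 256 bytes the offsets add up to; `parse_stage_info` and the dictionary keys are illustrative names, not part of the library.

```python
import struct

# <: little-endian, no padding. Fields follow the table order:
# stage_type, stage_version, num_inputs, num_outputs, reserved1,
# 8 input IDs, 8 output IDs, 128-byte config, config_size, 84 reserved bytes.
STAGE_INFO = struct.Struct("<HHBBH8H8H128sI84x")
assert STAGE_INFO.size == 256

UNUSED_BUFFER_ID = 0xFFFF


def parse_stage_info(record: bytes) -> dict:
    """Decode one FZMStageInfo record into a plain dictionary."""
    fields = STAGE_INFO.unpack(record)
    stage_type, stage_version, num_inputs, num_outputs, _reserved1 = fields[:5]
    config, config_size = fields[21], fields[22]
    return {
        "stage_type": stage_type,
        "stage_version": stage_version,
        "num_inputs": num_inputs,
        "num_outputs": num_outputs,
        # Drop slots marked 0xFFFF (unused).
        "input_buffer_ids": [i for i in fields[5:13] if i != UNUSED_BUFFER_ID],
        "output_buffer_ids": [i for i in fields[13:21] if i != UNUSED_BUFFER_ID],
        # Only the first config_size bytes of stage_config are valid.
        "stage_config": config[:config_size],
    }
```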
Each FZMBufferEntry record (256 bytes) describes one compressed buffer segment: its producer, element type, sizes, and position within the payload.
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 2 | stage_type | Producer stage type |
| 2 | 2 | stage_version | Producer stage config version |
| 4 | 1 | data_type | DataType enum value |
| 5 | 1 | producer_output_idx | Which output port of the producer stage |
| 6 | 2 | dag_buffer_id | DAG routing ID; 0xFFFF = unassigned |
| 8 | 64 | name[64] | Output port name, null-terminated |
| 72 | 8 | data_size | Compressed bytes in this segment |
| 80 | 8 | allocated_size | Buffer capacity needed for decompression |
| 88 | 8 | uncompressed_size | Bytes after fully decompressing this stage's output |
| 96 | 8 | byte_offset | Offset of this segment within the payload (relative to header_size) |
| 104 | 128 | stage_config | Producer stage config (same content as the matching FZMStageInfo.stage_config) |
| 232 | 4 | config_size | Valid bytes in stage_config |
| 236 | 20 | *(reserved)* | Reserved (14 bytes declared + 6 bytes implicit struct padding) |
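The buffer entry layout can be decoded the same way. The sketch below is illustrative only: `parse_buffer_entry` is a hypothetical name, and decoding the port name as UTF-8 is an assumption, since the table states only that it is null-terminated.

```python
import struct

# Fields follow the table order; the trailing 20 reserved bytes are skipped.
BUFFER_ENTRY = struct.Struct("<HHBBH64sQQQQ128sI20x")
assert BUFFER_ENTRY.size == 256


def parse_buffer_entry(record: bytes) -> dict:
    """Decode one FZMBufferEntry record into a plain dictionary."""
    (stage_type, stage_version, data_type, producer_output_idx, dag_buffer_id,
     name, data_size, allocated_size, uncompressed_size, byte_offset,
     config, config_size) = BUFFER_ENTRY.unpack(record)
    return {
        "stage_type": stage_type,
        "stage_version": stage_version,
        "data_type": data_type,
        "producer_output_idx": producer_output_idx,
        "dag_buffer_id": dag_buffer_id,
        # Keep the bytes before the first NUL; UTF-8 is an assumption.
        "name": name.split(b"\x00", 1)[0].decode("utf-8", errors="replace"),
        "data_size": data_size,
        "allocated_size": allocated_size,
        "uncompressed_size": uncompressed_size,
        "byte_offset": byte_offset,
        "stage_config": config[:config_size],
    }
```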
| Value | Constant | Stage |
|---|---|---|
| 0 | UNKNOWN | Unknown / unset |
| 1 | LORENZO_QUANT | LorenzoQuantStage — fused float predictor + quantizer; dimensionality stored in config |
| 2 | DIFFERENCE | DifferenceStage — first-order difference coder |
| 3 | SCALE | ScaleStage (test utility) |
| 4 | PASSTHROUGH | PassThroughStage (test utility) |
| 5 | RLE | RLEStage — run-length encoding |
| 6 | HUFFMAN | Reserved for future use |
| 7 | BITPACK | BitpackStage — dense N-bit integer packing |
| 10 | SPLIT | SplitStage (test utility) |
| 11 | MERGE | MergeStage (test utility) |
| 12 | LORENZO | LorenzoStage — plain integer delta predictor; dimensionality stored in config |
| 14 | QUANTIZER | QuantizerStage — direct-value quantizer |
| 15 | ZIGZAG | ZigzagStage — zigzag encode/decode |
| 16 | NEGABINARY | NegabinaryStage — negabinary encode/decode |
| 17 | BITSHUFFLE | BitshuffleStage — GPU bit-matrix transpose |
| 18 | RZE | RZEStage — recursive zero-byte elimination |
Rule: never reuse or renumber an existing value — stage type IDs are baked into .fzm files on disk. New stages always take the next unused integer.
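For a standalone reader, the table can be mirrored as an `IntEnum`. This is a sketch rather than the library's own definition; the `@unique` decorator makes an accidental reuse of a value fail at import time, and the hypothetical `stage_type_name` helper tolerates IDs written by a newer library version.

```python
from enum import IntEnum, unique


@unique  # raises ValueError if two members share a value
class StageType(IntEnum):
    UNKNOWN = 0
    LORENZO_QUANT = 1
    DIFFERENCE = 2
    SCALE = 3
    PASSTHROUGH = 4
    RLE = 5
    HUFFMAN = 6
    BITPACK = 7
    SPLIT = 10
    MERGE = 11
    LORENZO = 12
    QUANTIZER = 14
    ZIGZAG = 15
    NEGABINARY = 16
    BITSHUFFLE = 17
    RZE = 18


def stage_type_name(value: int) -> str:
    """Return the constant name, or a placeholder for IDs not in the table."""
    try:
        return StageType(value).name
    except ValueError:
        return f"UNRECOGNIZED({value})"
```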
| Value | Constant | C type |
|---|---|---|
| 0 | UINT8 | uint8_t |
| 1 | UINT16 | uint16_t |
| 2 | UINT32 | uint32_t |
| 3 | UINT64 | uint64_t |
| 4 | INT8 | int8_t |
| 5 | INT16 | int16_t |
| 6 | INT32 | int32_t |
| 7 | INT64 | int64_t |
| 8 | FLOAT32 | float |
| 9 | FLOAT64 | double |
| 255 | UNKNOWN | byte-transparent (type checking skipped at pipeline finalize) |
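Once a buffer is fully decompressed, the data_type value tells a reader how to interpret the raw bytes. The sketch below maps each value to a `struct` format character; the names `DATA_TYPES`, `element_size`, and `decode_elements` are hypothetical, and little-endian element storage is an assumption based on the little-endian host requirement.

```python
import struct

# DataType value -> (constant name, struct format character or None)
DATA_TYPES = {
    0: ("UINT8", "B"), 1: ("UINT16", "H"), 2: ("UINT32", "I"), 3: ("UINT64", "Q"),
    4: ("INT8", "b"), 5: ("INT16", "h"), 6: ("INT32", "i"), 7: ("INT64", "q"),
    8: ("FLOAT32", "f"), 9: ("FLOAT64", "d"),
    255: ("UNKNOWN", None),  # byte-transparent
}


def element_size(data_type: int) -> int:
    """Size in bytes of one element; UNKNOWN is treated as raw bytes (size 1)."""
    _, fmt = DATA_TYPES[data_type]
    return 1 if fmt is None else struct.calcsize("<" + fmt)


def decode_elements(data_type: int, raw: bytes):
    """Interpret decompressed bytes as a list of typed values."""
    _, fmt = DATA_TYPES[data_type]
    if fmt is None:
        return raw  # no element type recorded, hand back the bytes unchanged
    size = element_size(data_type)
    count = len(raw) // size
    # Assumption: elements are stored little-endian.
    return list(struct.unpack(f"<{count}{fmt}", raw[:count * size]))
```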
Minimum steps to parse an .fzm file in Python:
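A minimal sketch of those steps follows, limited to the 80-byte v3.1 FZMHeaderCore and the payload location. The name `parse_fzm` and the dictionary keys are illustrative, not the library's API. The sketch leaves the FZMStageInfo and FZMBufferEntry records between the core and header_size undecoded, and it rejects any version other than 0x0301 because the v3.0 core is 8 bytes shorter.

```python
import struct

MAGIC = 0x464D5A32
CURRENT_VERSION = 0x0301
# magic, version, num_buffers, uncompressed_size, compressed_size, header_size,
# num_stages, num_sources, flags, 4 source sizes, data_checksum, header_checksum
HEADER_CORE = struct.Struct("<IHHQQQIHH4QII")
assert HEADER_CORE.size == 80

SCALAR_FIELDS = ("magic", "version", "num_buffers", "uncompressed_size",
                 "compressed_size", "header_size", "num_stages",
                 "num_sources", "flags")


def parse_fzm(data: bytes) -> dict:
    """Parse the v3.1 header core and slice out the compressed payload."""
    if len(data) < HEADER_CORE.size:
        raise ValueError("file is shorter than the 80-byte header core")
    values = HEADER_CORE.unpack_from(data, 0)
    header = dict(zip(SCALAR_FIELDS, values[:9]))
    header["source_uncompressed_sizes"] = list(values[9:13])
    header["data_checksum"], header["header_checksum"] = values[13:15]

    if header["magic"] != MAGIC:
        raise ValueError(f"bad magic 0x{header['magic']:08X}")
    if header["version"] != CURRENT_VERSION:
        raise ValueError("this sketch handles only the v3.1 layout")

    # The payload starts at header_size and runs for compressed_size bytes.
    start = header["header_size"]
    end = start + header["compressed_size"]
    if end > len(data):
        raise ValueError("payload extends past the end of the file")
    header["payload"] = data[start:end]
    return header
```

Reading a file into memory with `open(path, "rb").read()` (where `path` is a placeholder) and passing the bytes to `parse_fzm` returns the header fields plus the raw payload, ready to be split into segments using each entry's byte_offset and data_size.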
This produces output like the following example:
Individual stages rely on Stage::deserializeHeader() to handle their own config migration.
| Library version | FZM versions supported |
|---|---|
| v1.x | v3.0, v3.1 |
| v2.x (current) | v3.1 (reads v3.0 with warning) |