FZGPUModules 2.0
GPU-accelerated modular compression pipelines
Header: modules/shufflers/bitshuffle/bitshuffle_stage.h
Class: fz::BitshuffleStage — no template parameters
Category: Transform / shuffler (lossless)
Common instantiation:
Performs a GPU bit-matrix transpose over fixed-size chunks. Given a chunk of N elements, each W bits wide, the forward pass gathers bit k of all N values into bit-plane k, producing W bit-planes of N bits each. This concentrates sign bits and exponent bits into contiguous regions, which typically makes floating-point or integer data more compressible for downstream byte-oriented coders such as RZEStage.
Output is the same byte size as input (size-preserving transform).
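The transpose can be illustrated on the CPU. The sketch below is a reference model only, not the library's GPU kernel: the function name bitshuffle_reference is hypothetical, the element width is fixed at W = 16, and the bit ordering inside each plane (element i stored at bit i % 8 of byte i / 8) is an assumption that may differ from what BitshuffleStage actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference sketch (hypothetical helper, not part of FZGPUModules).
// Transposes N 16-bit elements into 16 bit-planes of N bits each.
// Assumed layout: plane k holds bit k of every element, and element i
// lands at bit (i % 8) of byte (i / 8) within that plane.
std::vector<std::uint8_t> bitshuffle_reference(const std::vector<std::uint16_t>& in) {
    const std::size_t n = in.size();
    assert(n % 8 == 0);                        // each plane must fill whole bytes
    const std::size_t w = 16;                  // element width in bits
    const std::size_t plane_bytes = n / 8;     // N bits per plane
    std::vector<std::uint8_t> out(w * plane_bytes, 0);

    for (std::size_t k = 0; k < w; ++k) {      // one pass per bit-plane
        for (std::size_t i = 0; i < n; ++i) {
            const unsigned bit = (in[i] >> k) & 1u;
            out[k * plane_bytes + i / 8] |= static_cast<std::uint8_t>(bit << (i % 8));
        }
    }
    return out;                                // w * n / 8 bytes, same as the input
}
```

For eight elements all equal to 0x0001, plane 0 becomes the single byte 0xFF and the other fifteen planes are zero, so the output occupies the same 16 bytes as the input but the set bits are now contiguous.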
Constraints:
block_size must be a positive multiple of 1024 × element_width. The default of 16384 satisfies this for all supported element widths.
element_width must be 1, 2, 4, or 8. Both are enforced at execute() time.
BitshuffleStage requires its input to be a multiple of block_size bytes. The pipeline pads automatically when connected to a chunked upstream stage (DifferenceStage with matching chunk_size, or RZEStage).
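The constraints above can be written as small predicates. This is a sketch under stated assumptions: valid_element_width, valid_block_size, and padded_size are hypothetical helper names, and the checks that execute() performs, as well as the way the pipeline pads, may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers mirroring the documented constraints.
// They are not part of the FZGPUModules API.

// element_width must be 1, 2, 4, or 8 (bytes).
bool valid_element_width(std::size_t element_width) {
    return element_width == 1 || element_width == 2 ||
           element_width == 4 || element_width == 8;
}

// block_size must be a positive multiple of 1024 * element_width.
bool valid_block_size(std::size_t block_size, std::size_t element_width) {
    return valid_element_width(element_width) &&
           block_size > 0 &&
           block_size % (1024 * element_width) == 0;
}

// Smallest multiple of block_size that is >= n_bytes, i.e. the size an
// input would have after being padded to a whole number of blocks.
std::size_t padded_size(std::size_t n_bytes, std::size_t block_size) {
    assert(block_size > 0);
    return (n_bytes + block_size - 1) / block_size * block_size;
}
```

With the default block_size of 16384, valid_block_size holds for every supported width (16384 = 2 × 1024 × 8), while a block_size of 4096 would be rejected for element_width 8. A 20000-byte input would need padding to 32768 bytes.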
Set element_width to match the element type flowing in from upstream:
LorenzoQuantStage<float, uint16_t> → setElementWidth(2)
QuantizerStage<float, uint32_t> → setElementWidth(4)
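In both mappings the width equals the byte size of the upstream stage's second template argument (uint16_t is 2 bytes, uint32_t is 4 bytes), not that of the float input. A minimal sketch of that rule follows, assuming the second argument is the emitted code type; element_width_of is a hypothetical helper, not a library function.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: derives the value to pass to setElementWidth()
// from the code type emitted by the upstream stage.
template <typename Code>
constexpr std::size_t element_width_of() {
    static_assert(sizeof(Code) == 1 || sizeof(Code) == 2 ||
                  sizeof(Code) == 4 || sizeof(Code) == 8,
                  "element_width must be 1, 2, 4, or 8");
    return sizeof(Code);
}

// Matches the two documented cases.
static_assert(element_width_of<std::uint16_t>() == 2, "uint16_t codes -> width 2");
static_assert(element_width_of<std::uint32_t>() == 4, "uint32_t codes -> width 4");
```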