FZGPUModules 2.0
GPU-accelerated modular compression pipelines
#include <bitpack_stage.h>
fz::BitpackStage< T > derives from fz::Stage.
Public Member Functions
void setInverse(bool inv) override
void setNBits(uint8_t nbits)
void setAutoDetect(bool enable)
void execute(cudaStream_t stream, MemoryPool *pool, const std::vector<void *> &inputs, const std::vector<void *> &outputs, const std::vector<size_t> &sizes) override
std::string getName() const override
std::vector<size_t> estimateOutputSizes(const std::vector<size_t> &input_sizes) const override
std::unordered_map<std::string, size_t> getActualOutputSizesByName() const override
size_t getActualOutputSize(int index) const override
uint16_t getStageTypeId() const override
uint8_t getOutputDataType(size_t) const override
uint8_t getInputDataType(size_t) const override
size_t serializeHeader(size_t, uint8_t *buf, size_t max_size) const override
void deserializeHeader(const uint8_t *buf, size_t size) override
size_t getMaxHeaderSize(size_t) const override
void saveState() override
bool isGraphCompatible() const override
Public Member Functions inherited from fz::Stage
virtual size_t getRequiredInputAlignment() const
virtual std::vector<std::string> getOutputNames() const
int getOutputIndex(const std::string &name) const
virtual void setDims(const std::array<size_t, 3> &dims)
virtual void postStreamSync(cudaStream_t stream)
virtual size_t estimateScratchBytes(const std::vector<size_t> &input_sizes) const
Detailed Description
Bit-packing stage.
Forward: T[] → uint8_t[]. Packs each element using only its low nbits bits.
Inverse: uint8_t[] → T[]. Unpacks the elements, zero-extending each one to the full width of T.
Template Parameters
T  Input element type: uint8_t, uint16_t, or uint32_t.
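To make the forward/inverse contract concrete, here is a minimal host-side reference in C++. It is not the library's GPU kernel. The names packBits and unpackBits are hypothetical, and the bit order (element i occupies bits i*nbits through (i+1)*nbits - 1 of a little-endian bit stream, low bit first) is an assumption; the real kernel may lay bits out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, not the library's GPU kernel.
// Assumed layout: element i occupies bits [i*nbits, (i+1)*nbits) of a
// little-endian bit stream (bit 0 of each value goes first).
template <typename T>
std::vector<uint8_t> packBits(const std::vector<T>& in, unsigned nbits) {
    assert(nbits >= 1 && nbits <= 8 * sizeof(T));
    const uint64_t mask = (1ull << nbits) - 1;  // T is at most 32 bits wide, so no overflow
    std::vector<uint8_t> out((in.size() * nbits + 7) / 8, 0);
    for (size_t i = 0; i < in.size(); ++i) {
        const uint64_t v = static_cast<uint64_t>(in[i]) & mask;  // keep only the low nbits bits
        for (unsigned b = 0; b < nbits; ++b) {
            const size_t bit = i * nbits + b;
            if ((v >> b) & 1u) out[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
        }
    }
    return out;
}

template <typename T>
std::vector<T> unpackBits(const std::vector<uint8_t>& in, size_t count, unsigned nbits) {
    assert(nbits >= 1 && nbits <= 8 * sizeof(T));
    assert(in.size() * 8 >= count * nbits);  // enough packed bits for `count` elements
    std::vector<T> out(count, T(0));
    for (size_t i = 0; i < count; ++i) {
        uint64_t v = 0;
        for (unsigned b = 0; b < nbits; ++b) {
            const size_t bit = i * nbits + b;
            v |= static_cast<uint64_t>((in[bit / 8] >> (bit % 8)) & 1u) << b;
        }
        out[i] = static_cast<T>(v);  // upper bits stay 0: zero-extension
    }
    return out;
}
```

With nbits = 2, the four uint8_t values {3, 1, 2, 0} fit in a single byte (0x27 under this layout). A value wider than nbits, such as 5 at nbits = 2, loses its upper bits and unpacks as 1.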
Member Function Documentation
void setInverse(bool inv)  [inline, override, virtual]
Switch between forward (compression) and inverse (decompression) mode. Affects getNumInputs()/getNumOutputs() for stages with asymmetric port counts.
Reimplemented from fz::Stage.
void setNBits(uint8_t nbits)  [inline]
Set the number of bits per element.
nbits must be a power of two between 1 and 8*sizeof(T), inclusive. Allowed values:
uint8_t:  1, 2, 4, 8
uint16_t: 1, 2, 4, 8, 16
uint32_t: 1, 2, 4, 8, 16, 32
Ignored during forward execute when setAutoDetect(true) is active.
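The validity rule above, and the effect of nbits on the packed size, can be sketched as two small constexpr helpers. Both isValidNBits and packedBytes are hypothetical names introduced for illustration. The byte count assumes a dense bit stream rounded up to a whole byte, which may differ from the stage's actual padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper encoding the documented setNBits() rule:
// nbits must be a power of two in [1, 8*sizeof(T)].
template <typename T>
constexpr bool isValidNBits(unsigned nbits) {
    return nbits >= 1 && nbits <= 8 * sizeof(T) && (nbits & (nbits - 1)) == 0;
}

// Bytes needed to hold `count` elements at `nbits` bits each, assuming a
// dense bit stream rounded up to the next whole byte.
constexpr size_t packedBytes(size_t count, unsigned nbits) {
    return (count * nbits + 7) / 8;
}

// Compile-time spot checks against the table of allowed values.
static_assert(isValidNBits<uint8_t>(8) && !isValidNBits<uint8_t>(16), "uint8_t: at most 8 bits");
static_assert(isValidNBits<uint32_t>(32) && !isValidNBits<uint32_t>(3), "3 is not a power of two");
```

For example, 10 uint8_t elements at nbits = 4 would occupy 5 bytes instead of 10.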
void setAutoDetect(bool enable)  [inline]
Enable automatic bit-width detection.
When enabled, the forward execute() scans the input for its maximum value and selects the smallest allowed power-of-two nbits that can represent it (maximum <= 2^nbits - 1). The chosen nbits is stored in the serialized header so the inverse pass can unpack correctly.
After compress(), getNBits() reflects the detected value.
Incompatible with CUDA Graph capture: isGraphCompatible() returns false while auto-detect is enabled.
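A host-side sketch of the auto-detect rule follows. detectNBits is a hypothetical helper. On the GPU the maximum would presumably come from a device-side reduction, and because the chosen width depends on device data, this fits isGraphCompatible() returning false. How the real stage treats an all-zero input is not documented here; this sketch assumes it falls back to nbits = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side version of the auto-detect rule: take the maximum
// value, then pick the smallest allowed power-of-two nbits with
// max <= 2^nbits - 1. Assumption: empty or all-zero input gives nbits = 1.
template <typename T>
unsigned detectNBits(const std::vector<T>& data) {
    const T max_val = data.empty() ? T(0) : *std::max_element(data.begin(), data.end());
    unsigned nbits = 1;
    // Double the width until max_val fits; 8*sizeof(T) bits always fit.
    while (nbits < 8 * sizeof(T) && static_cast<uint64_t>(max_val) >= (1ull << nbits)) {
        nbits *= 2;
    }
    return nbits;
}
```

For uint8_t input with a maximum of 200 this yields 8 (no size reduction), while a maximum of 3 yields 2.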
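void execute(cudaStream_t stream, MemoryPool *pool, const std::vector<void *> &inputs, const std::vector<void *> &outputs, const std::vector<size_t> &sizes)  [override, virtual]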
Execute the stage. inputs and outputs hold device pointers; sizes holds buffer sizes in bytes.
Implements fz::Stage.
std::string getName() const  [inline, override, virtual]
Human-readable name used in error messages and debug output.
Implements fz::Stage.
std::vector<size_t> estimateOutputSizes(const std::vector<size_t> &input_sizes) const  [inline, override, virtual]
Estimate output buffer sizes from the given input sizes. Used for buffer allocation planning in PREALLOCATE mode, so the result must be a safe upper bound: under-estimation causes buffer overruns.
Implements fz::Stage.
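For the forward direction, a safe bound follows from the element count and the bit width. The function below is a hypothetical sketch. It assumes a single T[] input and a single packed output with no padding beyond the final partial byte. When auto-detect is enabled, the width is only known after execute(), so the sketch conservatively assumes the full width 8*sizeof(T).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical forward-pass bound. Assumptions: one input (T[]) and one
// packed output; with auto-detect on, nbits is unknown until execute()
// runs, so the bound assumes the widest case, nbits = 8*sizeof(T).
std::vector<size_t> estimateForwardSizes(const std::vector<size_t>& input_sizes,
                                         size_t elem_size, unsigned nbits,
                                         bool auto_detect) {
    assert(input_sizes.size() == 1 && elem_size > 0);
    const size_t count = input_sizes[0] / elem_size;  // number of T elements
    const unsigned bits = auto_detect ? static_cast<unsigned>(8 * elem_size) : nbits;
    return {(count * bits + 7) / 8};                   // round up to whole bytes
}
```

For 1000 uint32_t elements (4000 input bytes) this gives 1000 bytes at nbits = 8, but 4000 bytes under auto-detect.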
std::unordered_map<std::string, size_t> getActualOutputSizesByName() const  [inline, override, virtual]
size_t getActualOutputSize(int index) const  [inline, override, virtual]
Actual size of a single output by index after execute(). Avoids constructing the map for the common single-output case. Default delegates to getActualOutputSizesByName(); override to return directly from an internal field.
Reimplemented from fz::Stage.
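The delegation pattern described above can be illustrated with simplified stand-ins. StageSketch and SingleOutputSketch are hypothetical types, not fz::Stage or fz::BitpackStage, and the output name "output" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for a base stage and a single-output
// stage. The base default looks the size up by name; the override returns
// it straight from a member field.
struct StageSketch {
    virtual ~StageSketch() = default;
    virtual std::vector<std::string> getOutputNames() const { return {"output"}; }
    virtual std::unordered_map<std::string, size_t> getActualOutputSizesByName() const = 0;
    // Default: build the map, then look up the name of output `index`.
    virtual size_t getActualOutputSize(int index) const {
        return getActualOutputSizesByName().at(getOutputNames().at(static_cast<size_t>(index)));
    }
};

struct SingleOutputSketch : StageSketch {
    size_t actual_output_size_ = 0;  // a real stage would set this in execute()
    std::unordered_map<std::string, size_t> getActualOutputSizesByName() const override {
        return {{"output", actual_output_size_}};
    }
    // Override: skip building the map for the single-output case.
    size_t getActualOutputSize(int index) const override {
        assert(index == 0);
        return actual_output_size_;
    }
};
```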
uint16_t getStageTypeId() const  [inline, override, virtual]
uint8_t getOutputDataType(size_t) const  [inline, override, virtual]
DataType enum of the given output port.
Implements fz::Stage.
uint8_t getInputDataType(size_t) const  [inline, override, virtual]
Expected DataType of the given input port.
Used by Pipeline::finalize() to detect type mismatches between connected stages before any execution. Return DataType::UNKNOWN to opt out of checking; byte-transparent stages (Bitshuffle, RZE) and mock stages must return UNKNOWN. finalize() skips any connection where either side is UNKNOWN.
Reimplemented from fz::Stage.
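A minimal sketch of the per-connection check that finalize() is described as performing. The DataType enumerators and their numbering below are assumptions for illustration, as is the helper name connectionTypesOk; the real enum is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative enum only; the real DataType values and numbering may differ.
enum class DataType : uint8_t { UNKNOWN = 0, UINT8, UINT16, UINT32, FLOAT32 };

// Hypothetical helper: true if a producer's output type is acceptable for a
// consumer's input. UNKNOWN on either side opts the connection out of checking.
bool connectionTypesOk(DataType producer_out, DataType consumer_in) {
    if (producer_out == DataType::UNKNOWN || consumer_in == DataType::UNKNOWN) {
        return true;  // skipped, e.g. byte-transparent stages such as Bitshuffle or RZE
    }
    return producer_out == consumer_in;
}
```

Under this rule, any pair of concrete but different types is rejected before execution, while a single UNKNOWN on either end disables the check for that link.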
size_t serializeHeader(size_t, uint8_t *buf, size_t max_size) const  [inline, override, virtual]
Serialize the stage config into buf (max 128 bytes) for the FZM file. Returns the number of bytes written, or 0 if the stage has no config.
Reimplemented from fz::Stage.
void deserializeHeader(const uint8_t *buf, size_t size)  [inline, override, virtual]
Restore the stage config from buf during decompression.
Reimplemented from fz::Stage.
size_t getMaxHeaderSize(size_t) const  [inline, override, virtual]
Maximum bytes this stage writes into its per-output FZM header slot.
Reimplemented from fz::Stage.
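The three header hooks above fit together as a round trip. The two-byte layout below (nbits, then a flags byte) is purely hypothetical, as is the BitpackConfigSketch type; the actual FZM layout for this stage is not documented here. The sketch throws rather than writing past max_size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stage config with an assumed 2-byte header:
// byte 0 = nbits, byte 1 = flags (bit 0 = auto-detect was enabled).
// The real FZM layout for fz::BitpackStage may differ.
struct BitpackConfigSketch {
    uint8_t nbits = 8;
    bool auto_detect = false;

    // What a getMaxHeaderSize() override would report for this layout.
    static constexpr size_t kHeaderBytes = 2;

    // Writes the config into buf; returns the number of bytes written.
    size_t serializeHeader(uint8_t* buf, size_t max_size) const {
        if (max_size < kHeaderBytes) {
            throw std::length_error("header buffer too small");
        }
        buf[0] = nbits;  // after auto-detect, this is the detected width
        buf[1] = static_cast<uint8_t>(auto_detect ? 1 : 0);
        return kHeaderBytes;
    }

    // Restores the config written by serializeHeader().
    void deserializeHeader(const uint8_t* buf, size_t size) {
        if (size < kHeaderBytes) {
            throw std::length_error("header too short");
        }
        nbits = buf[0];
        auto_detect = (buf[1] & 1u) != 0;
    }
};
```

Because auto-detect stores the chosen nbits in the header, the inverse pass only needs deserializeHeader() to recover the width.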
void saveState()  [inline, override, virtual]
Save/restore config state around a decompression pass. Because deserializeHeader() overwrites the stage's forward-pass config, saveState() is called before the pass and restoreState() after it, returning the stage to its original configuration.
Reimplemented from fz::Stage.
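The save/restore protocol can be sketched with a scope guard. StageStateSketch, loadFromHeader and StateGuard are hypothetical. The guard is a design choice of this sketch, so that restoreState() still runs if the decompression pass throws; it is not necessarily how the library sequences these calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stage with just enough state to show the protocol:
// loading a header clobbers the forward-pass config, so it is saved
// beforehand and restored afterwards.
struct StageStateSketch {
    uint8_t nbits = 8;
    bool auto_detect = true;

    void saveState() { saved_nbits_ = nbits; saved_auto_ = auto_detect; }
    void restoreState() { nbits = saved_nbits_; auto_detect = saved_auto_; }
    // Stand-in for deserializeHeader(): adopt the config stored in the file.
    void loadFromHeader(uint8_t file_nbits) { nbits = file_nbits; auto_detect = false; }

private:
    uint8_t saved_nbits_ = 8;
    bool saved_auto_ = true;
};

// RAII guard: saveState() on entry, restoreState() on scope exit,
// including exit by exception.
class StateGuard {
public:
    explicit StateGuard(StageStateSketch& s) : s_(s) { s_.saveState(); }
    ~StateGuard() { s_.restoreState(); }
    StateGuard(const StateGuard&) = delete;
    StateGuard& operator=(const StateGuard&) = delete;

private:
    StageStateSketch& s_;
};
```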
bool isGraphCompatible() const  [inline, override, virtual]
Whether this stage is safe inside a CUDA Graph capture.
A stage is graph-compatible if execute() enqueues only device-side work (kernel launches, cudaMemcpyAsync D2D/H2D) and makes no host-synchronous calls. Override and return false if execute() contains D2H copies or makes dynamic decisions based on device data; the DAG then throws at setCaptureMode(true) time rather than producing a broken graph.
Default: true. Inverse-mode stages that do D2H reads (e.g. RZE inverse) must return false.
Reimplemented from fz::Stage.
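Here is a minimal sketch of the fail-fast check described above, combined with the rule from setAutoDetect() that this stage is not graph-compatible while auto-detect is enabled. GraphStageSketch, BitpackSketch and DagSketch are hypothetical stand-ins; the real DAG's setCaptureMode() may check more than this.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, minimal stand-ins: enabling capture mode fails fast if any
// stage reports that it is not graph-compatible.
struct GraphStageSketch {
    virtual ~GraphStageSketch() = default;
    virtual std::string getName() const = 0;
    virtual bool isGraphCompatible() const { return true; }  // base default
};

struct BitpackSketch : GraphStageSketch {
    bool auto_detect = false;
    std::string getName() const override { return "Bitpack"; }
    // Auto-detect picks nbits from device data (the input maximum),
    // so it cannot be captured into a static graph.
    bool isGraphCompatible() const override { return !auto_detect; }
};

struct DagSketch {
    std::vector<std::shared_ptr<GraphStageSketch>> stages;
    bool capture = false;

    void setCaptureMode(bool enable) {
        if (enable) {
            for (const auto& s : stages) {
                if (!s->isGraphCompatible()) {
                    throw std::runtime_error(s->getName() + " is not CUDA Graph compatible");
                }
            }
        }
        capture = enable;
    }
};
```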