FZGPUModules 1.0
GPU-accelerated modular compression pipeline
#include <stage.h>
[Figure: inheritance diagram for fz::Stage]
Public Member Functions
| virtual void | execute (cudaStream_t stream, MemoryPool *pool, const std::vector< void * > &inputs, const std::vector< void * > &outputs, const std::vector< size_t > &sizes)=0 |
| virtual std::string | getName () const =0 |
| virtual size_t | getRequiredInputAlignment () const |
| virtual std::vector< std::string > | getOutputNames () const |
| int | getOutputIndex (const std::string &name) const |
| virtual std::vector< size_t > | estimateOutputSizes (const std::vector< size_t > &input_sizes) const =0 |
| virtual std::unordered_map< std::string, size_t > | getActualOutputSizesByName () const =0 |
| virtual size_t | getActualOutputSize (int index) const |
| virtual void | setInverse (bool inverse) |
| virtual uint16_t | getStageTypeId () const =0 |
| virtual uint8_t | getOutputDataType (size_t output_index) const =0 |
| virtual uint8_t | getInputDataType (size_t) const |
| virtual size_t | serializeHeader (size_t output_index, uint8_t *header_buffer, size_t max_size) const |
| virtual void | deserializeHeader (const uint8_t *header_buffer, size_t size) |
| virtual void | saveState () |
| virtual void | setDims (const std::array< size_t, 3 > &dims) |
| virtual void | postStreamSync (cudaStream_t stream) |
| virtual size_t | getMaxHeaderSize (size_t output_index) const |
| virtual bool | isGraphCompatible () const |
| virtual size_t | estimateScratchBytes (const std::vector< size_t > &input_sizes) const |
Base class for all compression/decompression stages.
A stage is a single transformation in the pipeline (e.g. Lorenzo predictor, RLE encoder, bitshuffle). The pipeline interacts with stages exclusively through this interface — no downcasting or type-name branching anywhere in the pipeline or DAG code.
execute() [pure virtual]
Execute the stage. The inputs and outputs vectors hold device pointers, and sizes holds buffer sizes in bytes.
Implemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
getName() [pure virtual]
Human-readable name used in error messages and debug output.
Implemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
getRequiredInputAlignment() [inline, virtual]
Minimum input size alignment in bytes. Chunked stages return their chunk size; the pipeline uses the LCM of all stage alignments at finalize() to transparently zero-pad the input. Default: 1 (no alignment requirement).
Reimplemented in fz::DifferenceStage< T, TOut >, fz::BitshuffleStage, and fz::RZEStage.
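The alignment rule described above can be illustrated with a small standalone sketch. The helper names `combinedAlignment` and `paddedSize` are hypothetical and are not part of FZGPUModules; the sketch assumes every stage alignment is at least 1 and needs C++17 for `std::lcm`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: fold per-stage alignments into one pipeline-wide
// alignment by taking their least common multiple.
// Assumes every entry is >= 1 (1 means "no alignment requirement").
inline std::size_t combinedAlignment(const std::vector<std::size_t>& alignments) {
    std::size_t result = 1;
    for (std::size_t a : alignments) {
        result = std::lcm(result, a);
    }
    return result;
}

// Hypothetical helper: round an input size up to the next multiple of
// `alignment`. The difference to the original size is what would be
// zero-padded.
inline std::size_t paddedSize(std::size_t input_bytes, std::size_t alignment) {
    return (input_bytes + alignment - 1) / alignment * alignment;
}
```

For example, stages with alignments 1, 4096, and 16384 combine to 16384, so a 1000-byte input would be padded to 16384 bytes under these assumptions.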
getOutputNames() [inline, virtual]
Output port names in order. Default: single port named "output". Multi-output stages (e.g. Lorenzo: "codes", "outliers") override this.
Reimplemented in fz::LorenzoStage< TInput, TCode >, and fz::QuantizerStage< TInput, TCode >.
getOutputIndex() [inline]
Returns the index of a named output port, or -1 if not found.
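The lookup contract (position of the name in the port list, or -1) can be sketched as a free function. `findOutputIndex` is a hypothetical name used only for illustration; the actual member presumably searches the result of getOutputNames(), which is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function version of the lookup: returns the position of
// `name` in `names`, or -1 if it does not occur.
inline int findOutputIndex(const std::vector<std::string>& names,
                           const std::string& name) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == name) {
            return static_cast<int>(i);
        }
    }
    return -1;  // not found
}
```

With the Lorenzo port names quoted above, looking up "outliers" in {"codes", "outliers"} would give 1, and an unknown name gives -1.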
estimateOutputSizes() [pure virtual]
Estimate output buffer sizes given input sizes. Used for buffer allocation planning in PREALLOCATE mode — must be a safe upper bound; under-estimation causes buffer overruns.
Implemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
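To show what a safe upper bound means in practice, here is a minimal sketch for a hypothetical run-length encoding that stores one (value, count) pair per run. This layout is an assumption chosen for illustration and is not the format used by fz::RLEStage; the function name and its parameters are likewise hypothetical. The worst case is an input with no repeats, where every element becomes its own run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worst-case estimate for a (value, count) run-length layout.
// Assumes value_bytes >= 1. Each input is treated independently.
inline std::vector<std::size_t> estimateRleLikeOutputSizes(
    const std::vector<std::size_t>& input_sizes,
    std::size_t value_bytes,
    std::size_t count_bytes) {
    std::vector<std::size_t> estimates;
    estimates.reserve(input_sizes.size());
    for (std::size_t in_bytes : input_sizes) {
        // Number of elements, rounded up so a partial trailing element counts.
        const std::size_t elements = (in_bytes + value_bytes - 1) / value_bytes;
        // Worst case: every element is a run of length 1.
        estimates.push_back(elements * (value_bytes + count_bytes));
    }
    return estimates;
}
```

The estimate is deliberately pessimistic: it can exceed the input size, which is acceptable for preallocation, whereas an estimate below the true worst case would not be.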
getActualOutputSizesByName() [pure virtual]
Actual output sizes after execute(), keyed by output port name.
Implemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
getActualOutputSize() [inline, virtual]
Actual size of a single output by index after execute(). Avoids constructing the map for the common single-output case. Default delegates to getActualOutputSizesByName(); override to return directly from an internal field.
Reimplemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
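The delegation described above might look roughly like the following. `SketchStage` and its subclasses are hypothetical stand-ins, not the real fz::Stage; in particular, returning 0 for an out-of-range index is an assumption of this sketch, since the documented behaviour for that case is not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the relevant part of the interface.
struct SketchStage {
    virtual ~SketchStage() = default;

    virtual std::vector<std::string> getOutputNames() const { return {"output"}; }

    virtual std::unordered_map<std::string, std::size_t>
    getActualOutputSizesByName() const = 0;

    // Assumed default: translate the index to a port name, then look the
    // name up in the map. Unknown indices yield 0 (sketch assumption).
    virtual std::size_t getActualOutputSize(int index) const {
        const std::vector<std::string> names = getOutputNames();
        if (index < 0 || static_cast<std::size_t>(index) >= names.size()) {
            return 0;
        }
        const auto sizes = getActualOutputSizesByName();
        const auto it = sizes.find(names[static_cast<std::size_t>(index)]);
        return it == sizes.end() ? 0 : it->second;
    }
};

// Relies on the default: sizes are reported only through the map.
struct TwoPortSketch : SketchStage {
    std::vector<std::string> getOutputNames() const override {
        return {"codes", "outliers"};
    }
    std::unordered_map<std::string, std::size_t>
    getActualOutputSizesByName() const override {
        return {{"codes", 100}, {"outliers", 8}};
    }
};

// Overrides the accessor to read an internal field and skip the map.
struct SinglePortSketch : SketchStage {
    std::size_t actual_bytes = 42;
    std::unordered_map<std::string, std::size_t>
    getActualOutputSizesByName() const override {
        return {{"output", actual_bytes}};
    }
    std::size_t getActualOutputSize(int index) const override {
        return index == 0 ? actual_bytes : 0;
    }
};
```

The override in `SinglePortSketch` avoids building a map and a name vector on every call, which is the motivation given above for the single-output case.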
setInverse() [inline, virtual]
Switch between forward (compression) and inverse (decompression) mode. Affects getNumInputs()/getNumOutputs() for stages with asymmetric port counts.
Reimplemented in fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, fz::ZigzagStage< TIn, TOut >, fz::LorenzoStage< TInput, TCode >, fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, and fz::QuantizerStage< TInput, TCode >.
getStageTypeId() [pure virtual]
Stage type identifier written into the FZM file header.
Implemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
getOutputDataType() [pure virtual]
DataType enum of the given output port.
Implemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::ZigzagStage< TIn, TOut >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, and fz::RZEStage.
getInputDataType() [inline, virtual]
Expected DataType of the given input port.
Used by Pipeline::finalize() to detect type mismatches between connected stages before any execution. Return DataType::UNKNOWN to opt out of checking — byte-transparent stages (Bitshuffle, RZE) and mock stages must return UNKNOWN; finalize() skips any connection where either side is UNKNOWN.
Reimplemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::NegabinaryStage< TIn, TOut >, and fz::ZigzagStage< TIn, TOut >.
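The connection check described above reduces to a small predicate. The enum below is a hypothetical stand-in: only the UNKNOWN enumerator is taken from the text, while FLOAT32 and UINT16 and all numeric values are invented for illustration. The function name is also hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the library's DataType enum. Only UNKNOWN is
// named in the documentation; the other enumerators are placeholders.
enum class DataType : std::uint8_t { UNKNOWN = 0, FLOAT32 = 1, UINT16 = 2 };

// Returns true if a connection should pass the type check: either side
// opting out with UNKNOWN skips the check, otherwise the types must match.
inline bool connectionTypesCompatible(DataType producer_output,
                                      DataType consumer_input) {
    if (producer_output == DataType::UNKNOWN ||
        consumer_input == DataType::UNKNOWN) {
        return true;  // check skipped
    }
    return producer_output == consumer_input;
}
```

Under this sketch, a byte-transparent consumer that reports UNKNOWN accepts any producer, while a UINT16 producer feeding a FLOAT32 consumer is rejected.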
serializeHeader() [inline, virtual]
Serialize stage config into header_buffer (max 128 bytes) for the FZM file. Return the number of bytes written, or 0 if the stage has no config.
Reimplemented in fz::DifferenceStage< T, TOut >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::RZEStage, fz::ZigzagStage< TIn, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, and fz::NegabinaryStage< TIn, TOut >.
deserializeHeader() [inline, virtual]
Restore stage config from header_buffer during decompression.
Reimplemented in fz::NegabinaryStage< TIn, TOut >, fz::DifferenceStage< T, TOut >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::RZEStage, fz::ZigzagStage< TIn, TOut >, fz::RLEStage< T >, and fz::LorenzoStage< TInput, TCode >.
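A round trip through serializeHeader() and deserializeHeader() could be sketched as follows. `SketchConfig` and both function names are hypothetical, and the raw `memcpy` layout is an assumption rather than the FZM on-disk format. Returning 0 when the buffer is too small is also a choice made for this sketch; the documentation only specifies 0 for stages without config.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stage configuration; field names are placeholders.
struct SketchConfig {
    float error_bound;
    std::uint32_t chunk_size;
};

// Copies the config into header_buffer and returns the bytes written.
// Returns 0 if the buffer is missing or too small (sketch assumption).
// A raw memcpy keeps the host's byte order, so this is not portable
// across machines with different endianness.
inline std::size_t serializeConfig(const SketchConfig& cfg,
                                   std::uint8_t* header_buffer,
                                   std::size_t max_size) {
    if (header_buffer == nullptr || max_size < sizeof(SketchConfig)) {
        return 0;
    }
    std::memcpy(header_buffer, &cfg, sizeof(SketchConfig));
    return sizeof(SketchConfig);
}

// Restores the config from header_buffer; returns false on a short buffer
// and leaves cfg untouched in that case.
inline bool deserializeConfig(const std::uint8_t* header_buffer,
                              std::size_t size,
                              SketchConfig& cfg) {
    if (header_buffer == nullptr || size < sizeof(SketchConfig)) {
        return false;
    }
    std::memcpy(&cfg, header_buffer, sizeof(SketchConfig));
    return true;
}
```

Checking `max_size` before writing matters because the header slot is bounded; a stage whose config can grow would need to report its worst case through getMaxHeaderSize().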
saveState() [inline, virtual]
Save/restore config state around a decompression pass. deserializeHeader() overwrites the stage's forward-pass config; saveState() is called before and restoreState() after so the stage returns to its original configuration.
Reimplemented in fz::DifferenceStage< T, TOut >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, and fz::RZEStage.
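The save, overwrite, restore sequence described above can be sketched with a single config field. `StatefulSketch` and its `chunk_size_` member are hypothetical; a real stage would snapshot whatever its deserializeHeader() overwrites.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stage with one piece of forward-pass configuration.
class StatefulSketch {
public:
    explicit StatefulSketch(std::uint32_t chunk_size) : chunk_size_(chunk_size) {}

    std::uint32_t chunkSize() const { return chunk_size_; }

    // Snapshot the forward-pass config before decompression.
    void saveState() { saved_chunk_size_ = chunk_size_; }

    // Overwrites the live config with the value stored in the header.
    void deserializeHeader(const std::uint8_t* header_buffer, std::size_t size) {
        if (header_buffer != nullptr && size >= sizeof(chunk_size_)) {
            std::memcpy(&chunk_size_, header_buffer, sizeof(chunk_size_));
        }
    }

    // Put the snapshot back after decompression; a no-op without a snapshot.
    void restoreState() {
        if (saved_chunk_size_) {
            chunk_size_ = *saved_chunk_size_;
            saved_chunk_size_.reset();
        }
    }

private:
    std::uint32_t chunk_size_;
    std::optional<std::uint32_t> saved_chunk_size_;
};
```

A caller would invoke saveState(), then deserializeHeader() and the inverse pass, then restoreState(), after which the stage is again configured for compression.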
setDims() [inline, virtual]
Called once by Pipeline::finalize() so stages can react to the dataset dimensions set via Pipeline::setDims() after construction.
Parameters
| dims | {x, y, z} extents (z==1 → 2-D; y==z==1 → 1-D) |
Reimplemented in fz::LorenzoStage< TInput, TCode >.
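The extent convention for dims can be turned into a small rank helper. `dimensionality` is a hypothetical name, and treating every case with z != 1 as 3-D is an assumption, since the documentation only spells out the 1-D and 2-D cases.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: derive the dataset rank from {x, y, z} extents.
// y == z == 1 means 1-D, z == 1 means 2-D, anything else is treated as 3-D.
inline int dimensionality(const std::array<std::size_t, 3>& dims) {
    if (dims[1] == 1 && dims[2] == 1) {
        return 1;
    }
    if (dims[2] == 1) {
        return 2;
    }
    return 3;
}
```

A dimension-aware stage could use such a helper inside its setDims() override to choose between 1-D, 2-D, and 3-D code paths.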
postStreamSync() [inline, virtual]
Called after dag->execute() and stream sync, before compress() returns. Use for D2H transfers that must not block mid-pipeline (e.g. Lorenzo's outlier count readback). The stream is already idle so a plain cudaMemcpy is safe here.
Reimplemented in fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, and fz::RZEStage.
getMaxHeaderSize() [inline, virtual]
Maximum bytes this stage writes into its per-output FZM header slot.
Reimplemented in fz::DifferenceStage< T, TOut >, fz::RLEStage< T >, fz::LorenzoStage< TInput, TCode >, fz::QuantizerStage< TInput, TCode >, fz::BitshuffleStage, fz::NegabinaryStage< TIn, TOut >, fz::RZEStage, and fz::ZigzagStage< TIn, TOut >.
isGraphCompatible() [inline, virtual]
Whether this stage is safe inside a CUDA Graph capture.
A stage is graph-compatible if execute() enqueues only device-side work (kernel launches, cudaMemcpyAsync D2D/H2D) and makes no host-synchronous calls. Override and return false if execute() contains D2H copies or dynamic decisions based on device data — the DAG will throw at setCaptureMode(true) time rather than producing a broken graph.
Default: true. Inverse-mode stages that do D2H reads (e.g. RZE inverse) must return false.
Reimplemented in fz::RZEStage.
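The fail-fast behaviour described above, where enabling capture is rejected if any stage is not graph-compatible, can be sketched without CUDA. `CaptureSketchStage` and `requireGraphCompatible` are hypothetical names; how the real DAG stores stages, and which exception type it throws, is not stated in this reference and is assumed here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record standing in for a stage: its name and the value its
// isGraphCompatible() would return in the current mode.
struct CaptureSketchStage {
    std::string name;
    bool graph_compatible;
};

// Throws before any capture starts if a stage cannot be captured, so a
// broken graph is never built. std::runtime_error is a sketch assumption.
inline void requireGraphCompatible(const std::vector<CaptureSketchStage>& stages) {
    for (const CaptureSketchStage& stage : stages) {
        if (!stage.graph_compatible) {
            throw std::runtime_error("stage '" + stage.name +
                                     "' is not CUDA Graph compatible");
        }
    }
}
```

Validating up front keeps the error close to its cause: the caller learns which stage blocks capture at configuration time instead of debugging a graph that silently misbehaves on replay.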
estimateScratchBytes() [inline, virtual]
Peak persistent scratch bytes this stage holds in the MemoryPool.
Only count allocations that are drawn from the pool and kept alive across execute() calls. Transient scratch freed within execute() is already captured by the pool's high-water mark and must not be included. Used by CompressionDAG::computeTopoPoolSize() to size the release threshold.
Reimplemented in fz::RLEStage< T >, and fz::RZEStage.