FZGPUModules 1.0
GPU-accelerated modular compression pipeline
#include <rze_stage.h>
Inheritance diagram for fz::RZEStage (image not reproduced; the class derives from fz::Stage).
Public Member Functions
| void | setInverse (bool inv) override |
| bool | isGraphCompatible () const override |
| size_t | getRequiredInputAlignment () const override |
| void | execute (cudaStream_t stream, MemoryPool *pool, const std::vector< void * > &inputs, const std::vector< void * > &outputs, const std::vector< size_t > &sizes) override |
| void | postStreamSync (cudaStream_t stream) override |
| std::string | getName () const override |
| std::vector< size_t > | estimateOutputSizes (const std::vector< size_t > &input_sizes) const override |
| std::unordered_map< std::string, size_t > | getActualOutputSizesByName () const override |
| size_t | getActualOutputSize (int index) const override |
| size_t | estimateScratchBytes (const std::vector< size_t > &input_sizes) const override |
| uint16_t | getStageTypeId () const override |
| uint8_t | getOutputDataType (size_t) const override |
| size_t | serializeHeader (size_t output_index, uint8_t *buf, size_t max_size) const override |
| void | deserializeHeader (const uint8_t *buf, size_t size) override |
| size_t | getMaxHeaderSize (size_t) const override |
| void | saveState () override |
Public Member Functions inherited from fz::Stage
| virtual std::vector< std::string > | getOutputNames () const |
| int | getOutputIndex (const std::string &name) const |
| virtual uint8_t | getInputDataType (size_t) const |
| virtual void | setDims (const std::array< size_t, 3 > &dims) |
Recursive Zero-byte Elimination stage.
setChunkSize(bytes): chunk size in bytes (default 16384; must be a multiple of 4096).
setLevels(n): recursion depth 1–4 (default 4).
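The two constraints above can be expressed as simple checks. The sketch below is illustrative only: `RzeConfigSketch`, `isValidChunkSize`, and `isValidLevels` are hypothetical names, and both rejecting a chunk size of zero and throwing on invalid values are assumptions. This page does not say how fz::RZEStage reports a bad setting.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of FZGPUModules): the documented rule is
// "multiple of 4096"; rejecting zero is an extra assumption of this sketch.
inline bool isValidChunkSize(std::size_t bytes) {
    return bytes != 0 && bytes % 4096 == 0;
}

// Hypothetical helper: the documented recursion depth range is 1 to 4.
inline bool isValidLevels(int n) {
    return n >= 1 && n <= 4;
}

// Hypothetical stand-in for the stage's configuration, using the documented
// defaults. Throwing on invalid input is an assumption.
struct RzeConfigSketch {
    std::size_t chunk_size = 16384;
    int levels = 4;

    void setChunkSize(std::size_t bytes) {
        if (!isValidChunkSize(bytes))
            throw std::invalid_argument("chunk size must be a multiple of 4096");
        chunk_size = bytes;
    }

    void setLevels(int n) {
        if (!isValidLevels(n))
            throw std::invalid_argument("levels must be in the range 1 to 4");
        levels = n;
    }
};
```

With this sketch, `setChunkSize(8192)` is accepted, while `setChunkSize(5000)` and `setLevels(5)` are rejected.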
void fz::RZEStage::setInverse (bool inv) [inline, override, virtual]
Switch between forward (compression) and inverse (decompression) mode. Affects getNumInputs()/getNumOutputs() for stages with asymmetric port counts.
Reimplemented from fz::Stage.
bool fz::RZEStage::isGraphCompatible () const [inline, override, virtual]
CUDA Graph capture is supported for compression (forward pass) only.
The inverse path reads the stream header (orig_bytes, per-chunk sizes) with two blocking D2H cudaMemcpy calls before it can compute per-chunk decode offsets and launch the decode kernel. These calls prevent the inverse path from being recorded into a CUDA Graph.
This is by design rather than a limitation awaiting a fix: graph-compatible decompression would only help a "repeatedly decompress the same compressed buffer" workflow, which has no practical use case. The compression path (new data every iteration) is where graph capture provides real value.
Reimplemented from fz::Stage.
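To illustrate why the inverse path needs the per-chunk sizes on the host before it can launch the decode kernel, here is a minimal host-side sketch of turning sizes into decode offsets with an exclusive prefix sum. `chunkOffsets` is a hypothetical helper, and the 32-bit size type is an assumption; the actual header layout and offset computation are not specified on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given per-chunk compressed sizes (already copied to
// the host), return the byte offset at which each chunk's data begins.
// offsets[i] is the sum of sizes[0..i-1], i.e. an exclusive prefix sum.
inline std::vector<std::size_t> chunkOffsets(const std::vector<std::uint32_t>& sizes) {
    std::vector<std::size_t> offsets(sizes.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        offsets[i] = running;   // start of chunk i
        running += sizes[i];    // advance past chunk i
    }
    return offsets;
}
```

Because every offset depends on values that live in the compressed stream itself, the launch parameters cannot be fixed at graph-capture time, which is consistent with the restriction described above.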
size_t fz::RZEStage::getRequiredInputAlignment () const [inline, override, virtual]
Minimum input size alignment in bytes. Chunked stages return their chunk size; the pipeline uses the LCM of all stage alignments at finalize() to transparently zero-pad the input. Default: 1 (no alignment requirement).
Reimplemented from fz::Stage.
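The LCM-and-pad rule described above can be sketched as follows. `combinedAlignment` and `paddedSize` are hypothetical helper names used only for illustration; the pipeline's actual finalize() logic is not shown on this page. The sketch assumes every alignment is at least 1 and requires C++17 for `std::lcm`.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: least common multiple of all stage alignments.
// An empty list yields 1, matching the documented default of "no requirement".
inline std::size_t combinedAlignment(const std::vector<std::size_t>& alignments) {
    std::size_t result = 1;
    for (std::size_t a : alignments) {
        result = std::lcm(result, a);
    }
    return result;
}

// Hypothetical helper: round n up to the next multiple of alignment.
// The bytes between n and the returned size are the ones to be zero-padded.
inline std::size_t paddedSize(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) / alignment * alignment;
}
```

For example, stages with alignments 1, 16384, and 4096 combine to 16384, so a 20000-byte input would be padded to 32768 bytes.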
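void fz::RZEStage::execute (cudaStream_t stream, MemoryPool *pool, const std::vector< void * > &inputs, const std::vector< void * > &outputs, const std::vector< size_t > &sizes) [override, virtual]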
Execute the stage. inputs and outputs hold device pointers; sizes holds sizes in bytes.
Implements fz::Stage.
void fz::RZEStage::postStreamSync (cudaStream_t stream) [override, virtual]
std::string fz::RZEStage::getName () const [inline, override, virtual]
Human-readable name used in error messages and debug output.
Implements fz::Stage.
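std::vector< size_t > fz::RZEStage::estimateOutputSizes (const std::vector< size_t > &input_sizes) const [inline, override, virtual]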
Estimate output buffer sizes given input sizes. Used for buffer allocation planning in PREALLOCATE mode. The estimate must be a safe upper bound: under-estimation causes buffer overruns.
Implements fz::Stage.
std::unordered_map< std::string, size_t > fz::RZEStage::getActualOutputSizesByName () const [override, virtual]
size_t fz::RZEStage::getActualOutputSize (int index) const [override, virtual]
Actual size of a single output by index after execute(). Avoids constructing the map for the common single-output case. Default delegates to getActualOutputSizesByName(); override to return directly from an internal field.
Reimplemented from fz::Stage.
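The delegate-by-default, override-for-speed pattern described above might look like the following sketch. `StageSketch` and `SingleOutputSketch` are simplified, hypothetical stand-ins for fz::Stage and a single-output stage; the port name "output", the name-lookup logic in the default, and returning 0 for an unknown index are all assumptions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for the base class (the real fz::Stage has more members).
struct StageSketch {
    virtual ~StageSketch() = default;

    virtual std::vector<std::string> getOutputNames() const { return {"output"}; }

    virtual std::unordered_map<std::string, std::size_t>
    getActualOutputSizesByName() const = 0;

    // Assumed default: build the full map, then look the port up by name.
    virtual std::size_t getActualOutputSize(int index) const {
        const auto names = getOutputNames();
        if (index < 0 || static_cast<std::size_t>(index) >= names.size()) return 0;
        const auto sizes = getActualOutputSizesByName();
        const auto it = sizes.find(names[static_cast<std::size_t>(index)]);
        return it == sizes.end() ? 0 : it->second;
    }
};

// Single-output stage that skips the map and reads an internal field directly.
struct SingleOutputSketch : StageSketch {
    std::size_t actual_output_size = 0;  // would be set at the end of execute()

    std::unordered_map<std::string, std::size_t>
    getActualOutputSizesByName() const override {
        return {{"output", actual_output_size}};
    }

    std::size_t getActualOutputSize(int index) const override {
        return index == 0 ? actual_output_size : 0;
    }
};
```

Both paths return the same value; the override simply avoids allocating a map and a string vector on each call.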
size_t fz::RZEStage::estimateScratchBytes (const std::vector< size_t > &input_sizes) const [inline, override, virtual]
Forward pass allocates four persistent pool arrays proportional to n_chunks = ceil(input_bytes / chunk_size_):
d_scratch_     : n_chunks * chunk_size_ (per-chunk worst-case output)
d_sizes_dev_   : n_chunks * 4 (raw compressed sizes)
d_clean_dev_   : n_chunks * 4 (flag-stripped sizes)
d_dst_off_dev_ : n_chunks * 4 (exclusive prefix-sum offsets)
Inverse path scratch is transient (allocated and freed within execute), so it is not reported here.
Reimplemented from fz::Stage.
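As a worked example of the forward-pass arithmetic above, the sketch below sums the four array sizes for a single input. `numChunks` and `forwardScratchBytes` are hypothetical helpers; the real estimateScratchBytes() takes a vector of input sizes and may add allocator padding or alignment that is not described here.

```cpp
#include <cstddef>

// Hypothetical helper: n_chunks = ceil(input_bytes / chunk_size).
// Assumes chunk_size is non-zero.
inline std::size_t numChunks(std::size_t input_bytes, std::size_t chunk_size) {
    return (input_bytes + chunk_size - 1) / chunk_size;
}

// Hypothetical helper: sum of the four documented forward-pass arrays.
inline std::size_t forwardScratchBytes(std::size_t input_bytes, std::size_t chunk_size) {
    const std::size_t n = numChunks(input_bytes, chunk_size);
    const std::size_t scratch  = n * chunk_size;  // d_scratch_
    const std::size_t sizes    = n * 4;           // d_sizes_dev_
    const std::size_t clean    = n * 4;           // d_clean_dev_
    const std::size_t offsets  = n * 4;           // d_dst_off_dev_
    return scratch + sizes + clean + offsets;
}
```

With the default chunk size of 16384, a 40000-byte input spans 3 chunks, giving 3 * 16384 + 3 * 12 = 49188 bytes under these assumptions.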
uint16_t fz::RZEStage::getStageTypeId () const [inline, override, virtual]
uint8_t fz::RZEStage::getOutputDataType (size_t) const [inline, override, virtual]
DataType enum of the given output port.
Implements fz::Stage.
size_t fz::RZEStage::serializeHeader (size_t output_index, uint8_t *buf, size_t max_size) const [inline, override, virtual]
Serialize the stage config into buf, the header buffer for the FZM file (at most 128 bytes). Returns the number of bytes written, or 0 if the stage has no config.
Reimplemented from fz::Stage.
void fz::RZEStage::deserializeHeader (const uint8_t *buf, size_t size) [inline, override, virtual]
Restore the stage config from buf, the header buffer, during decompression.
Reimplemented from fz::Stage.
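A header round trip of the kind serializeHeader() and deserializeHeader() perform could be sketched like this. The layout here (a 4-byte little-endian chunk size followed by a 1-byte level count) is purely hypothetical, as are the names `RzeHeaderSketch`, `serializeSketch`, and `deserializeSketch`; the actual RZE header format is not documented on this page.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical config with the documented defaults.
struct RzeHeaderSketch {
    std::uint32_t chunk_size = 16384;
    std::uint8_t levels = 4;
};

// Size of the hypothetical layout: 4 bytes chunk size + 1 byte levels.
constexpr std::size_t kSketchHeaderBytes = 5;

// Writes the config into buf and returns the number of bytes written.
// Returning 0 when the buffer is too small is an assumption of this sketch.
inline std::size_t serializeSketch(const RzeHeaderSketch& cfg,
                                   std::uint8_t* buf, std::size_t max_size) {
    if (buf == nullptr || max_size < kSketchHeaderBytes) return 0;
    for (int i = 0; i < 4; ++i) {
        // Little-endian: least significant byte first.
        buf[i] = static_cast<std::uint8_t>((cfg.chunk_size >> (8 * i)) & 0xFFu);
    }
    buf[4] = cfg.levels;
    return kSketchHeaderBytes;
}

// Reads the config back; returns false if the buffer is too short.
inline bool deserializeSketch(const std::uint8_t* buf, std::size_t size,
                              RzeHeaderSketch& cfg) {
    if (buf == nullptr || size < kSketchHeaderBytes) return false;
    std::uint32_t chunk_size = 0;
    for (int i = 0; i < 4; ++i) {
        chunk_size |= static_cast<std::uint32_t>(buf[i]) << (8 * i);
    }
    cfg.chunk_size = chunk_size;
    cfg.levels = buf[4];
    return true;
}
```

Writing the bytes explicitly rather than copying the struct keeps the encoding independent of host endianness and struct padding, which matters for a file format read on other machines.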
size_t fz::RZEStage::getMaxHeaderSize (size_t) const [inline, override, virtual]
Maximum bytes this stage writes into its per-output FZM header slot.
Reimplemented from fz::Stage.
void fz::RZEStage::saveState () [inline, override, virtual]
Saves the config state before a decompression pass. deserializeHeader() overwrites the stage's forward-pass config, so saveState() is called before the pass and restoreState() after it, returning the stage to its original configuration.
Reimplemented from fz::Stage.
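The save, overwrite, restore sequence described above can be sketched with a small stand-in class. `StatefulStageSketch` and its `applyHeader` method (which plays the role of deserializeHeader()) are hypothetical, and which fields fz::RZEStage actually snapshots is an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in for a stage whose config is overwritten on decode.
struct StatefulStageSketch {
    std::size_t chunk_size = 16384;
    int levels = 4;

    // Snapshot the forward-pass config before a decompression pass.
    void saveState() {
        saved_chunk_size_ = chunk_size;
        saved_levels_ = levels;
        has_saved_ = true;
    }

    // Put the snapshot back; a no-op if nothing was saved.
    void restoreState() {
        if (!has_saved_) return;
        chunk_size = saved_chunk_size_;
        levels = saved_levels_;
        has_saved_ = false;
    }

    // Stand-in for deserializeHeader(): overwrites the live config.
    void applyHeader(std::size_t header_chunk_size, int header_levels) {
        chunk_size = header_chunk_size;
        levels = header_levels;
    }

private:
    std::size_t saved_chunk_size_ = 0;
    int saved_levels_ = 0;
    bool has_saved_ = false;
};
```

A stage configured with an 8192-byte chunk size that decodes a file written with 4096-byte chunks would return to 8192 after restoreState(), so the next compression call is unaffected by the file it just read.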