|
FZGPUModules 2.0
GPU-accelerated modular compression pipelines
|
#include <diff.h>
Inheritance diagram for fz::DifferenceStage< T, TOut >:Public Member Functions | |
| void | setInverse (bool inverse) override |
| void | setChunkSize (size_t bytes) |
| size_t | getRequiredInputAlignment () const override |
| void | execute (cudaStream_t stream, MemoryPool *pool, const std::vector< void * > &inputs, const std::vector< void * > &outputs, const std::vector< size_t > &sizes) override |
| std::string | getName () const override |
| std::vector< size_t > | estimateOutputSizes (const std::vector< size_t > &input_sizes) const override |
| std::unordered_map< std::string, size_t > | getActualOutputSizesByName () const override |
| size_t | getActualOutputSize (int index) const override |
| uint16_t | getStageTypeId () const override |
| uint8_t | getOutputDataType (size_t output_index) const override |
| uint8_t | getInputDataType (size_t) const override |
| size_t | serializeHeader (size_t output_index, uint8_t *buf, size_t max_size) const override |
| void | deserializeHeader (const uint8_t *buf, size_t size) override |
| size_t | getMaxHeaderSize (size_t output_index) const override |
| void | saveState () override |
Public Member Functions inherited from fz::Stage | |
| virtual std::vector< std::string > | getOutputNames () const |
| int | getOutputIndex (const std::string &name) const |
| virtual void | setDims (const std::array< size_t, 3 > &dims) |
| virtual void | onFinalize (size_t, MemoryPool *) |
| virtual size_t | estimateDeviceFootprintBytes (size_t) const |
| virtual size_t | estimatePinnedFootprintBytes (size_t) const |
| virtual void | postStreamSync (cudaStream_t stream) |
| virtual bool | isGraphCompatible () const |
| virtual size_t | estimateScratchBytes (const std::vector< size_t > &input_sizes) const |
Difference coding stage.
d_DIFFNB algorithm from the LC/PFPL framework (Burtscher et al., BSD-3-Clause). See THIRD_PARTY.md.Forward (compression): first-order differences with optional negabinary output output[0] = input[0] output[i] = input[i] - input[i-1] (when TOut == T) output[i] = Negabinary<T>::encode(diff) (when TOut != T)
Inverse (decompression): cumulative sum with optional negabinary decode first
Optional chunking (setChunkSize > 0): The difference/cumsum resets at each chunk boundary. Each chunk is a fully independent context — the first element of each chunk is stored as-is (previous = 0 implied). This enables parallel decompression and is required for the PFPL pipeline where 16 KB chunks flow independently through Bitshuffle and RZE.
Template parameters: T — input element type (signed/unsigned integer or float). TOut — output element type (defaults to T). When TOut != T: T must be a signed integer and TOut its unsigned counterpart of the same width; negabinary encoding is fused at the final write of the forward kernel and the decode is the first step of the inverse kernel.
Serialized header layout (6 bytes): [0] DataType T (1 byte) [1] DataType TOut (1 byte) [2..5] chunk_size (uint32_t, little-endian, 0 = no chunking)
|
inlineoverridevirtual |
Switch between forward (compression) and inverse (decompression) mode. Affects getNumInputs()/getNumOutputs() for stages with asymmetric port counts.
Reimplemented from fz::Stage.
|
inline |
Set chunk size in bytes (default 0 = no chunking).
When > 0, differences and cumulative sums reset at each chunk boundary. Must be a positive multiple of sizeof(T). Pass 0 to disable chunking and process the whole array as a single context (the previous default).
|
inlineoverridevirtual |
Minimum input size alignment in bytes. Chunked stages return their chunk size; the pipeline uses the LCM of all stage alignments at finalize() to transparently zero-pad the input. Default: 1 (no alignment requirement).
Reimplemented from fz::Stage.
|
overridevirtual |
Execute the stage. Inputs, outputs, and sizes are device pointers/bytes.
Stages may call cudaStreamSynchronize(stream) or issue blocking D2H copies when the algorithm requires it (e.g. Huffman histogram readback for codebook construction, ANS renormalization tables). Such stages must return false from isGraphCompatible() and must document the sync points.
Note: the DAG dispatches sibling nodes (same topological level) via a sequential CPU loop, each enqueuing to its own stream. A sync inside execute() blocks the CPU from dispatching subsequent siblings until the synced stream is idle — this delays parallel branches in wide DAGs. In a linear pipeline there are no siblings and no extra cost.
Implements fz::Stage.
|
inlineoverridevirtual |
Human-readable name used in error messages and debug output.
Implements fz::Stage.
|
inlineoverridevirtual |
Estimate output buffer sizes given input sizes. Used for buffer allocation planning in PREALLOCATE mode — must be a safe upper bound; under-estimation causes buffer overruns.
Implements fz::Stage.
|
inlineoverridevirtual |
|
inlineoverridevirtual |
Actual size of a single output by index after execute(). Avoids constructing the map for the common single-output case. Default delegates to getActualOutputSizesByName(); override to return directly from an internal field.
Reimplemented from fz::Stage.
|
inlineoverridevirtual |
|
inlineoverridevirtual |
DataType enum of the given output port.
Implements fz::Stage.
|
inlineoverridevirtual |
Expected DataType of the given input port.
Used by Pipeline::finalize() to detect type mismatches between connected stages before any execution. Return DataType::UNKNOWN to opt out of checking — byte-transparent stages (Bitshuffle, RZE) and mock stages must return UNKNOWN; finalize() skips any connection where either side is UNKNOWN.
Reimplemented from fz::Stage.
|
inlineoverridevirtual |
Serialize stage config into header_buffer (max 128 bytes) for the FZM file. Return the number of bytes written, or 0 if the stage has no config.
Reimplemented from fz::Stage.
|
inlineoverridevirtual |
Restore stage config from header_buffer during decompression.
Reimplemented from fz::Stage.
|
inlineoverridevirtual |
Maximum bytes this stage writes into its per-output FZM header slot.
Reimplemented from fz::Stage.
|
inlineoverridevirtual |
Save/restore config state around a decompression pass. deserializeHeader() overwrites the stage's forward-pass config; saveState() is called before and restoreState() after so the stage returns to its original configuration.
Reimplemented from fz::Stage.