FZGPUModules 2.0
GPU-accelerated modular compression pipelines
fz::QuantizerStage< TInput, TCode > Class Template Reference

#include <quantizer.h>


Classes

struct Config

Public Member Functions

void execute (cudaStream_t stream, MemoryPool *pool, const std::vector< void * > &inputs, const std::vector< void * > &outputs, const std::vector< size_t > &sizes) override
void postStreamSync (cudaStream_t stream) override
std::string getName () const override
std::vector< std::string > getOutputNames () const override
std::vector< size_t > estimateOutputSizes (const std::vector< size_t > &input_sizes) const override
std::unordered_map< std::string, size_t > getActualOutputSizesByName () const override
size_t getActualOutputSize (int index) const override
void setInverse (bool inverse) override
uint16_t getStageTypeId () const override
uint8_t getOutputDataType (size_t output_index) const override
uint8_t getInputDataType (size_t) const override
size_t serializeHeader (size_t output_index, uint8_t *buf, size_t max_size) const override
size_t getMaxHeaderSize (size_t) const override
void deserializeHeader (const uint8_t *buf, size_t size) override
void saveState () override
void setOutlierThreshold (float t)
  ABS/NOA: any value with |x| >= t is stored losslessly as an outlier, regardless of its quantization bin (mirrors the parameter of the LC reference implementation).
void setInplaceOutliers (bool enable)
  ABS/NOA: encode outliers in place, writing their raw float bits into the codes array instead of using the separate outlier scatter buffers.
 
Public Member Functions inherited from fz::Stage
virtual size_t getRequiredInputAlignment () const
int getOutputIndex (const std::string &name) const
virtual void setDims (const std::array< size_t, 3 > &dims)
virtual void onFinalize (size_t, MemoryPool *)
virtual size_t estimateDeviceFootprintBytes (size_t) const
virtual size_t estimatePinnedFootprintBytes (size_t) const
virtual bool isGraphCompatible () const
virtual size_t estimateScratchBytes (const std::vector< size_t > &input_sizes) const

Detailed Description

template<typename TInput = float, typename TCode = uint16_t>
class fz::QuantizerStage< TInput, TCode >

Direct-value quantizer with error-bounded coding and lossless outlier fallback.

Note
Prior work: ABS/NOA/REL quantization scheme, outlier handling, and log-space REL encoding follow the LC/PFPL framework (Burtscher et al., BSD-3-Clause). See THIRD_PARTY.md.

Unlike LorenzoQuantStage (which quantizes prediction differences), this stage quantizes the input values directly. It supports all three error-bound modes:

ABS: absolute error bound, |x - x_hat| <= eb. Uniform quantization with step = 2*eb. Works with any TCode type.

NOA: normalized absolute error bound (PFPL), abs_eb = eb * (max(data) - min(data)). The stage scans the data once to find the value range, then proceeds exactly as in ABS mode with abs_eb. Works with any TCode type.

REL: pointwise relative error bound (exact PFPL definition), |x - x_hat| / |x| <= eb. Implemented via log2-space quantization (see the PFPL paper):
  log2eb = 2 * log2(1 + eb)
  bin    = round(log2(|x|) / log2eb)
  x_hat  = sign(x) * 2^(bin * log2eb)
Rounding moves log2(|x|) by at most half a step, log2(1 + eb), so x_hat / x lies in [1/(1 + eb), 1 + eb] and the relative error is at most eb. Zeros, denormals, infinities, and NaNs are stored losslessly as outliers. Each reconstruction is also verified against the exact bound; if the fast log2/pow2 approximation causes a violation, the value is stored losslessly instead.
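As a sketch of the ABS and NOA modes, the host-side reference below quantizes with step = 2*eb, converts an NOA bound with one min/max scan, and falls back to lossless outliers under the setOutlierThreshold() rule or when a bin does not fit in the code. It is not the stage's CUDA kernel: the helpers noaToAbs, quantizeAbs and dequantizeAbs, the uint16_t offset encoding, and the choice of code 0 as an outlier marker are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host-side model of ABS/NOA quantization (not the library's kernel).
struct AbsEncoded {
    std::vector<uint16_t> codes;         // one code per element; 0 = outlier (assumed convention)
    std::vector<float> outlier_vals;     // original values stored losslessly
    std::vector<uint32_t> outlier_idxs;  // positions of those values
};

constexpr long kRadius = std::numeric_limits<uint16_t>::max() / 2;  // 32767

// NOA: turn a range-relative bound into an absolute one with a single scan.
inline float noaToAbs(const std::vector<float>& x, float eb) {
    assert(!x.empty());
    const auto [lo, hi] = std::minmax_element(x.begin(), x.end());
    return eb * (*hi - *lo);
}

// Reconstruction shared by the encoder (for verification) and the decoder.
inline float absRecon(uint16_t code, float step) {
    return static_cast<float>(static_cast<long>(code) - kRadius) * step;
}

inline AbsEncoded quantizeAbs(const std::vector<float>& x, float eb,
                              float threshold = std::numeric_limits<float>::infinity()) {
    assert(eb > 0.0f);
    const float step = 2.0f * eb;  // uniform bins of width 2*eb
    AbsEncoded e;
    e.codes.assign(x.size(), 0);
    for (size_t i = 0; i < x.size(); ++i) {
        // Non-finite values and |x| >= threshold go straight to the outlier list.
        bool outlier = !std::isfinite(x[i]) || std::fabs(x[i]) >= threshold;
        if (!outlier) {
            const double bin = std::round(static_cast<double>(x[i]) / step);
            outlier = std::fabs(bin) >= kRadius;  // bin would not fit in the code
            if (!outlier) {
                e.codes[i] = static_cast<uint16_t>(static_cast<long>(bin) + kRadius);
                // Exact check: demote to outlier if float rounding breaks the bound.
                outlier = std::fabs(x[i] - absRecon(e.codes[i], step)) > eb;
            }
        }
        if (outlier) {
            e.codes[i] = 0;
            e.outlier_vals.push_back(x[i]);
            e.outlier_idxs.push_back(static_cast<uint32_t>(i));
        }
    }
    return e;
}

inline std::vector<float> dequantizeAbs(const AbsEncoded& e, float eb) {
    std::vector<float> y(e.codes.size());
    for (size_t i = 0; i < y.size(); ++i) y[i] = absRecon(e.codes[i], 2.0f * eb);
    // Outliers overwrite whatever code 0 decoded to.
    for (size_t k = 0; k < e.outlier_idxs.size(); ++k) y[e.outlier_idxs[k]] = e.outlier_vals[k];
    return y;
}
```

Verifying each reconstruction and demoting failures to outliers is what turns the nominal bound into a guaranteed one, even when float rounding lands right at eb.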

NOTE: REL mode bit-packs each code as the sign of x, the sign of log_bin, and the magnitude of log_bin, and is designed for a 4-byte code type: QuantizerStage<float, uint32_t>. A narrower TCode is accepted, but an exception is thrown at runtime if a stored value overflows it. For float32 data with eb >= 0.01, uint16_t codes are sufficient in practice: max |log_bin| ≈ 4460, well below 16383 (2^14 - 1), the largest magnitude a uint16_t REL code can hold after its two sign bits.
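The REL formulas and the code-width arithmetic in this note can be checked with a small host-side sketch. The helpers relBin, relReconstruct, maxRelBinFloat32, packRel16 and unpackRel16 are hypothetical, and so is the exact bit order of the 16-bit layout (sign of x in bit 15, sign of log_bin in bit 14, magnitude in bits 0..13); only the two-sign-bits-plus-magnitude structure comes from the text above. Throwing std::overflow_error is an assumed stand-in for the stage's runtime exception.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <optional>
#include <stdexcept>

// Hypothetical host-side model of REL (log2-space) quantization; not the stage's kernel.
inline double relLog2Step(float eb) { return 2.0 * std::log2(1.0 + static_cast<double>(eb)); }

// x_hat = sign(x) * 2^(bin * log2eb), saturating to infinity outside float range.
inline float relReconstruct(bool negative, long bin, float eb) {
    const double mag = std::exp2(static_cast<double>(bin) * relLog2Step(eb));
    const double lim = std::numeric_limits<float>::max();
    const float out = mag > lim ? std::numeric_limits<float>::infinity() : static_cast<float>(mag);
    return negative ? -out : out;
}

// Returns the log bin, or std::nullopt when x must be stored losslessly as an outlier.
inline std::optional<long> relBin(float x, float eb) {
    if (!std::isnormal(x)) return std::nullopt;  // zero, denormal, infinity, NaN
    const long bin = std::lround(std::log2(std::fabs(static_cast<double>(x))) / relLog2Step(eb));
    const float x_hat = relReconstruct(std::signbit(x), bin, eb);
    if (std::fabs(x - x_hat) > eb * std::fabs(x)) return std::nullopt;  // exact-bound check
    return bin;
}

// Worst-case |bin| for normal float32 values, since |log2|x|| < 128 for them.
inline long maxRelBinFloat32(float eb) { return std::lround(128.0 / relLog2Step(eb)); }

// Hypothetical 16-bit layout: bit 15 = sign of x, bit 14 = sign of bin, bits 0..13 = |bin|.
inline uint16_t packRel16(bool negative, long bin) {
    const long mag = std::labs(bin);
    if (mag > 0x3FFF) throw std::overflow_error("REL bin does not fit in uint16_t; use uint32_t codes");
    return static_cast<uint16_t>((negative ? 0x8000 : 0) | (bin < 0 ? 0x4000 : 0) | mag);
}

inline void unpackRel16(uint16_t code, bool& negative, long& bin) {
    negative = (code & 0x8000) != 0;
    const long mag = code & 0x3FFF;
    bin = (code & 0x4000) ? -mag : mag;
}
```

For eb = 0.01, maxRelBinFloat32 evaluates to about 4458, consistent with the ≈ 4460 figure above and comfortably inside the 16383 limit of a 16-bit code.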

Outputs (compression mode):
  [0] codes: quantization codes (TCode[n])
  [1] outlier_vals: original values at outlier positions (TInput[k])
  [2] outlier_idxs: indices of outlier positions (uint32_t[k])
  [3] outlier_count: number of outliers (uint32_t scalar)

Inputs (decompression mode): the same four buffers, from which the stage reconstructs TInput[n].

Member Function Documentation

◆ execute()

template<typename TInput = float, typename TCode = uint16_t>
void fz::QuantizerStage< TInput, TCode >::execute ( cudaStream_t  stream,
MemoryPool *  pool,
const std::vector< void * > &  inputs,
const std::vector< void * > &  outputs,
const std::vector< size_t > &  sizes 
)
override virtual

Execute the stage. inputs and outputs hold device pointers; sizes holds the buffer sizes in bytes.

Stages may call cudaStreamSynchronize(stream) or issue blocking D2H copies when the algorithm requires it (e.g. Huffman histogram readback for codebook construction, ANS renormalization tables). Such stages must return false from isGraphCompatible() and must document the sync points.

Note: the DAG dispatches sibling nodes (same topological level) via a sequential CPU loop, each enqueuing to its own stream. A sync inside execute() blocks the CPU from dispatching subsequent siblings until the synced stream is idle — this delays parallel branches in wide DAGs. In a linear pipeline there are no siblings and no extra cost.

Implements fz::Stage.

◆ postStreamSync()

template<typename TInput = float, typename TCode = uint16_t>
void fz::QuantizerStage< TInput, TCode >::postStreamSync ( cudaStream_t  stream)
override virtual

Called after dag->execute() and stream sync, before compress() returns. Use for D2H transfers that must not block mid-pipeline (e.g. Lorenzo's outlier count readback). The stream is already idle so a plain cudaMemcpy is safe here.

Reimplemented from fz::Stage.

◆ getName()

template<typename TInput = float, typename TCode = uint16_t>
std::string fz::QuantizerStage< TInput, TCode >::getName ( ) const
inline override virtual

Human-readable name used in error messages and debug output.

Implements fz::Stage.

◆ getOutputNames()

template<typename TInput = float, typename TCode = uint16_t>
std::vector< std::string > fz::QuantizerStage< TInput, TCode >::getOutputNames ( ) const
inline override virtual

Output port names in order. Default: single port named "output". Multi-output stages (e.g. Lorenzo: "codes", "outliers") override this.

Reimplemented from fz::Stage.

◆ estimateOutputSizes()

template<typename TInput = float, typename TCode = uint16_t>
std::vector< size_t > fz::QuantizerStage< TInput, TCode >::estimateOutputSizes ( const std::vector< size_t > &  input_sizes) const
override virtual

Estimate output buffer sizes given input sizes. Used for buffer allocation planning in PREALLOCATE mode — must be a safe upper bound; under-estimation causes buffer overruns.
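A safe upper bound for the four forward outputs follows from the output list: n codes, and in the worst case every element is an outlier. The function below is a hypothetical sketch of that reasoning, not the stage's actual estimateOutputSizes(); the real implementation may use a tighter outlier bound, but it must never report less than execute() can write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical worst-case sizing for outputs [0]..[3], in bytes.
template <typename TInput, typename TCode>
std::vector<size_t> estimateQuantizerOutputSizes(size_t input_bytes) {
    // Round up so a byte count that is not a multiple of sizeof(TInput) still yields an upper bound.
    const size_t n = (input_bytes + sizeof(TInput) - 1) / sizeof(TInput);
    return {
        n * sizeof(TCode),     // [0] codes
        n * sizeof(TInput),    // [1] outlier_vals, worst case: every element
        n * sizeof(uint32_t),  // [2] outlier_idxs
        sizeof(uint32_t),      // [3] outlier_count
    };
}
```

For 1000 float inputs with uint16_t codes this gives 2000, 4000, 4000 and 4 bytes.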

Implements fz::Stage.

◆ getActualOutputSizesByName()

template<typename TInput = float, typename TCode = uint16_t>
std::unordered_map< std::string, size_t > fz::QuantizerStage< TInput, TCode >::getActualOutputSizesByName ( ) const
inline override virtual

Actual output sizes after execute(), keyed by output port name.

Implements fz::Stage.

◆ getActualOutputSize()

template<typename TInput = float, typename TCode = uint16_t>
size_t fz::QuantizerStage< TInput, TCode >::getActualOutputSize ( int  index) const
inline override virtual

Actual size of a single output by index after execute(). Avoids constructing the map for the common single-output case. Default delegates to getActualOutputSizesByName(); override to return directly from an internal field.
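The default delegation described above, and the cheaper override, can be sketched with a minimal hypothetical class pair. StageBase and FourPortStage are illustrative stand-ins rather than fz::Stage and QuantizerStage, and the port names are taken from the output list in the Detailed Description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the base-class contract (not fz::Stage itself).
struct StageBase {
    virtual ~StageBase() = default;
    virtual std::vector<std::string> getOutputNames() const { return {"output"}; }
    virtual std::unordered_map<std::string, size_t> getActualOutputSizesByName() const = 0;

    // Default: build the whole map, then look up the name for this index.
    virtual size_t getActualOutputSize(int index) const {
        const std::vector<std::string> names = getOutputNames();
        const auto sizes = getActualOutputSizesByName();
        const auto it = sizes.find(names.at(static_cast<size_t>(index)));
        return it == sizes.end() ? 0 : it->second;
    }
};

// Multi-output stage that answers getActualOutputSize() from a member array.
struct FourPortStage : StageBase {
    std::array<size_t, 4> actual{};  // set after execute() in a real stage

    std::vector<std::string> getOutputNames() const override {
        return {"codes", "outlier_vals", "outlier_idxs", "outlier_count"};
    }
    std::unordered_map<std::string, size_t> getActualOutputSizesByName() const override {
        const std::vector<std::string> names = getOutputNames();
        std::unordered_map<std::string, size_t> m;
        for (size_t i = 0; i < names.size(); ++i) m[names[i]] = actual[i];
        return m;
    }
    size_t getActualOutputSize(int index) const override {
        return actual.at(static_cast<size_t>(index));  // no map construction
    }
};
```

Calling s.StageBase::getActualOutputSize(i) exercises the default path (it still dispatches virtually to the overridden name and map functions) and returns the same value as the direct override.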

Reimplemented from fz::Stage.

◆ setInverse()

template<typename TInput = float, typename TCode = uint16_t>
void fz::QuantizerStage< TInput, TCode >::setInverse ( bool  inverse)
inline override virtual

Switch between forward (compression) and inverse (decompression) mode. Affects getNumInputs()/getNumOutputs() for stages with asymmetric port counts.
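For this stage the port counts are asymmetric: one input and four outputs in forward mode, and the same four buffers in, one reconstructed array out, in inverse mode. A hypothetical PortCounts class sketches how setInverse() could drive getNumInputs()/getNumOutputs(); the real member layout is not documented here.

```cpp
#include <cassert>

// Hypothetical model of the asymmetric port counts; not the library's class.
class PortCounts {
public:
    void setInverse(bool inverse) { inverse_ = inverse; }
    int getNumInputs() const { return inverse_ ? 4 : 1; }   // inverse: codes + 3 outlier buffers
    int getNumOutputs() const { return inverse_ ? 1 : 4; }  // forward: codes + 3 outlier buffers

private:
    bool inverse_ = false;  // forward (compression) by default
};
```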

Reimplemented from fz::Stage.

◆ getStageTypeId()

template<typename TInput = float, typename TCode = uint16_t>
uint16_t fz::QuantizerStage< TInput, TCode >::getStageTypeId ( ) const
inline override virtual

Stage type identifier written into the FZM file header.

Implements fz::Stage.

◆ getOutputDataType()

template<typename TInput = float, typename TCode = uint16_t>
uint8_t fz::QuantizerStage< TInput, TCode >::getOutputDataType ( size_t  output_index) const
inline override virtual

DataType enum of the given output port.

Implements fz::Stage.

◆ getInputDataType()

template<typename TInput = float, typename TCode = uint16_t>
uint8_t fz::QuantizerStage< TInput, TCode >::getInputDataType ( size_t  ) const
inline override virtual

Expected DataType of the given input port.

Used by Pipeline::finalize() to detect type mismatches between connected stages before any execution. Return DataType::UNKNOWN to opt out of checking — byte-transparent stages (Bitshuffle, RZE) and mock stages must return UNKNOWN; finalize() skips any connection where either side is UNKNOWN.
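The rule used by Pipeline::finalize() reduces to a small predicate. In this sketch the DataType enumerators and their numeric values are hypothetical (the real enum is not shown on this page); only the behaviour, skipping any connection where either side is UNKNOWN and otherwise requiring equality, comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enumerators and values; the real DataType enum may differ.
enum class DataType : uint8_t { UNKNOWN = 0, UINT16, UINT32, FLOAT32 };

// Returns true if finalize() would accept a producer-output -> consumer-input connection.
inline bool typesCompatible(DataType producer_output, DataType consumer_input) {
    if (producer_output == DataType::UNKNOWN || consumer_input == DataType::UNKNOWN)
        return true;  // opted out (byte-transparent or mock stage): check skipped
    return producer_output == consumer_input;
}
```

Because the check only compares declared types, it runs before any device memory is touched, which is what lets mismatches surface at finalize() rather than as corrupted output.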

Reimplemented from fz::Stage.

◆ serializeHeader()

template<typename TInput = float, typename TCode = uint16_t>
size_t fz::QuantizerStage< TInput, TCode >::serializeHeader ( size_t  output_index,
uint8_t *  header_buffer,
size_t  max_size 
) const
override virtual

Serialize stage config into header_buffer (max 128 bytes) for the FZM file. Return the number of bytes written, or 0 if the stage has no config.

Reimplemented from fz::Stage.

◆ getMaxHeaderSize()

template<typename TInput = float, typename TCode = uint16_t>
size_t fz::QuantizerStage< TInput, TCode >::getMaxHeaderSize ( size_t  output_index) const
inline override virtual

Maximum bytes this stage writes into its per-output FZM header slot.

Reimplemented from fz::Stage.

◆ deserializeHeader()

template<typename TInput = float, typename TCode = uint16_t>
void fz::QuantizerStage< TInput, TCode >::deserializeHeader ( const uint8_t *  header_buffer,
size_t  size 
)
override virtual

Restore stage config from header_buffer during decompression.
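A round trip through serializeHeader()/deserializeHeader() can be sketched with a fixed-size byte layout. The QuantConfig fields, their order, the 10-byte size, the choice to return 0 when max_size is too small, and the bool result of deserializeConfig are all assumptions; the stage's real header format is not documented on this page. Bytes are copied with std::memcpy so the buffer needs no particular alignment, and the layout uses host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical configuration fields; not the stage's documented header format.
struct QuantConfig {
    uint8_t mode = 0;                // assumed encoding, e.g. 0 = ABS, 1 = NOA, 2 = REL
    uint8_t inplace_outliers = 0;    // setInplaceOutliers()
    float error_bound = 0.0f;
    float outlier_threshold = 0.0f;  // setOutlierThreshold()
};

constexpr size_t kHeaderBytes = 2 + 2 * sizeof(float);  // well under the 128-byte slot

// Returns bytes written, or 0 if the buffer is too small (assumed policy).
inline size_t serializeConfig(const QuantConfig& c, uint8_t* buf, size_t max_size) {
    if (max_size < kHeaderBytes) return 0;
    buf[0] = c.mode;
    buf[1] = c.inplace_outliers;
    std::memcpy(buf + 2, &c.error_bound, sizeof(float));
    std::memcpy(buf + 2 + sizeof(float), &c.outlier_threshold, sizeof(float));
    return kHeaderBytes;
}

// Returns false if the stored header is too short to hold the config.
inline bool deserializeConfig(const uint8_t* buf, size_t size, QuantConfig& c) {
    if (size < kHeaderBytes) return false;
    c.mode = buf[0];
    c.inplace_outliers = buf[1];
    std::memcpy(&c.error_bound, buf + 2, sizeof(float));
    std::memcpy(&c.outlier_threshold, buf + 2 + sizeof(float), sizeof(float));
    return true;
}
```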

Reimplemented from fz::Stage.

◆ saveState()

template<typename TInput = float, typename TCode = uint16_t>
void fz::QuantizerStage< TInput, TCode >::saveState ( )
inline override virtual

Save/restore config state around a decompression pass. deserializeHeader() overwrites the stage's forward-pass config; saveState() is called before and restoreState() after so the stage returns to its original configuration.
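The save/restore protocol can be sketched with a hypothetical class that keeps a backup copy of its forward configuration. StatefulStage and StageConfig are illustrative names; only the call order (saveState(), then deserializeHeader(), then restoreState() after decompression) comes from the text above.

```cpp
#include <cassert>
#include <optional>

// Hypothetical forward-pass configuration.
struct StageConfig {
    int mode = 0;
    float eb = 0.0f;
};

class StatefulStage {
public:
    explicit StatefulStage(StageConfig c) : cfg_(c) {}

    void saveState() { saved_ = cfg_; }  // called before the decompression pass

    void deserializeHeader(const StageConfig& from_file) {
        cfg_ = from_file;  // overwrites the forward-pass config
    }

    void restoreState() {  // called after the decompression pass
        if (saved_) {
            cfg_ = *saved_;
            saved_.reset();
        }
    }

    const StageConfig& config() const { return cfg_; }

private:
    StageConfig cfg_;
    std::optional<StageConfig> saved_;
};
```

Keeping the backup in a std::optional makes a restoreState() without a preceding saveState() a harmless no-op.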

Reimplemented from fz::Stage.