FZGPUModules 1.0
GPU-accelerated modular compression pipeline
#include <quantizer.h>
Inheritance diagram for fz::QuantizerStage< TInput, TCode >: (diagram not shown; the class inherits from fz::Stage)
Classes | |
| struct | Config |
Public Member Functions | |
| void | execute (cudaStream_t stream, MemoryPool *pool, const std::vector< void * > &inputs, const std::vector< void * > &outputs, const std::vector< size_t > &sizes) override |
| void | postStreamSync (cudaStream_t stream) override |
| std::string | getName () const override |
| std::vector< std::string > | getOutputNames () const override |
| std::vector< size_t > | estimateOutputSizes (const std::vector< size_t > &input_sizes) const override |
| std::unordered_map< std::string, size_t > | getActualOutputSizesByName () const override |
| size_t | getActualOutputSize (int index) const override |
| void | setInverse (bool inverse) override |
| uint16_t | getStageTypeId () const override |
| uint8_t | getOutputDataType (size_t output_index) const override |
| uint8_t | getInputDataType (size_t) const override |
| size_t | serializeHeader (size_t output_index, uint8_t *buf, size_t max_size) const override |
| size_t | getMaxHeaderSize (size_t) const override |
| void | deserializeHeader (const uint8_t *buf, size_t size) override |
| void | saveState () override |
| void | setOutlierThreshold (float t) |
| ABS/NOA: |x| >= threshold → lossless outlier regardless of bin (LC reference parameter). | |
| void | setInplaceOutliers (bool enable) |
| ABS/NOA: encode outliers in-place (raw float bits in codes array; no scatter buffers). | |
Public Member Functions inherited from fz::Stage | |
| virtual size_t | getRequiredInputAlignment () const |
| int | getOutputIndex (const std::string &name) const |
| virtual void | setDims (const std::array< size_t, 3 > &dims) |
| virtual bool | isGraphCompatible () const |
| virtual size_t | estimateScratchBytes (const std::vector< size_t > &input_sizes) const |
Direct-value quantizer with error-bounded coding and lossless outlier fallback.
Unlike LorenzoStage (which quantizes prediction differences), this stage quantizes the input values directly. It supports all three error-bound modes:
ABS: absolute error bound, |x - x_hat| <= eb.
    Uniform quantization with step = 2*eb. Works with any TCode type.
NOA: norm-of-absolute (PFPL), abs_eb = eb * (max(data) - min(data)).
    Scans the data once to find value_range, then falls through to ABS. Works with any TCode type.
REL: pointwise relative error bound (PFPL exact definition), |x - x_hat| / |x| <= eb.
    Implemented via log2-space quantization (see PFPL paper):
        bin    = round(log2(|x|) / log2eb)
        log2eb = 2 * log2(1 + eb)
        x_hat  = sign(x) * 2^(bin * log2eb)
    Zeros, denormals, infinities and NaNs are stored losslessly as outliers. Reconstruction is also verified against the exact bounds; if the fast log2/pow2 approximation causes a violation, the value is stored losslessly instead.
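The three bounds can be illustrated on the host with a few scalar helpers. This is a minimal sketch of the formulas listed above, not the stage's CUDA kernels: the function names are hypothetical, double precision stands in for the fast log2/pow2 approximations, and outlier handling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// ABS: uniform quantization with step = 2*eb, so |x - x_hat| <= eb.
inline int64_t abs_quantize(float x, double eb) {
    assert(eb > 0.0);
    return std::llround(static_cast<double>(x) / (2.0 * eb));
}

inline double abs_reconstruct(int64_t bin, double eb) {
    return static_cast<double>(bin) * 2.0 * eb;
}

// NOA: derive an absolute bound from the value range, then reuse ABS.
inline double noa_to_abs_eb(const std::vector<float>& data, double eb) {
    assert(!data.empty());
    const auto mm = std::minmax_element(data.begin(), data.end());
    return eb * (static_cast<double>(*mm.second) - static_cast<double>(*mm.first));
}

// REL: quantize log2(|x|) with step log2eb = 2*log2(1 + eb).
// Only normal, nonzero values are valid here; the stage stores the rest as outliers.
inline int64_t rel_quantize(float x, double eb) {
    assert(eb > 0.0 && std::isnormal(x));
    const double log2eb = 2.0 * std::log2(1.0 + eb);
    return std::llround(std::log2(std::fabs(static_cast<double>(x))) / log2eb);
}

inline double rel_reconstruct(int64_t bin, bool negative, double eb) {
    const double log2eb = 2.0 * std::log2(1.0 + eb);
    const double magnitude = std::exp2(static_cast<double>(bin) * log2eb);
    return negative ? -magnitude : magnitude;
}
```

Rounding in log2 space keeps the ratio x_hat / |x| between 1/(1 + eb) and 1 + eb, which is why the REL step is 2*log2(1 + eb) rather than log2(1 + eb).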
NOTE: REL mode bit-packs each code (sign of x, sign of log_bin, magnitude of log_bin) and is designed for a 4-byte code type: use QuantizerStage<float, uint32_t>. If TCode is narrower, an exception is thrown at runtime when a required stored value overflows. For eb >= 0.01 with float32 input, uint16_t codes are sufficient in practice (max |log_bin| ≈ 4460, far below the 16383 maximum for uint16 REL codes).
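The code-width figures in this note can be reproduced with a short host-side calculation. The sketch below assumes the layout implied by the 16383 limit (two sign bits plus the remaining bits for the magnitude); both helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>

// Largest |log_bin| that any normal float32 value can produce for a given eb.
inline int64_t max_abs_log_bin(double eb) {
    assert(eb > 0.0);
    const double log2eb = 2.0 * std::log2(1.0 + eb);
    const double hi = std::log2(static_cast<double>(FLT_MAX));  // just under 128
    const double lo = std::log2(static_cast<double>(FLT_MIN));  // -126
    return std::llround(std::max(std::fabs(hi), std::fabs(lo)) / log2eb);
}

// Assumed REL layout: 2 sign bits, then (code_bits - 2) magnitude bits.
inline int64_t max_rel_magnitude(int code_bits) {
    assert(code_bits > 2 && code_bits <= 32);
    return (int64_t{1} << (code_bits - 2)) - 1;
}
```

For eb = 0.01 the first helper gives a value close to 4460, which fits under max_rel_magnitude(16) = 16383. Smaller error bounds increase |log_bin| roughly in proportion to 1/eb, so they are the case where the 4-byte code type matters.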
Outputs (compression mode):
    [0] codes: quantization codes (TCode[n])
    [1] outlier_vals: original values at outlier positions (TInput[k])
    [2] outlier_idxs: indices of outlier positions (uint32_t[k])
    [3] outlier_count: number of outliers (uint32_t scalar)
Inputs (decompression mode): same 4 buffers → reconstructed TInput[n]
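How the four buffers fit together can be sketched on the host for ABS mode. Everything here is illustrative: the struct, the function names, the kRadius offset, and the rule "a bin that does not fit the code type becomes an outlier" are assumptions, and the real stage operates on device buffers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuantOutputs {                        // hypothetical host mirror of the four ports
    std::vector<uint16_t> codes;             // [0] codes, TCode[n]
    std::vector<float>    outlier_vals;      // [1] outlier_vals, TInput[k]
    std::vector<uint32_t> outlier_idxs;      // [2] outlier_idxs, uint32_t[k]
    uint32_t              outlier_count = 0; // [3] outlier_count
};

constexpr int64_t kRadius = 32768;  // assumed offset so signed bins fit in uint16_t

inline QuantOutputs compress_abs(const std::vector<float>& in, double eb) {
    assert(eb > 0.0);
    QuantOutputs out;
    out.codes.assign(in.size(), 0);  // code 0 marks an outlier slot in this sketch
    for (std::size_t i = 0; i < in.size(); ++i) {
        const double scaled = static_cast<double>(in[i]) / (2.0 * eb);
        // Non-finite values and bins that do not fit the code type are kept losslessly.
        if (!std::isfinite(scaled) || std::fabs(scaled) >= static_cast<double>(kRadius - 1)) {
            out.outlier_vals.push_back(in[i]);
            out.outlier_idxs.push_back(static_cast<uint32_t>(i));
            continue;
        }
        out.codes[i] = static_cast<uint16_t>(std::llround(scaled) + kRadius);
    }
    out.outlier_count = static_cast<uint32_t>(out.outlier_idxs.size());
    return out;
}

inline std::vector<float> decompress_abs(const QuantOutputs& q, double eb) {
    std::vector<float> rec(q.codes.size());
    // Dequantize every code first, then overwrite outlier positions with exact values.
    for (std::size_t i = 0; i < rec.size(); ++i)
        rec[i] = static_cast<float>(static_cast<double>(static_cast<int64_t>(q.codes[i]) - kRadius) * 2.0 * eb);
    for (uint32_t j = 0; j < q.outlier_count; ++j)
        rec[q.outlier_idxs[j]] = q.outlier_vals[j];
    return rec;
}
```

Decompression consumes the same four buffers in the same order, which is why the stage reports asymmetric port counts in forward and inverse mode.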
execute() [override, virtual]
Execute the stage. The inputs and outputs vectors hold device pointers; sizes holds the corresponding sizes in bytes.
Implements fz::Stage.
postStreamSync() [override, virtual]
getName() [inline, override, virtual]
Human-readable name used in error messages and debug output.
Implements fz::Stage.
getOutputNames() [inline, override, virtual]
Output port names in order. Default: single port named "output". Multi-output stages (e.g. Lorenzo: "codes", "outliers") override this.
Reimplemented from fz::Stage.
estimateOutputSizes() [override, virtual]
Estimate output buffer sizes given input sizes. Used for buffer allocation planning in PREALLOCATE mode — must be a safe upper bound; under-estimation causes buffer overruns.
Implements fz::Stage.
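One way to obtain a safe upper bound for the four quantizer outputs is to assume that every element could become an outlier. The helper below is a hypothetical sketch of that policy, not the library's actual estimate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Worst-case byte sizes for [codes, outlier_vals, outlier_idxs, outlier_count],
// assuming all n input elements may end up as outliers (k == n).
template <typename TInput, typename TCode>
std::vector<std::size_t> estimate_quantizer_output_sizes(std::size_t input_bytes) {
    assert(input_bytes % sizeof(TInput) == 0);
    const std::size_t n = input_bytes / sizeof(TInput);
    return {
        n * sizeof(TCode),     // codes: one code per element
        n * sizeof(TInput),    // outlier_vals: worst case, every value is an outlier
        n * sizeof(uint32_t),  // outlier_idxs: one index per potential outlier
        sizeof(uint32_t),      // outlier_count: single scalar
    };
}
```

The bound over-allocates when outliers are rare, but it cannot under-estimate, which is the property PREALLOCATE mode depends on.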
getActualOutputSizesByName() [inline, override, virtual]
getActualOutputSize() [inline, override, virtual]
Actual size of a single output by index after execute(). Avoids constructing the map for the common single-output case. Default delegates to getActualOutputSizesByName(); override to return directly from an internal field.
Reimplemented from fz::Stage.
setInverse() [inline, override, virtual]
Switch between forward (compression) and inverse (decompression) mode. Affects getNumInputs()/getNumOutputs() for stages with asymmetric port counts.
Reimplemented from fz::Stage.
getStageTypeId() [inline, override, virtual]
getOutputDataType() [inline, override, virtual]
DataType enum of the given output port.
Implements fz::Stage.
getInputDataType() [inline, override, virtual]
Expected DataType of the given input port.
Used by Pipeline::finalize() to detect type mismatches between connected stages before any execution. Return DataType::UNKNOWN to opt out of checking — byte-transparent stages (Bitshuffle, RZE) and mock stages must return UNKNOWN; finalize() skips any connection where either side is UNKNOWN.
Reimplemented from fz::Stage.
serializeHeader() [override, virtual]
Serialize stage config into header_buffer (max 128 bytes) for the FZM file. Return the number of bytes written, or 0 if the stage has no config.
Reimplemented from fz::Stage.
getMaxHeaderSize() [inline, override, virtual]
Maximum bytes this stage writes into its per-output FZM header slot.
Reimplemented from fz::Stage.
deserializeHeader() [override, virtual]
Restore stage config from header_buffer during decompression.
Reimplemented from fz::Stage.
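The serialize/deserialize contract (write into a bounded buffer, return the byte count, read it back during decompression) can be sketched with a small fixed-layout header. DemoQuantConfig, its fields, and the mode encoding are hypothetical and do not describe the real fz::QuantizerStage::Config or the FZM layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

struct DemoQuantConfig {   // hypothetical stand-in for a stage config
    float   error_bound;
    uint8_t mode;          // assumed encoding: 0 = ABS, 1 = NOA, 2 = REL
};

constexpr std::size_t kDemoHeaderBytes = sizeof(float) + sizeof(uint8_t);

// Writes the config field by field and returns the number of bytes written.
inline std::size_t serialize_demo_header(const DemoQuantConfig& c, uint8_t* buf, std::size_t max_size) {
    if (max_size < kDemoHeaderBytes) throw std::length_error("header slot too small");
    std::memcpy(buf, &c.error_bound, sizeof(float));
    buf[sizeof(float)] = c.mode;
    return kDemoHeaderBytes;
}

// Restores the config from the bytes produced by serialize_demo_header().
inline DemoQuantConfig deserialize_demo_header(const uint8_t* buf, std::size_t size) {
    if (size < kDemoHeaderBytes) throw std::length_error("truncated header");
    DemoQuantConfig c{};
    std::memcpy(&c.error_bound, buf, sizeof(float));
    c.mode = buf[sizeof(float)];
    return c;
}
```

Copying fields individually rather than the whole struct keeps padding bytes out of the header, so the reported size matches what is actually written.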
saveState() [inline, override, virtual]
Save/restore config state around a decompression pass. deserializeHeader() overwrites the stage's forward-pass config; saveState() is called before and restoreState() after so the stage returns to its original configuration.
Reimplemented from fz::Stage.
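The save/restore bracket around a decompression pass can be pictured with a toy stage. DemoStage and DemoConfig are hypothetical, and the real deserializeHeader() takes a byte buffer rather than a single float; the sketch only shows the ordering of the calls.

```cpp
#include <cassert>

struct DemoConfig {
    float error_bound = 0.0f;
};

class DemoStage {  // hypothetical; mirrors only the save/restore idea
public:
    explicit DemoStage(const DemoConfig& c) : cfg_(c), saved_(c), has_saved_(false) {}

    // Snapshot the forward-pass config before a decompression pass.
    void saveState() { saved_ = cfg_; has_saved_ = true; }

    // Simplified stand-in: overwrite the config with a value read from a file header.
    void deserializeHeader(float eb_from_file) { cfg_.error_bound = eb_from_file; }

    // Put the forward-pass config back after decompression.
    void restoreState() {
        if (has_saved_) { cfg_ = saved_; has_saved_ = false; }
    }

    const DemoConfig& config() const { return cfg_; }

private:
    DemoConfig cfg_;
    DemoConfig saved_;
    bool has_saved_;
};
```

Calling saveState(), then deserializeHeader(), then restoreState() leaves the stage with its original error bound, so the same stage object can be reused for a later compression pass.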