FZGPUModules 2.0
GPU-accelerated modular compression pipelines
Header: modules/quantizers/quantizer/quantizer.h
Class: fz::QuantizerStage<TInput, TCode>
Category: Quantizer (lossy)
Quantizes floating-point input values directly (not prediction residuals). Values that fall outside the representable range are stored losslessly as outliers in separate scatter buffers, unless inplace mode is active.
Three error-bound modes are supported: ABS, NOA, and REL.
| Parameter | Constraint |
|---|---|
| TInput | float or double |
| TCode | Unsigned integer (see available instantiations below) |
Only these combinations are compiled and linked:
- QuantizerStage<float, uint16_t>
- QuantizerStage<float, uint32_t>
- QuantizerStage<double, uint16_t>
- QuantizerStage<double, uint32_t>

Using any other combination will result in a linker error. Most common: QuantizerStage<float, uint16_t>.
| Setting | Purpose | Notes |
|---|---|---|
| setErrorBound(eb) | User error bound | Interpreted by setErrorBoundMode() |
| setErrorBoundMode(mode) | ABS / NOA / REL | REL is exact pointwise relative (log-space) |
| setQuantRadius(r) | Quantization radius | Used by ABS/NOA modes |
| setOutlierCapacity(f) | Outlier reserve fraction | 0.0-1.0 of element count |
| setZigzagCodes(enable) | Zigzag-encode codes | ABS/NOA only; improves compressibility |
| setOutlierThreshold(t) | Force outliers | ABS/NOA only; abs(x) >= t -> outlier |
| setInplaceOutliers(enable) | Embed outliers in codes | ABS/NOA only; see constraints below |
| setValueBase(v) | Precomputed value range | NOA only; optional, see below |
| Index | Name | Type | Description |
|---|---|---|---|
| 0 | "codes" | TCode[n] | Quantization codes |
| 1 | "outlier_vals" | TInput[k] | Original values at outlier positions |
| 2 | "outlier_idxs" | uint32_t[k] | Linear indices of outlier positions |
| 3 | "outlier_count" | uint32_t | Number of outliers (scalar) |
Connect downstream stages to the "codes" port.
When setInplaceOutliers(true) is active, outliers are embedded directly in the codes array using their raw IEEE-754 bit pattern. Only the "codes" port exists; the three outlier scatter ports are absent.
| Mode | Formula | Notes |
|---|---|---|
| ABS | abs(x_orig - x_recon) ≤ eb | Uniform quantization with step 2 * eb |
| NOA | abs(error) / value_range ≤ eb | Scales ABS by the data range |
| REL | abs(error) / abs(x_orig) ≤ eb | Ratio of error to original value |
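
The ABS and NOA rows above can be illustrated with a small host-side sketch. It is not the library's kernel: the helper names (quantizeAbs, reconstructAbs, noaToAbsBound) are hypothetical, and it assumes round-to-nearest onto a grid with step 2 * eb, which keeps the reconstruction within eb of the original value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration; not part of the FZGPUModules API.

// ABS: round x to the nearest multiple of the step 2 * eb.
inline std::int32_t quantizeAbs(double x, double eb) {
    assert(eb > 0.0);
    return static_cast<std::int32_t>(std::llround(x / (2.0 * eb)));
}

// Reconstruction is the grid point; it lies at most eb away from x.
inline double reconstructAbs(std::int32_t q, double eb) {
    return static_cast<double>(q) * (2.0 * eb);
}

// NOA: scale the user bound by the value range (max - min), then
// quantize exactly as in ABS with the resulting absolute bound.
inline double noaToAbsBound(double eb, double value_range) {
    assert(value_range > 0.0);
    return eb * value_range;
}
```

For example, with eb = 0.01 the value 3.14159 maps to grid index 157 and reconstructs to 3.14, an error of about 0.0016. A NOA bound of 1e-3 on data spanning a range of 200 corresponds to an absolute bound of 0.2.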
REL mode details:
uint32_t is safe for all cases; uint16_t works for eb >= 0.01 with float32 in practice.

Both of the following are required when setInplaceOutliers(true) is set. Violations throw at runtime during the first compress() call.
Zigzag encoding must be enabled with setZigzagCodes(true). Why: the inverse kernel distinguishes valid codes from embedded outlier floats via the sentinel (code >> 1) >= quant_radius. With zigzag encoding (TCMS), valid codes are in [0, 2 × quant_radius). Normal float bit patterns are always >= 0x00800000, which is at least 2 × quant_radius for any practical radius (<= 2²²), making the sentinel check unambiguous. Without zigzag, signed two's-complement codes overlap with float bit patterns and the sentinel fails.
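
The sentinel argument can be checked on the host with a minimal sketch. It assumes quantized integers lie in [-quant_radius, quant_radius) and that zigzag maps them to 0, 1, 2, ... in the usual interleaved order. The function names are hypothetical and the library's kernels may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Standard zigzag mapping: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
inline std::uint32_t zigzagEncode(std::int32_t q) {
    const std::uint32_t doubled = static_cast<std::uint32_t>(q) << 1;
    return q < 0 ? ~doubled : doubled;
}

// Raw IEEE-754 bit pattern of a float32 value.
inline std::uint32_t floatBits(float x) {
    std::uint32_t raw;
    std::memcpy(&raw, &x, sizeof(raw));
    return raw;
}

// Sentinel from the text: a code word holds an embedded outlier
// when (code >> 1) >= quant_radius.
inline bool isEmbeddedOutlier(std::uint32_t code, std::uint32_t quant_radius) {
    assert(quant_radius > 0 && quant_radius <= (1u << 22));
    return (code >> 1) >= quant_radius;
}
```

With a radius of 512, the extreme in-range values -512 and 511 encode to 1023 and 1022, both below the sentinel. The bit pattern of 1.0f (0x3F800000) lies far above it, so the two cases cannot be confused.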
sizeof(TInput) must equal sizeof(TCode). Why: the inplace kernel stores outlier raw bits with __builtin_memcpy(&raw, &x, sizeof(TCode)). If the sizes differ, the copy is truncated or reads out of bounds.
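
The size requirement can be expressed as a compile-time check. The sketch below uses hypothetical helper names and std::memcpy in place of the kernel's __builtin_memcpy. On typical platforms, where float is 4 bytes and double is 8, it suggests that of the four compiled instantiations only QuantizerStage<float, uint32_t> has matching sizes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when a TInput value fits exactly into one TCode word.
template <typename TInput, typename TCode>
constexpr bool inplaceSizesMatch() {
    return sizeof(TInput) == sizeof(TCode);
}

// Evaluated for the four instantiations listed in this document.
static_assert(inplaceSizesMatch<float, std::uint32_t>(), "4 bytes == 4 bytes");
static_assert(!inplaceSizesMatch<float, std::uint16_t>(), "copy would truncate");
static_assert(!inplaceSizesMatch<double, std::uint16_t>(), "copy would truncate");
static_assert(!inplaceSizesMatch<double, std::uint32_t>(), "copy would truncate");

// Bit-exact copy of a value into a code word, and back.
template <typename TInput, typename TCode>
TCode rawBits(TInput x) {
    static_assert(sizeof(TInput) == sizeof(TCode), "sizes must match");
    TCode raw;
    std::memcpy(&raw, &x, sizeof(TCode));
    return raw;
}

template <typename TInput, typename TCode>
TInput fromRawBits(TCode raw) {
    static_assert(sizeof(TInput) == sizeof(TCode), "sizes must match");
    TInput x;
    std::memcpy(&x, &raw, sizeof(TInput));
    return x;
}

// Round-trip self-check for a non-NaN value: the original is restored exactly.
inline void checkRoundTrip(float x) {
    assert((fromRawBits<float, std::uint32_t>(rawBits<float, std::uint32_t>(x)) == x));
}
```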
REL mode packs sign + log-bin into the code word and uses a sentinel value for outliers. There is no unused range large enough to safely embed raw IEEE-754 bit patterns without collisions, and REL already needs the scatter buffers to preserve special values (zero, denormals, inf, NaN) exactly. For REL, outliers must remain in the explicit scatter buffers.
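
To make the "sign + log-bin" idea concrete, here is a minimal sketch of log-space quantization for nonzero finite values. It is an assumption about the general technique, not the library's actual code layout: the RelCode struct, the function names, and the bin width 2 * ln(1 + eb) are all illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical code layout: a sign flag plus a signed log-bin index.
struct RelCode {
    bool negative;
    std::int32_t bin;
};

// Bin width chosen so that reconstruction stays within a factor
// of (1 + eb) of the original magnitude.
inline double relStep(double eb) {
    assert(eb > 0.0);
    return 2.0 * std::log1p(eb);
}

inline RelCode quantizeRel(double x, double eb) {
    assert(x != 0.0 && std::isfinite(x));  // special values go to the scatter buffers
    const double bin = std::log(std::fabs(x)) / relStep(eb);
    return {x < 0.0, static_cast<std::int32_t>(std::llround(bin))};
}

inline double reconstructRel(RelCode c, double eb) {
    const double magnitude = std::exp(c.bin * relStep(eb));
    return c.negative ? -magnitude : magnitude;
}

// Estimated number of log-bins needed to span the float32 normal
// range, whose magnitudes cover a ratio of 2^254.
inline double binsForFloat32(double eb) {
    return 254.0 * std::log(2.0) / relStep(eb);
}
```

Under this layout, eb = 0.01 needs roughly 8,850 bins per sign, which fits comfortably in a 16-bit code, while eb = 0.001 needs roughly 88,000 and does not. That is consistent with the uint16_t guidance for REL mode, though the library's real bin layout may differ.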
Only NOA needs a data-dependent value base (max - min). If setValueBase() is not called, the stage scans the data once to compute it. For CUDA Graph capture, provide the precomputed value base through setValueBase() to avoid a device sync.
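
A minimal host-side sketch of precomputing the value base follows. The helpers computeValueBase and presetValueBase are hypothetical. Only setValueBase(v) comes from the settings table above, and its exact argument type is an assumption, so the stage is left as a template parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Value range (max - min) of a host copy of the data.
inline float computeValueBase(const std::vector<float>& data) {
    assert(!data.empty());
    const auto mm = std::minmax_element(data.begin(), data.end());
    return *mm.second - *mm.first;
}

// Stage is any type exposing setValueBase(v), such as a quantizer
// stage configured for NOA. Call this before graph capture begins
// so the stage does not have to scan the data itself.
template <typename Stage>
void presetValueBase(Stage& stage, const std::vector<float>& data) {
    stage.setValueBase(computeValueBase(data));
}
```

If the range is already known from metadata or an earlier pass, it can be passed to setValueBase() directly without the host scan.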
ABS and REL modes do not require setValueBase().