FZGPUModules 2.0
GPU-accelerated modular compression pipelines
Header: modules/fused/lorenzo_quant/lorenzo_quant.h
Class: fz::LorenzoQuantStage<TInput, TCode>
Category: Fused predictor + quantizer
Computes the Lorenzo prediction residual (each element minus a prediction formed from its spatial neighbor(s)), then immediately quantizes that residual into integer codes. The fused kernel avoids writing the raw residuals to device memory.
Supports 1-D, 2-D, and 3-D data. Dimensionality is controlled by setDims() and must be set before pipeline.addStage() so the pipeline can push the correct dims at add-time.
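To make the multi-dimensional case concrete, the 2-D Lorenzo predictor estimates each element as left + upper - upper-left. The following is a minimal CPU sketch of the residual computation on integer (already pre-quantized) values. The function name lorenzoResiduals2D and the choice to treat out-of-range neighbors as 0 are assumptions for illustration, not the stage's actual kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Residuals of the 2-D Lorenzo predictor on a row-major ny-by-nx grid.
// Assumption: neighbours outside the grid are treated as 0.
std::vector<long long> lorenzoResiduals2D(const std::vector<long long>& q,
                                          std::size_t ny, std::size_t nx) {
    assert(q.size() == ny * nx);
    auto at = [&](std::ptrdiff_t y, std::ptrdiff_t x) -> long long {
        if (y < 0 || x < 0) return 0;
        return q[static_cast<std::size_t>(y) * nx + static_cast<std::size_t>(x)];
    };
    std::vector<long long> r(q.size());
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            const std::ptrdiff_t sy = static_cast<std::ptrdiff_t>(y);
            const std::ptrdiff_t sx = static_cast<std::ptrdiff_t>(x);
            // Prediction: left + upper - upper-left.
            const long long pred = at(sy, sx - 1) + at(sy - 1, sx) - at(sy - 1, sx - 1);
            r[y * nx + x] = q[y * nx + x] - pred;
        }
    }
    return r;
}
```

On data that varies linearly across the grid, the interior residuals are zero, which is why smooth fields yield small, highly compressible codes.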
Outliers (errors that fall outside [-quant_radius, quant_radius)) are scattered to separate outlier_errors and outlier_indices buffers.
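The predict, quantize, and outlier-scatter steps can be sketched on the CPU for the 1-D case. This is one common formulation (pre-quantize to integer multiples of 2*eb, then difference against the left neighbor), written under stated assumptions rather than taken from the fused kernel. The names LorenzoQuantOutput, lorenzoQuant1D, and lorenzoDequant1D are hypothetical. The sketch stores the out-of-range prediction error in outlier_errors and writes the "zero error" code at outlier positions, so the decoder can scatter outliers back before the prefix sum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LorenzoQuantOutput {
    std::vector<std::uint16_t> codes;            // one code per element
    std::vector<float>         outlier_errors;   // out-of-range prediction errors
    std::vector<std::uint32_t> outlier_indices;  // linear index of each outlier
};

// 1-D sketch: pre-quantize, predict from the left neighbour, encode the error.
LorenzoQuantOutput lorenzoQuant1D(const std::vector<float>& data, double eb, int radius) {
    assert(eb > 0.0 && radius > 0 && 2 * radius - 1 <= 65535);
    LorenzoQuantOutput out;
    out.codes.resize(data.size());
    long long prev = 0;  // assumption: the neighbour of element 0 is 0
    for (std::size_t i = 0; i < data.size(); ++i) {
        const long long q = std::llround(data[i] / (2.0 * eb));
        const long long err = q - prev;
        prev = q;
        if (err >= -radius && err < radius) {
            out.codes[i] = static_cast<std::uint16_t>(err + radius);
        } else {
            out.codes[i] = static_cast<std::uint16_t>(radius);  // placeholder: error 0
            out.outlier_errors.push_back(static_cast<float>(err));
            out.outlier_indices.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return out;
}

// Inverse: rebuild the errors, scatter the outliers back in, then prefix-sum.
std::vector<float> lorenzoDequant1D(const LorenzoQuantOutput& in, double eb, int radius) {
    std::vector<long long> err(in.codes.size());
    for (std::size_t i = 0; i < err.size(); ++i)
        err[i] = static_cast<long long>(in.codes[i]) - radius;
    for (std::size_t k = 0; k < in.outlier_indices.size(); ++k)
        err[in.outlier_indices[k]] = std::llround(in.outlier_errors[k]);
    std::vector<float> out(err.size());
    long long q = 0;
    for (std::size_t i = 0; i < err.size(); ++i) {
        q += err[i];
        out[i] = static_cast<float>(q * 2.0 * eb);
    }
    return out;
}
```

Because the difference is taken on pre-quantized integers, the reconstruction error stays within eb regardless of how many elements precede a value. Storing errors as float is exact only while their magnitude stays below 2^24.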
| Parameter | Constraint |
|---|---|
| TInput | float or double |
| TCode | Unsigned integer (see available instantiations below) |
Only these combinations are compiled and linked:
- LorenzoQuantStage<float, uint8_t>
- LorenzoQuantStage<float, uint16_t>
- LorenzoQuantStage<double, uint16_t>
- LorenzoQuantStage<double, uint32_t>

Using any other combination results in a linker error. The most common choice is LorenzoQuantStage<float, uint16_t> (cuSZ-style pipelines).
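Since an unsupported combination only fails at link time, a project may prefer to catch it earlier. The following is a sketch of a user-side compile-time guard. The trait LorenzoQuantComboAvailable is hypothetical and not part of FZGPUModules; it simply mirrors the four combinations listed above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical user-side trait: true only for the (TInput, TCode) pairs
// that the documentation lists as compiled and linked.
template <typename TInput, typename TCode>
struct LorenzoQuantComboAvailable : std::false_type {};

template <> struct LorenzoQuantComboAvailable<float,  std::uint8_t>  : std::true_type {};
template <> struct LorenzoQuantComboAvailable<float,  std::uint16_t> : std::true_type {};
template <> struct LorenzoQuantComboAvailable<double, std::uint16_t> : std::true_type {};
template <> struct LorenzoQuantComboAvailable<double, std::uint32_t> : std::true_type {};

// Example guard: fails at compile time instead of at link time.
static_assert(LorenzoQuantComboAvailable<float, std::uint16_t>::value,
              "LorenzoQuantStage is not instantiated for this type pair");
```

A static_assert of this kind can sit next to the code that names the stage type, so the error message points at the offending line rather than at an unresolved symbol.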
| Setting | Purpose | Notes |
|---|---|---|
| setErrorBound(eb) | User error bound | Interpreted according to setErrorBoundMode() |
| setErrorBoundMode(mode) | ABS / NOA / REL | REL is a global approximation (see below) |
| setQuantRadius(r) | Quantization radius | Must fit in TCode range |
| setOutlierCapacity(f) | Outlier reserve fraction | 0.0 to 1.0, as a fraction of the element count |
| setZigzagCodes(enable) | Zigzag-encode codes | Can improve compressibility |
| setValueBase(v) | Precomputed scale | NOA: (max - min), REL: max(abs(data)); optional |
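For readers unfamiliar with zigzag encoding, the standard mapping interleaves negative and positive values so that small magnitudes receive small unsigned codes (0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4). The sketch below shows that mapping only. How the stage combines it with the radius offset is not specified here, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Standard zigzag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
inline std::uint32_t zigzagEncode(std::int32_t v) {
    const std::uint32_t sign_mask = (v < 0) ? 0xFFFFFFFFu : 0u;
    return (static_cast<std::uint32_t>(v) << 1) ^ sign_mask;
}

// Inverse mapping: the low bit carries the sign, the rest the magnitude.
inline std::int32_t zigzagDecode(std::uint32_t z) {
    const std::uint32_t sign_mask = 0u - (z & 1u);  // all ones if the value was negative
    return static_cast<std::int32_t>((z >> 1) ^ sign_mask);
}
```

Small residuals of either sign then share the low end of the code range, which tends to help downstream entropy or bit-packing stages.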
| Index | Name | Type | Description |
|---|---|---|---|
| 0 | "codes" | TCode[n] | Quantized prediction errors |
| 1 | "outlier_errors" | TInput[k] | Original values at outlier positions |
| 2 | "outlier_indices" | uint32_t[k] | Linear indices of outlier positions |
| 3 | "outlier_count" | uint32_t | Number of outliers (scalar) |
Connect downstream stages to the "codes" port.
| Mode | Interpretation | Note |
|---|---|---|
| ABS | abs(error) <= eb | Default |
| NOA | abs_eb = eb × (max - min) | Uses value range; can be precomputed via setValueBase() |
| REL | abs_eb = eb × max(abs(data)) | Global approximation (not exact per-element) |
REL is supported, but because it uses a single global scale (max(abs(x))), small values can exceed the per-element relative bound. For exact pointwise REL bounds, use QuantizerStage with ErrorBoundMode::REL.
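The relationship between the three modes, and the REL caveat, can be illustrated with a few lines of arithmetic. This is a sketch under assumptions: BoundMode is a local stand-in for the library's mode enum, and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Local stand-in for the library's error-bound mode enum (hypothetical).
enum class BoundMode { ABS, NOA, REL };

// Absolute bound derived from the user bound eb.
// valueBase: (max - min) for NOA, max(abs(data)) for REL, ignored for ABS.
inline double effectiveAbsBound(BoundMode mode, double eb, double valueBase) {
    return mode == BoundMode::ABS ? eb : eb * valueBase;
}

// True when a global absolute bound is looser than the pointwise
// relative bound eb * |x| for a particular element x.
inline bool exceedsPointwiseRel(double x, double eb, double absEb) {
    return absEb > eb * std::fabs(x);
}
```

For example, with eb = 0.01 and max(abs(data)) = 100, the global REL bound is about 1.0. An element of value 0.5 would need an error below 0.005 to satisfy a pointwise relative bound, so the global bound does not guarantee it.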
addStage() pushes the pipeline's current dims into the stage immediately. finalize() pushes them again as a safety net. If dims are set after addStage(), call stage->setDims() directly.
NOA and REL modes need a data-dependent scale:
- NOA: value_base = max - min
- REL: value_base = max(|x|)

If setValueBase() is not called, the stage scans the data to compute the value base internally. For CUDA Graph capture, you must provide the value base up front to avoid a device sync and D2H read.
ABS mode needs no setValueBase() call.
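When the input is available on the host before capture, the value base can be computed once and then passed to setValueBase(). The following is a minimal host-side sketch. The helper names valueBaseNOA and valueBaseREL are hypothetical, and data that already lives on the device would more likely use a device-side reduction performed before graph capture begins.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// NOA: value range of the data (max - min). Returns 0 for empty input.
double valueBaseNOA(const std::vector<float>& data) {
    if (data.empty()) return 0.0;
    auto mm = std::minmax_element(data.begin(), data.end());
    return static_cast<double>(*mm.second) - static_cast<double>(*mm.first);
}

// REL: largest magnitude in the data, max(|x|). Returns 0 for empty input.
double valueBaseREL(const std::vector<float>& data) {
    double m = 0.0;
    for (float x : data) m = std::max(m, std::fabs(static_cast<double>(x)));
    return m;
}
```

Note that max(|x|) is taken over magnitudes, so a dataset such as {-5, 1, 3} has a REL value base of 5, not 3.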