#include <cuda_runtime.h>
Bit-packing stage: packs N-bit integers into a dense byte stream.
GPU bit-matrix transpose stage (W × N bit shuffle over fixed-size chunks).
Pipeline builder and execution API.
Compression DAG wiring, execution, and memory strategy types.
First-order difference coding stage with optional negabinary fusion.
Logging infrastructure and macros.
Fused Lorenzo predictor and quantizer stage.
Plain integer Lorenzo predictor (delta coding / prefix sum). Lossless.
Negabinary (base -2) integer encoding helpers.
Element-wise negabinary encode/decode stage (TIn[] ↔ TOut[]).
Direct-value quantizer stage with error-bounded coding and lossless outlier fallback.
Run-Length Encoding stage (lossless, stream-ordered).
Recursive Zero-byte Elimination stage — lossless byte-stream compressor.
Base class interface for all compression stages.
Reconstruction quality metrics (MSE, PSNR, max error, NRMSE).
Element-wise zigzag encode/decode stage (TIn[] ↔ TOut[]).
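The bit-packing stage stores N-bit integers back to back in a byte stream. The host-side sketch below illustrates the idea. The names `bitpack` and `bitunpack` and the least-significant-bit-first order are assumptions made for illustration. They are not the stage's actual API or byte layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not the library's API). Packs the low `bits` bits of
// each value into a dense byte stream, least-significant bit first.
// Assumes 1 <= bits <= 32; any higher bits of a value are masked off.
std::vector<uint8_t> bitpack(const std::vector<uint32_t>& vals, unsigned bits) {
    std::vector<uint8_t> out((vals.size() * bits + 7) / 8, 0);
    const uint64_t mask = (1ull << bits) - 1;
    uint64_t bitpos = 0;  // global bit offset into `out`
    for (uint32_t v : vals) {
        const uint64_t w = v & mask;
        for (unsigned b = 0; b < bits; ++b, ++bitpos)
            if ((w >> b) & 1u)
                out[bitpos / 8] |= static_cast<uint8_t>(1u << (bitpos % 8));
    }
    return out;
}

// Inverse: reads `count` values of `bits` bits each from the packed stream.
std::vector<uint32_t> bitunpack(const std::vector<uint8_t>& in, size_t count, unsigned bits) {
    std::vector<uint32_t> vals(count, 0);
    uint64_t bitpos = 0;
    for (size_t i = 0; i < count; ++i)
        for (unsigned b = 0; b < bits; ++b, ++bitpos)
            if ((in[bitpos / 8] >> (bitpos % 8)) & 1u)
                vals[i] |= (1u << b);
    return vals;
}
```

For example, packing `{5, 3, 7, 0}` at 3 bits per value takes 12 bits, so the output is 2 bytes instead of 16.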
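The bit-matrix transpose stage can be read as follows. Treat a chunk of N words, each W bits wide, as a bit matrix and transpose it, so that each output word holds one bit plane of the chunk. The sketch fixes W = N = 32. That shape and the name `bit_transpose32` are assumptions. The stage's actual W, N, and output layout may differ.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not the library's API). Transposes a 32 x 32 bit
// matrix: bit j of in[i] becomes bit i of out[j]. Afterwards out[j] is bit
// plane j of the 32 input words, so planes that are zero in every word
// (e.g. the high bits of small values) become whole zero words.
std::array<uint32_t, 32> bit_transpose32(const std::array<uint32_t, 32>& in) {
    std::array<uint32_t, 32> out{};
    for (int i = 0; i < 32; ++i)
        for (int j = 0; j < 32; ++j)
            out[j] |= ((in[i] >> j) & 1u) << i;
    return out;
}
```

On a GPU, a single warp can build the same planes with the warp-vote intrinsic `__ballot_sync(0xFFFFFFFFu, (v >> j) & 1u)`. Here lane i holds word i, and the result has bit i set when lane i's predicate is true. Whether the stage takes this approach is not stated here.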
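In one dimension, the first-order difference stage and the plain integer Lorenzo predictor both reduce to the same pair of operations: delta coding on encode and a prefix sum on decode. The sketch makes three assumptions. The input is 1-D `int32_t`. Differences are taken modulo 2^32, so the round trip stays lossless even when a true difference overflows. Optional negabinary fusion is controlled by a flag. The names `delta_encode`, `delta_decode`, and `fuse_negabinary` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Alternating mask used for base -2 conversion (see the negabinary helpers).
constexpr uint32_t kNegabinaryMask = 0xAAAAAAAAu;

// Hypothetical helper. d[i] = x[i] - x[i-1] (mod 2^32), with x[-1] taken as 0.
// If fuse_negabinary is set, each difference is mapped to base -2 in the same pass.
std::vector<uint32_t> delta_encode(const std::vector<int32_t>& x, bool fuse_negabinary) {
    std::vector<uint32_t> out(x.size());
    uint32_t prev = 0;
    for (size_t i = 0; i < x.size(); ++i) {
        const uint32_t cur = static_cast<uint32_t>(x[i]);
        const uint32_t d = cur - prev;  // unsigned wraparound is well defined
        out[i] = fuse_negabinary ? ((d + kNegabinaryMask) ^ kNegabinaryMask) : d;
        prev = cur;
    }
    return out;
}

// Inverse: undo the optional negabinary mapping, then take an inclusive prefix sum.
std::vector<int32_t> delta_decode(const std::vector<uint32_t>& d, bool fused_negabinary) {
    std::vector<int32_t> x(d.size());
    uint32_t acc = 0;
    for (size_t i = 0; i < d.size(); ++i) {
        const uint32_t di =
            fused_negabinary ? ((d[i] ^ kNegabinaryMask) - kNegabinaryMask) : d[i];
        acc += di;
        x[i] = static_cast<int32_t>(acc);
    }
    return x;
}
```

The decode loop is an inclusive scan. Addition modulo 2^32 is associative, so on the GPU the loop could be replaced by a parallel primitive such as CUB's `cub::DeviceScan::InclusiveSum` without changing the result.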
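A fused Lorenzo predictor and quantizer works in two steps. First, each value is quantized onto a grid of spacing 2*eb: q = round(x / (2*eb)). Second, each integer is predicted from its left neighbour. Rounding moves x / (2*eb) by at most 0.5, so the reconstruction 2*eb*q satisfies |x - 2*eb*q| <= eb, apart from floating-point rounding. Because prediction runs on integers that are already quantized, the decoder needs only a prefix sum. This is an illustrative 1-D scheme with hypothetical names. It has no outlier handling and no cap on the code range, and it is not necessarily how the stage is implemented.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper. Quantize and predict in one pass (1-D, previous-value
// predictor). Returns integer prediction residuals.
std::vector<int64_t> lorenzo_quantize(const std::vector<double>& x, double eb) {
    std::vector<int64_t> codes(x.size());
    int64_t prev = 0;
    for (size_t i = 0; i < x.size(); ++i) {
        const int64_t q = std::llround(x[i] / (2.0 * eb));  // quantize
        codes[i] = q - prev;                                 // predict on the integer grid
        prev = q;
    }
    return codes;
}

// Hypothetical helper. A prefix sum recovers q, then q * 2*eb reconstructs
// each value to within eb.
std::vector<double> lorenzo_dequantize(const std::vector<int64_t>& codes, double eb) {
    std::vector<double> y(codes.size());
    int64_t q = 0;
    for (size_t i = 0; i < codes.size(); ++i) {
        q += codes[i];
        y[i] = static_cast<double>(q) * 2.0 * eb;
    }
    return y;
}
```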
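The negabinary helpers can be written with a well-known bit trick. Add the alternating mask 0xAAAAAAAA and then XOR with it, and a 32-bit two's-complement integer becomes its base -2 digit pattern. For example, 2 maps to 0b110 (4 - 2 + 0) and -1 maps to 0b11 (-2 + 1). Small values of either sign therefore get codes whose high bits are zero, which suits later bit-plane or zero-byte stages. The function names are hypothetical, and the library's helpers may use a different formulation.

```cpp
#include <cstdint>

// Alternating mask 0b...1010: marks the odd digit positions, whose weights
// are negative in base -2 (-2, -8, -32, ...).
constexpr uint32_t kNegabinaryMask = 0xAAAAAAAAu;

// Hypothetical helper. Two's-complement int32 -> base -2 digit pattern.
// Arithmetic is done on uint32_t, so wraparound is well defined and the map
// is a bijection mod 2^32 (the round trip is exact for every int32).
inline uint32_t negabinary_encode(int32_t x) {
    return (static_cast<uint32_t>(x) + kNegabinaryMask) ^ kNegabinaryMask;
}

// Hypothetical helper. Inverse mapping: base -2 digit pattern -> int32.
inline int32_t negabinary_decode(uint32_t n) {
    return static_cast<int32_t>((n ^ kNegabinaryMask) - kNegabinaryMask);
}
```

The element-wise stage would apply the same per-value mapping from each TIn[] element to the corresponding TOut[] element.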
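The direct-value quantizer codes each value on its own, with no predictor. The sketch below assumes the following parameters:

- Each value maps to q = round(x / (2*eb)), which keeps |x - 2*eb*q| <= eb.
- A value whose |x / (2*eb)| is not below a hypothetical `radius` is stored verbatim as an outlier, so it decodes exactly.
- A non-finite value is also stored verbatim as an outlier.

The struct, field, and function names are hypothetical. They do not describe the stage's actual outlier format.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not the library's format).
struct QuantResult {
    std::vector<int32_t> codes;         // quantized value, or 0 at outlier positions
    std::vector<uint32_t> outlier_idx;  // positions stored losslessly
    std::vector<float> outlier_val;     // original values at those positions
};

// Hypothetical helper. Direct quantization with lossless outlier fallback.
QuantResult quantize_direct(const std::vector<float>& x, double eb, int32_t radius) {
    QuantResult r;
    r.codes.assign(x.size(), 0);
    for (size_t i = 0; i < x.size(); ++i) {
        const double scaled = static_cast<double>(x[i]) / (2.0 * eb);
        if (std::isfinite(scaled) && std::fabs(scaled) < radius) {
            r.codes[i] = static_cast<int32_t>(std::lround(scaled));
        } else {
            r.outlier_idx.push_back(static_cast<uint32_t>(i));
            r.outlier_val.push_back(x[i]);
        }
    }
    return r;
}

// Hypothetical helper. Dequantize, then restore outliers exactly. The final
// cast to float can add up to half an ulp beyond eb.
std::vector<float> dequantize_direct(const QuantResult& r, double eb) {
    std::vector<float> y(r.codes.size());
    for (size_t i = 0; i < y.size(); ++i)
        y[i] = static_cast<float>(r.codes[i] * 2.0 * eb);
    for (size_t k = 0; k < r.outlier_idx.size(); ++k)
        y[r.outlier_idx[k]] = r.outlier_val[k];
    return y;
}
```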
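Run-length encoding replaces each run of equal values with a (value, count) pair and is exactly invertible. The sketch is a serial host version, and `rle_encode` and `rle_decode` are hypothetical names. On the GPU, the same output could come from a primitive such as CUB's `cub::DeviceRunLengthEncode::Encode`. The stage's actual implementation is not described here.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper. Collapses runs into (value, run length) pairs.
// A run is split if its length would exceed UINT32_MAX.
template <typename T>
std::vector<std::pair<T, uint32_t>> rle_encode(const std::vector<T>& in) {
    std::vector<std::pair<T, uint32_t>> runs;
    for (const T& v : in) {
        if (!runs.empty() && runs.back().first == v && runs.back().second < UINT32_MAX)
            ++runs.back().second;
        else
            runs.emplace_back(v, 1u);
    }
    return runs;
}

// Hypothetical helper. Expands each pair back into `count` copies of `value`.
template <typename T>
std::vector<T> rle_decode(const std::vector<std::pair<T, uint32_t>>& runs) {
    std::vector<T> out;
    for (const auto& r : runs) out.insert(out.end(), r.second, r.first);
    return out;
}
```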
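Recursive zero-byte elimination can be sketched in three steps. First, record a bitmap with one bit per input byte, set where the byte is nonzero. Second, keep only the nonzero bytes. Third, apply the same step to the bitmap itself for a fixed number of levels. Sparse data shrinks quickly because each level's bitmap is one eighth the size of its input. The struct layout, the least-significant-bit-first order, and the fixed `levels` parameter are assumptions made for illustration. They are not the stage's actual format.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical container (not the library's format).
struct RzeEncoded {
    std::vector<std::vector<uint8_t>> nonzero;  // nonzero[k]: kept bytes at level k
    std::vector<size_t> sizes;                  // sizes[k]: input length at level k
    std::vector<uint8_t> last_bitmap;           // deepest bitmap, stored raw
};

// Hypothetical helper. Each level splits its input into (bitmap, nonzero bytes)
// and recurses on the bitmap.
RzeEncoded rze_encode(std::vector<uint8_t> data, int levels) {
    RzeEncoded enc;
    for (int k = 0; k < levels; ++k) {
        std::vector<uint8_t> bitmap((data.size() + 7) / 8, 0);
        std::vector<uint8_t> nz;
        for (size_t i = 0; i < data.size(); ++i)
            if (data[i] != 0) {
                bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
                nz.push_back(data[i]);
            }
        enc.sizes.push_back(data.size());
        enc.nonzero.push_back(std::move(nz));
        data = std::move(bitmap);
    }
    enc.last_bitmap = std::move(data);
    return enc;
}

// Hypothetical helper. Rebuilds levels from deepest to shallowest; each
// rebuilt level is the bitmap for the level above it.
std::vector<uint8_t> rze_decode(const RzeEncoded& enc) {
    std::vector<uint8_t> bitmap = enc.last_bitmap;
    for (size_t k = enc.sizes.size(); k-- > 0;) {
        std::vector<uint8_t> out(enc.sizes[k], 0);
        size_t j = 0;
        for (size_t i = 0; i < out.size(); ++i)
            if ((bitmap[i / 8] >> (i % 8)) & 1u) out[i] = enc.nonzero[k][j++];
        bitmap = std::move(out);
    }
    return bitmap;
}
```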
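The reconstruction quality metrics can be computed in one pass over the original x and the reconstruction y. The formulas are:

- MSE = (1/n) * sum (x_i - y_i)^2
- max error = max |x_i - y_i|
- PSNR = 20 log10(range) - 10 log10(MSE)
- NRMSE = sqrt(MSE) / range

The sketch assumes range = max(x) - min(x), a common convention in error-bounded lossy compression; some tools use max|x| instead. It also assumes that x is non-empty, that y has the same size, and that the names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result struct (not the library's API).
struct QualityMetrics {
    double mse, psnr_db, max_abs_err, nrmse;
};

// Hypothetical helper. Uses the value range of x as PSNR peak and NRMSE normalizer.
QualityMetrics compute_metrics(const std::vector<double>& x, const std::vector<double>& y) {
    const auto mm = std::minmax_element(x.begin(), x.end());
    const double range = *mm.second - *mm.first;
    double sq = 0.0, max_err = 0.0;
    for (size_t i = 0; i < x.size(); ++i) {
        const double e = x[i] - y[i];
        sq += e * e;
        max_err = std::max(max_err, std::fabs(e));
    }
    QualityMetrics m;
    m.mse = sq / static_cast<double>(x.size());
    m.max_abs_err = max_err;
    m.nrmse = std::sqrt(m.mse) / range;
    // Perfect reconstruction (mse == 0) yields +inf under IEEE arithmetic.
    m.psnr_db = 20.0 * std::log10(range) - 10.0 * std::log10(m.mse);
    return m;
}
```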
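Zigzag coding interleaves signed values so that 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, .... This is the same mapping Protocol Buffers uses for signed varints. The sketch shows the per-element mapping, under hypothetical names, that an element-wise stage would apply to each TIn[] element to produce TOut[].

```cpp
#include <cstdint>

// Hypothetical helper. Maps small magnitudes of either sign to small unsigned codes.
// x >> 31 is all ones for negative x (arithmetic shift: guaranteed since
// C++20 and the behaviour of mainstream compilers before that).
inline uint32_t zigzag_encode(int32_t x) {
    return (static_cast<uint32_t>(x) << 1) ^ static_cast<uint32_t>(x >> 31);
}

// Hypothetical helper. Inverse: the low bit selects the sign, and the remaining
// bits carry the magnitude.
inline int32_t zigzag_decode(uint32_t z) {
    return static_cast<int32_t>((z >> 1) ^ (0u - (z & 1u)));
}
```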