FZGPUModules 2.0
GPU-accelerated modular compression pipelines
adaptive_bitpack_kernels.h File Reference

Per-block adaptive fixed-rate bit-plane packing kernels (cuSZp-style plain mode), decoupled from the offset scan.

#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

Go to the source code of this file.

Classes

struct  fz::adaptive_bitpack::Config
 

Namespaces

namespace  fz
 

Functions

size_t fz::adaptive_bitpack::maxArchiveBytes (const Config &c, unsigned bits_per_elem)
 

Detailed Description

Per-block adaptive fixed-rate bit-plane packing kernels (cuSZp-style plain mode), decoupled from the offset scan.

Layout (one logical "data block" of block_size signed elements):

  • rate byte r = number of bit-planes needed for max |value| in the block (0 if the whole block is zero).
  • if r > 0: a sign region of word_bytes = ceil(block_size/8) bytes (bit j of byte k = sign of element 8k+j), followed by r bit-planes of word_bytes bytes each (bit j of byte k of plane p = bit p of |element 8k+j|).
  • block payload cost = r > 0 ? word_bytes * (r + 1) : 0.
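The layout rules above can be checked against a small host-side reference. This is only a sketch under stated assumptions: elements are taken to be int32_t (the page says only "signed elements"), and the names blockRate, wordBytes, payloadBytes and packBlock are hypothetical helpers, not part of the FZGPUModules API or its CUDA kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for one data block. Assumes int32_t elements.

// |v| computed in unsigned arithmetic so INT32_MIN does not overflow.
inline uint32_t absU32(int32_t v) {
    return v < 0 ? 0u - static_cast<uint32_t>(v) : static_cast<uint32_t>(v);
}

// Rate r: number of bit-planes needed for max |value| in the block (0 if all zero).
inline unsigned blockRate(const std::vector<int32_t>& block) {
    uint32_t max_abs = 0;
    for (int32_t v : block) {
        const uint32_t a = absU32(v);
        if (a > max_abs) max_abs = a;
    }
    unsigned r = 0;
    while (max_abs != 0) { ++r; max_abs >>= 1; }
    return r;
}

// word_bytes = ceil(block_size / 8).
inline size_t wordBytes(size_t block_size) { return (block_size + 7) / 8; }

// Payload cost: sign region plus r bit-planes, or nothing for an all-zero block.
inline size_t payloadBytes(size_t block_size, unsigned r) {
    return r > 0 ? wordBytes(block_size) * (r + 1) : 0;
}

// Pack one block: sign region first, then planes 0..r-1.
inline std::vector<uint8_t> packBlock(const std::vector<int32_t>& block) {
    assert(!block.empty());
    const unsigned r = blockRate(block);
    const size_t wb = wordBytes(block.size());
    std::vector<uint8_t> out(payloadBytes(block.size(), r), 0);
    if (r == 0) return out;  // zero block: rate byte only, empty payload
    for (size_t i = 0; i < block.size(); ++i) {
        const size_t k = i / 8;                          // byte index
        const uint8_t bit = static_cast<uint8_t>(1u << (i % 8));  // bit j within byte k
        if (block[i] < 0) out[k] |= bit;                 // sign region
        const uint32_t a = absU32(block[i]);
        for (unsigned p = 0; p < r; ++p) {
            if ((a >> p) & 1u) out[wb * (1 + p) + k] |= bit;  // plane p
        }
    }
    return out;
}
```

For the 8-element block {0, -3, 5, 1, 0, 0, 0, 0}, the largest magnitude is 5, so r = 3 and the payload is 4 bytes: one sign byte followed by three one-byte planes.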

Archive = [num_blocks rate bytes] [payloads packed by exclusive-scan offsets]. The stage carries block_size and num_elements in the FZM header, so the archive needs no internal header of its own.
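As an illustration of the archive layout, the sketch below derives per-block payload offsets from the rate bytes with a sequential exclusive scan. It makes two assumptions the page does not confirm: offsets are measured from the start of the payload region (immediately after the rate bytes), and the scan runs on the host. ArchiveLayout and layoutArchive are hypothetical names, not library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of where each block's payload lands in the archive.
struct ArchiveLayout {
    std::vector<size_t> payload_offsets;  // exclusive scan of payload costs
    size_t payload_bytes;                 // total size of the payload region
    size_t archive_bytes;                 // rate bytes + payload region
};

// Sequential exclusive scan over per-block payload costs.
// Assumption: offsets are relative to the start of the payload region.
inline ArchiveLayout layoutArchive(const std::vector<uint8_t>& rates,
                                   size_t block_size) {
    assert(block_size > 0);
    const size_t word_bytes = (block_size + 7) / 8;  // ceil(block_size / 8)
    ArchiveLayout out;
    out.payload_offsets.resize(rates.size());
    size_t running = 0;
    for (size_t b = 0; b < rates.size(); ++b) {
        out.payload_offsets[b] = running;            // offset before adding own cost
        const size_t r = rates[b];
        running += r > 0 ? word_bytes * (r + 1) : 0; // block payload cost
    }
    out.payload_bytes = running;
    out.archive_bytes = rates.size() + running;      // one rate byte per block
    return out;
}
```

With rates {3, 0, 2} and block_size = 8, the payload costs are 4, 0 and 3 bytes, giving offsets 0, 4 and 4 and a 10-byte archive. A zero-rate block occupies no payload space, so it shares its offset with the next block.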

Function Documentation

◆ maxArchiveBytes()

size_t fz::adaptive_bitpack::maxArchiveBytes ( const Config & c,
unsigned  bits_per_elem 
)
inline

Returns the worst-case archive size: every block stores its metadata plus a full-width sign region and bits_per_elem bit-planes. The cheaper of the plain/outlier candidates is always <= the plain candidate, so this bounds both modes.
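A rough sketch of how such a bound could be computed from the plain-mode layout follows. It is not the actual body of maxArchiveBytes: the fields of Config are not shown on this page, so the hypothetical function below takes num_elements and block_size directly, assumes per-block metadata is the single rate byte, and assumes a trailing partial block is costed as a full block.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical worst-case bound for the plain layout; not the library function.
// Assumptions: one rate byte of metadata per block, and a trailing partial
// block is costed as a full block.
inline size_t worstCasePlainArchiveBytes(size_t num_elements,
                                         size_t block_size,
                                         unsigned bits_per_elem) {
    assert(block_size > 0);
    const size_t num_blocks = (num_elements + block_size - 1) / block_size;
    const size_t word_bytes = (block_size + 7) / 8;  // ceil(block_size / 8)
    // Rate byte + sign region + bits_per_elem bit-planes.
    const size_t per_block =
        1 + word_bytes * (static_cast<size_t>(bits_per_elem) + 1);
    return num_blocks * per_block;
}
```

For 100 elements in blocks of 32 with bits_per_elem = 32, this gives 4 blocks of 1 + 4 * 33 = 133 bytes each, or 532 bytes in total.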