FZGPUModules 2.0
GPU-accelerated modular compression pipelines
bitplane_rze_kernels.h File Reference

Host-callable launchers + config helper for the fused bitplane-RZE (FZ-GPU lossless) encode/decode kernels.

#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

Go to the source code of this file.

Classes

struct  fz::bitplane_rze::Config
 Sizes derived from the (unpadded) uint16 symbol count.
 

Namespaces

namespace  fz
 

Enumerations

enum  : int
 

Functions

Config fz::bitplane_rze::configure (size_t data_len)
 Compute the padded layout for data_len uint16 symbols.
 
size_t fz::bitplane_rze::maxArchiveBytes (size_t data_len)
 
void fz::bitplane_rze::launchEncode (const uint16_t *d_in, const Config &cfg, uint32_t *d_offset_counter, uint32_t *d_bitflag, uint32_t *d_start_pos, uint8_t *d_comp_out, uint32_t *d_comp_len, cudaStream_t stream)
 
void fz::bitplane_rze::launchDecode (const uint8_t *d_bitstream, const uint32_t *d_bitflag, const uint32_t *d_start_pos, uint16_t *d_out, const Config &cfg, cudaStream_t stream)
 

Detailed Description

Host-callable launchers + config helper for the fused bitplane-RZE (FZ-GPU lossless) encode/decode kernels.

Internal interface — only bitplane_rze_kernels.cu and bitplane_rze_stage.cu include it. The two device kernels live in bitplane_rze_encode.inl / bitplane_rze_decode.inl, included privately by bitplane_rze_kernels.cu.

The codec is hard-wired to a 4096-byte data chunk (a 32×32 grid of uint32_t words). Each CUDA block processes one chunk: it bit-transposes the 32×32 bit matrix, then eliminates all-zero 4-byte groups, reserving output space with a per-block atomicAdd. Input is interpreted as uint16_t symbols packed two-per-word, matching FZ-GPU's quantizer codes.
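The per-chunk transform described above can be modeled on the host. The following is a minimal sketch, not the device kernel: `transpose32`, `encodeChunk`, and `ChunkResult` are hypothetical names, and it assumes that each row of 32 words is transposed as one 32×32 bit matrix and that one flag bit is kept per 4-byte group.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of one 4096-byte chunk (1024 uint32_t words).
constexpr std::size_t kWordsPerChunk = 1024;

// Transpose a 32x32 bit matrix: bit c of out[r] equals bit r of in[c].
inline void transpose32(const std::uint32_t in[32], std::uint32_t out[32]) {
    for (int r = 0; r < 32; ++r) {
        std::uint32_t w = 0;
        for (int c = 0; c < 32; ++c) {
            w |= ((in[c] >> r) & 1u) << c;
        }
        out[r] = w;
    }
}

struct ChunkResult {
    std::vector<std::uint32_t> bitflag;  // ASSUMPTION: 1 bit per word, 32 flag words
    std::vector<std::uint32_t> kept;     // nonzero words that survive elimination
};

// Transpose each row of 32 words, then drop every all-zero 4-byte group.
inline ChunkResult encodeChunk(const std::uint32_t* chunk) {
    ChunkResult r;
    r.bitflag.assign(kWordsPerChunk / 32, 0u);
    std::uint32_t t[32];
    for (std::size_t row = 0; row < 32; ++row) {
        transpose32(chunk + row * 32, t);
        for (std::size_t i = 0; i < 32; ++i) {
            if (t[i] != 0u) {
                r.bitflag[row] |= 1u << i;  // mark group i of this row as present
                r.kept.push_back(t[i]);
            }
        }
    }
    return r;
}
```

In this model, a chunk whose words all equal 1 keeps only 32 of its 1024 words: after the transpose, each row has a single nonzero bit plane. This is why small quantizer codes tend to compress well under this scheme.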

Enumeration Type Documentation

◆ anonymous enum

anonymous enum : int

Self-describing 128-byte archive header, embedded at byte 0 of the output. Mirrors FZ-GPU's fzg_header. entry[] holds cumulative byte offsets:

    entry[0] = 0 (header start)
    entry[1] = 128 (bitflag array start)
    entry[2] = entry[1] + bitflag bytes (start-position array start)
    entry[3] = entry[2] + startpos bytes (compacted bitstream start)
    entry[4] = entry[3] + bitstream bytes (= total archive size)
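The cumulative offsets listed above can be computed as a running sum. This is an illustrative sketch only: `headerEntries` is a hypothetical helper, and the 32-bit entry type is an assumption, since the header's field widths are not stated here.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t kHeaderBytes = 128;  // header size stated in the description

// Hypothetical helper: build entry[0..4] from the three sub-array sizes.
// ASSUMPTION: entries are 32-bit; the real header may use another width.
inline std::array<std::uint32_t, 5> headerEntries(std::uint32_t bitflag_bytes,
                                                  std::uint32_t startpos_bytes,
                                                  std::uint32_t bitstream_bytes) {
    std::array<std::uint32_t, 5> e{};
    e[0] = 0;                        // header start
    e[1] = kHeaderBytes;             // bitflag array start
    e[2] = e[1] + bitflag_bytes;     // start-position array start
    e[3] = e[2] + startpos_bytes;    // compacted bitstream start
    e[4] = e[3] + bitstream_bytes;   // total archive size
    return e;
}
```

For example, sizes of 128, 4, and 100 bytes give entries 0, 128, 256, 260, and 360.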

Function Documentation

◆ maxArchiveBytes()

size_t fz::bitplane_rze::maxArchiveBytes ( size_t  data_len)

Worst-case archive byte count for data_len symbols. Safe upper bound for output-buffer allocation (header + bitflag + start-pos + a bitstream that retains every 4-byte group).
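A rough host-side sketch of how such a bound could be derived follows. `ConfigSketch`, `configureSketch`, and `maxArchiveBytesSketch` are hypothetical stand-ins, not the real `Config` or functions. The chunk and header sizes come from this page; the bitflag and start-position sizes are assumptions (one flag bit per 4-byte group, one uint32 start position per chunk).

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kChunkBytes = 4096;                                      // from the description
constexpr std::size_t kSymbolsPerChunk = kChunkBytes / sizeof(std::uint16_t);  // 2048 symbols
constexpr std::size_t kHeaderBytes = 128;                                      // from the description

// Hypothetical stand-in for fz::bitplane_rze::Config; the real layout may differ.
struct ConfigSketch {
    std::size_t pad_len;     // symbol count rounded up to whole chunks
    std::size_t data_bytes;  // padded input size in bytes
    std::size_t grid_x;      // number of chunks (one CUDA block each)
};

inline ConfigSketch configureSketch(std::size_t data_len) {
    const std::size_t chunks = (data_len + kSymbolsPerChunk - 1) / kSymbolsPerChunk;
    return {chunks * kSymbolsPerChunk, chunks * kChunkBytes, chunks};
}

inline std::size_t maxArchiveBytesSketch(std::size_t data_len) {
    const ConfigSketch c = configureSketch(data_len);
    // ASSUMPTION: one flag bit per 4-byte group, i.e. data_bytes / 32 bytes of flags.
    const std::size_t bitflag_bytes = c.data_bytes / 32;
    // ASSUMPTION: one uint32_t start position per chunk.
    const std::size_t startpos_bytes = c.grid_x * sizeof(std::uint32_t);
    // Worst case: no 4-byte group is eliminated, so the bitstream is data_bytes long.
    return kHeaderBytes + bitflag_bytes + startpos_bytes + c.data_bytes;
}
```

Under these assumptions a single full chunk (2048 symbols) needs 128 + 128 + 4 + 4096 = 4356 bytes in the worst case.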

◆ launchEncode()

void fz::bitplane_rze::launchEncode ( const uint16_t *  d_in,
const Config &  cfg,
uint32_t *  d_offset_counter,
uint32_t *  d_bitflag,
uint32_t *  d_start_pos,
uint8_t *  d_comp_out,
uint32_t *  d_comp_len,
cudaStream_t  stream 
)

Forward (compress). Bit-transpose + zero-group elimination of pad_len/2 uint32 words read from d_in into the three archive sub-arrays. The kernel reads exactly cfg.data_bytes bytes, so d_in must point to a buffer of at least that size with any tail beyond the real input zeroed.

d_offset_counter (1 uint32) must be zeroed on stream before the call; the kernel atomically accumulates the total compacted word count into it. d_comp_len (grid_x uint32) is per-block scratch written by the kernel.

This launcher does NOT synchronize. To learn the bitstream length, the caller copies d_offset_counter device-to-host afterwards and must wait for that copy to complete on stream before reading the value.
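The role of d_offset_counter can be illustrated with a sequential host model that uses `std::atomic` in place of the device-side atomicAdd. `reserveSlices` is a hypothetical name, and it is an assumption that each block's start position is the value returned by its atomic add. On the GPU, blocks reserve space in an arbitrary order, so real start positions need not increase with block index as they do here.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host model: each "block" reserves a contiguous slice of the output by adding
// its surviving word count to a shared counter. fetch_add returns the counter
// value before the addition, which serves as that block's start position.
inline std::vector<std::uint32_t> reserveSlices(
        const std::vector<std::uint32_t>& kept_words,
        std::atomic<std::uint32_t>& offset_counter) {
    std::vector<std::uint32_t> start_pos(kept_words.size());
    for (std::size_t b = 0; b < kept_words.size(); ++b) {
        start_pos[b] = offset_counter.fetch_add(kept_words[b]);
    }
    return start_pos;
}
```

Starting from a zeroed counter, blocks that keep 3, 0, and 5 words receive start positions 0, 3, and 3, and the counter ends at 8, the total compacted word count. The counter must start at zero for the same reason in the real launcher: any stale value would shift every reservation.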

◆ launchDecode()

void fz::bitplane_rze::launchDecode ( const uint8_t *  d_bitstream,
const uint32_t *  d_bitflag,
const uint32_t *  d_start_pos,
uint16_t *  d_out,
const Config &  cfg,
cudaStream_t  stream 
)

Inverse (decompress). Reconstruct pad_len uint16 symbols into d_out from the archive sub-arrays. d_out must hold at least cfg.data_bytes bytes (the kernel writes the full padded length; trailing padding is zero).
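The inverse direction can be sketched on the host for a single chunk. This is a model under the same assumptions as the forward description (one flag bit per 4-byte group, each row of 32 words treated as a 32×32 bit matrix), not the device kernel; `transpose32` and `decodeChunk` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Transpose a 32x32 bit matrix: bit c of out[r] equals bit r of in[c].
// Applying it twice returns the original, so it also undoes the forward step.
inline void transpose32(const std::uint32_t in[32], std::uint32_t out[32]) {
    for (int r = 0; r < 32; ++r) {
        std::uint32_t w = 0;
        for (int c = 0; c < 32; ++c) {
            w |= ((in[c] >> r) & 1u) << c;
        }
        out[r] = w;
    }
}

// Rebuild one 1024-word chunk from its 32 flag words and its surviving words.
// ASSUMPTION: bit i of bitflag[row] says whether group i of that row was kept.
inline std::vector<std::uint32_t> decodeChunk(const std::uint32_t* bitflag,
                                              const std::uint32_t* kept) {
    std::vector<std::uint32_t> out(1024, 0u);
    std::size_t next = 0;  // index of the next surviving word to consume
    std::uint32_t t[32];
    for (std::size_t row = 0; row < 32; ++row) {
        for (std::size_t i = 0; i < 32; ++i) {
            const bool present = ((bitflag[row] >> i) & 1u) != 0u;
            t[i] = present ? kept[next++] : 0u;  // eliminated groups come back as zero
        }
        transpose32(t, out.data() + row * 32);
    }
    return out;
}
```

Groups whose flag bit is clear are restored as zeros, which is consistent with the trailing padding decoding to zero.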