FZGPUModules 1.0
GPU-accelerated modular compression pipeline
GPU bit-matrix transpose stage (W × N bit shuffle over fixed-size chunks).
#include "stage/stage.h"
#include "fzm_format.h"
#include <cuda_runtime.h>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>
Classes
| class | fz::BitshuffleStage |
Namespaces
| namespace | fz |
Given a chunk of N elements, each W bits wide, the forward pass produces W bit-planes, each collecting one bit position from all N elements (a W × N bit-matrix transpose). The output has the same byte size as the input.
Output layout: MSB-first, so bit-plane W-1 is stored at plane index 0 and bit-plane 0 at plane index W-1. Plane p occupies 32-bit words p*(N_chunk/32) through (p+1)*(N_chunk/32)-1, where N_chunk = block_size / element_width.
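The layout described above can be illustrated with a small CPU reference. This is only a sketch for checking the plane ordering, not the stage's GPU kernel. The function name `bitshuffle_chunk_ref` is hypothetical, and the placement of element i at bit (i % 32) of word (i / 32) within a plane is an assumption; the documentation does not state the bit order inside a word.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// CPU reference sketch of the forward bit-plane transpose for ONE chunk.
// Assumptions (not confirmed by the documentation):
//   - plane storage uses 32-bit words;
//   - element i maps to bit (i % 32) of word (i / 32) inside its plane.
// The chunk length must be a multiple of 32.
template <typename T>
std::vector<uint32_t> bitshuffle_chunk_ref(const std::vector<T>& chunk) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned element type");
    constexpr std::size_t W = sizeof(T) * 8;        // element width in bits
    const std::size_t n = chunk.size();             // N_chunk
    const std::size_t words_per_plane = n / 32;     // N_chunk / 32
    std::vector<uint32_t> out(W * words_per_plane, 0u);

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t bit = 0; bit < W; ++bit) {
            // MSB-first: bit W-1 goes to plane 0, bit 0 goes to plane W-1.
            const std::size_t plane = W - 1 - bit;
            const uint32_t b = static_cast<uint32_t>((chunk[i] >> bit) & 1u);
            out[plane * words_per_plane + i / 32] |= b << (i % 32);
        }
    }
    return out;
}
```

The output holds W * (N_chunk / 32) 32-bit words, which is N_chunk * sizeof(T) bytes, matching the statement that the output has the same byte size as the input.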
Serialized header (5 bytes): [0..3] block_size (uint32_t LE), [4] element_width (uint8_t).
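As a sketch of the 5-byte header layout, the helpers below pack and unpack the two fields byte by byte so the result is little-endian regardless of host byte order. The names `BitshuffleHeader`, `pack_header`, and `unpack_header` are hypothetical and are not part of the FZGPUModules API.

```cpp
#include <array>
#include <cstdint>

// Hypothetical in-memory view of the serialized header fields.
struct BitshuffleHeader {
    uint32_t block_size;     // bytes [0..3], little-endian
    uint8_t  element_width;  // byte  [4]
};

// Write the header as 5 bytes: block_size (uint32_t LE), then element_width.
inline std::array<uint8_t, 5> pack_header(const BitshuffleHeader& h) {
    std::array<uint8_t, 5> bytes{};
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<uint8_t>((h.block_size >> (8 * i)) & 0xFFu);
    }
    bytes[4] = h.element_width;
    return bytes;
}

// Read the 5-byte header back into its fields.
inline BitshuffleHeader unpack_header(const std::array<uint8_t, 5>& bytes) {
    BitshuffleHeader h{0u, bytes[4]};
    for (int i = 0; i < 4; ++i) {
        h.block_size |= static_cast<uint32_t>(bytes[i]) << (8 * i);
    }
    return h;
}
```

For example, a block_size of 4096 (0x00001000) with an element_width of 2 packs to the bytes 00 10 00 00 02.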