FZGPUModules 1.0
GPU-accelerated modular compression pipeline
fz::PipelinePerfResult Struct Reference

#include <perf.h>

Public Member Functions

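float throughput_gbs () const noexcept
 DAG throughput in GB/s: uncompressed bytes / dag_elapsed_ms.
float pipeline_throughput_gbs () const noexcept
 End-to-end throughput in GB/s: uncompressed bytes / host_elapsed_ms.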
void print (std::ostream &os = std::cout) const
 Pretty-print a timing table to os (defaults to std::cout).
 

Public Attributes

bool is_compress
 true = compress pass, false = decompress pass
 
float host_elapsed_ms
 Total host-side wall time including setup (ms)
 
float dag_elapsed_ms
 GPU compute time only: time spent inside dag->execute() (ms)
 
size_t input_bytes
 Bytes fed into the pipeline.
 
size_t output_bytes
 Bytes produced by the pipeline.
 
std::vector< StageTimingResult > stages
 Per-stage results in topological order.
 
std::vector< LevelTimingResult > levels
 Per-level aggregates in level order.
 

Detailed Description

Complete performance snapshot for one compress() or decompress() call.

Obtain via:

    pipeline.enableProfiling(true);
    pipeline.compress(d_in, n, &d_out, &out_size, stream);
    auto& r = pipeline.getLastPerfResult();
    r.print();

Timing layers:
- host_elapsed_ms: wall-clock time for the full call, including host overhead such as buffer metadata collection, concatenation, and any pipeline construction (e.g. decompressFromFile setup). Useful for end-to-end benchmarking, but not for measuring GPU compute throughput.
- dag_elapsed_ms: time spent solely inside dag->execute() (ms), i.e. the actual GPU compute excluding all host setup. This is the denominator for throughput_gbs().
- stage elapsed_ms: per-stage GPU time from paired CUDA events; the most accurate measure for isolating individual kernel costs.
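The per-stage times are mainly useful for finding the dominant kernel in a pipeline. The following is a minimal sketch under stated assumptions. StageSketch is a hypothetical stand-in that holds only a stage name and its elapsed_ms, because this page does not document the actual fields of StageTimingResult. The helpers slowest_stage() and stage_share() are illustrative and are not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for StageTimingResult: only a stage name and the
// per-stage GPU time (elapsed_ms) are assumed; the real struct may differ.
struct StageSketch {
    std::string name;
    float elapsed_ms;
};

// Returns the index of the stage with the largest GPU time,
// or -1 when the list is empty.
int slowest_stage(const std::vector<StageSketch>& stages) {
    int best = -1;
    for (std::size_t i = 0; i < stages.size(); ++i) {
        if (best < 0 ||
            stages[i].elapsed_ms > stages[static_cast<std::size_t>(best)].elapsed_ms) {
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Share of dag_elapsed_ms attributed to one stage. This sketch does not
// assume that the shares of all stages add up to exactly 1.
float stage_share(const StageSketch& stage, float dag_elapsed_ms) {
    assert(dag_elapsed_ms > 0.0f && "dag_elapsed_ms must be positive");
    return stage.elapsed_ms / dag_elapsed_ms;
}
```

In practice you would fill the vector from r.stages after a profiled call. The vector would then arrive in the same topological order the struct documents.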

Throughput is always computed from the uncompressed data size:
- compress: uncompressed_bytes = input_bytes
- decompress: uncompressed_bytes = output_bytes

throughput_gbs() divides this by dag_elapsed_ms; pipeline_throughput_gbs() divides it by host_elapsed_ms.
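Here is a minimal sketch of this throughput rule. It assumes 1 GB = 10^9 bytes, since this page does not say whether GB or GiB is meant. PerfSketch, uncompressed_bytes(), to_gbs(), dag_gbs() and end_to_end_gbs() are hypothetical names. They mirror the library's members but are not them.

```cpp
#include <cassert>
#include <cstddef>

// HYPOTHETICAL mirror of the PipelinePerfResult fields used here.
struct PerfSketch {
    bool is_compress;
    float host_elapsed_ms;
    float dag_elapsed_ms;
    std::size_t input_bytes;
    std::size_t output_bytes;
};

// Uncompressed size: input of a compress pass, output of a decompress pass.
std::size_t uncompressed_bytes(const PerfSketch& r) {
    return r.is_compress ? r.input_bytes : r.output_bytes;
}

// Converts a byte count and a time in ms to GB/s, ASSUMING 1 GB = 1e9 bytes:
// (bytes / 1e9) / (ms / 1e3) = bytes / (ms * 1e6).
float to_gbs(std::size_t bytes, float ms) {
    assert(ms > 0.0f && "elapsed time must be positive");
    return static_cast<float>(static_cast<double>(bytes) /
                              (static_cast<double>(ms) * 1e6));
}

// Counterpart of throughput_gbs(): GPU compute time only.
float dag_gbs(const PerfSketch& r) {
    return to_gbs(uncompressed_bytes(r), r.dag_elapsed_ms);
}

// Counterpart of pipeline_throughput_gbs(): end-to-end host time.
float end_to_end_gbs(const PerfSketch& r) {
    return to_gbs(uncompressed_bytes(r), r.host_elapsed_ms);
}
```

As an example, take a compress pass with input_bytes = 1,000,000,000, dag_elapsed_ms = 10 and host_elapsed_ms = 40. Here dag_gbs() returns 100 and end_to_end_gbs() returns 25. If the library uses GiB instead, the 10^9 in the conversion becomes 2^30.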

Member Function Documentation

◆ throughput_gbs()

float fz::PipelinePerfResult::throughput_gbs ( ) const noexcept

DAG throughput in GB/s: uncompressed data size divided by dag_elapsed_ms. Isolates actual GPU compute cost from host-side overhead and file I/O.

◆ pipeline_throughput_gbs()

float fz::PipelinePerfResult::pipeline_throughput_gbs ( ) const noexcept

Pipeline throughput in GB/s: uncompressed data size divided by host_elapsed_ms. Reflects end-to-end cost, including host-side setup.
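Both metrics share the same numerator, so their ratio depends only on the two timers. Let B be the uncompressed byte count and \kappa an assumed unit-conversion constant (for example 10^6 if 1 GB = 10^9 bytes and times are in ms). Assuming dag->execute() runs inside the host-timed call, the difference of the timers is the host-side overhead:

```latex
\begin{aligned}
\mathrm{throughput\_gbs} &= \frac{B}{\kappa\, t_{\mathrm{dag}}}, &
\mathrm{pipeline\_throughput\_gbs} &= \frac{B}{\kappa\, t_{\mathrm{host}}}, \\
\frac{\mathrm{throughput\_gbs}}{\mathrm{pipeline\_throughput\_gbs}} &= \frac{t_{\mathrm{host}}}{t_{\mathrm{dag}}}, &
t_{\mathrm{overhead}} &= t_{\mathrm{host}} - t_{\mathrm{dag}}.
\end{aligned}
```

For example, t_dag = 10 ms and t_host = 40 ms give a ratio of 4. The 30 ms of host-side overhead is then three quarters of the wall time.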