FZGPUModules 1.0
GPU-accelerated modular compression pipeline
#include <perf.h>
Public Member Functions | |
| float | throughput_gbs () const noexcept |
| float | pipeline_throughput_gbs () const noexcept |
| void | print (std::ostream &os) const |
Pretty-print a timing table to os (defaults to std::cout). | |
Public Attributes | |
| bool | is_compress |
| true = compress pass, false = decompress pass | |
| float | host_elapsed_ms |
| Total host-side wall time including setup (ms) | |
| float | dag_elapsed_ms |
| GPU compute time only — dag->execute() (ms) | |
| size_t | input_bytes |
| Bytes fed into the pipeline. | |
| size_t | output_bytes |
| Bytes produced by the pipeline. | |
| std::vector< StageTimingResult > | stages |
| Per-stage results in topological order. | |
| std::vector< LevelTimingResult > | levels |
| Per-level aggregates in level order. | |
Complete performance snapshot for one compress() or decompress() call.
Obtain via:
    pipeline.enableProfiling(true);
    pipeline.compress(d_in, n, &d_out, &out_size, stream);
    auto& r = pipeline.getLastPerfResult();
    r.print();
Timing layers:
- host_elapsed_ms: wall-clock time for the full call (ms), including host overhead such as buffer metadata collection, concat, and any pipeline construction (e.g. decompressFromFile setup). Useful for end-to-end benchmarking; it is the denominator for pipeline_throughput_gbs(), not for throughput_gbs().
- dag_elapsed_ms: time spent solely inside dag->execute() (ms), i.e. the GPU compute excluding all host setup. This is the denominator for throughput_gbs().
- stage elapsed_ms: per-stage GPU time from paired CUDA events (ms); most accurate for isolating individual kernel costs.
Throughput is always computed from the uncompressed data size, divided by dag_elapsed_ms for throughput_gbs() or by host_elapsed_ms for pipeline_throughput_gbs():
- compress: uncompressed_bytes = input_bytes
- decompress: uncompressed_bytes = output_bytes
float throughput_gbs () const noexcept
DAG throughput in GB/s: uncompressed data size divided by dag_elapsed_ms. Isolates actual GPU compute cost from host-side overhead and file I/O.
float pipeline_throughput_gbs () const noexcept
Pipeline throughput in GB/s: uncompressed data size divided by host_elapsed_ms. Reflects end-to-end latency including host setup.