#include <perf.h>

Public Member Functions
float	throughput_gbs () const noexcept

float	pipeline_throughput_gbs () const noexcept

void	print (std::ostream &os) const
	Pretty-print a timing table to `os` (defaults to std::cout).

Public Attributes
bool	is_compress
	true = compress pass, false = decompress pass

float	host_elapsed_ms
	Total host-side wall time including setup (ms)

float	dag_elapsed_ms
	GPU compute time only — dag->execute() (ms)

size_t	input_bytes
	Bytes fed into the pipeline.

size_t	output_bytes
	Bytes produced by the pipeline.

std::vector< StageTimingResult >	stages
	Per-stage results in topological order.

std::vector< LevelTimingResult >	levels
	Per-level aggregates in level order.

Detailed Description

Complete performance snapshot for one compress() or decompress() call.

Obtain via:

    pipeline.enableProfiling(true);
    pipeline.compress(d_in, n, &d_out, &out_size, stream);
    auto& r = pipeline.getLastPerfResult();
    r.print();

Timing layers:

host_elapsed_ms
	Wall-clock time for the full call, including host overhead such as buffer metadata collection, concat, and any pipeline construction (e.g. decompressFromFile setup). Useful for end-to-end benchmarking, but not for measuring GPU compute throughput.

dag_elapsed_ms
	Time spent solely inside dag->execute() (ms), i.e. the actual GPU compute excluding all host setup. This is the denominator for throughput_gbs().

stage elapsed_ms
	Per-stage GPU time from paired CUDA events; the most accurate layer for isolating individual kernel costs.

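Throughput is always computed from the uncompressed data size:

compress: uncompressed_bytes = input_bytes
decompress: uncompressed_bytes = output_bytes

throughput_gbs() divides this size by dag_elapsed_ms; pipeline_throughput_gbs() divides it by host_elapsed_ms.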

Member Function Documentation

◆ throughput_gbs()

float fz::PipelinePerfResult::throughput_gbs () const noexcept

DAG throughput in GB/s: uncompressed data size divided by dag_elapsed_ms. Isolates actual GPU compute cost from host-side overhead and file I/O.

◆ pipeline_throughput_gbs()

float fz::PipelinePerfResult::pipeline_throughput_gbs () const noexcept

Pipeline throughput in GB/s: uncompressed data size divided by host_elapsed_ms. Reflects end-to-end latency including host setup.
