|
FZGPUModules 1.0
GPU-accelerated modular compression pipeline
|
#include <perf.h>
Aggregate timing for all stages that run concurrently at one DAG level. elapsed_ms is the critical-path duration — the longest stage at the level — because parallel stages overlap on different CUDA streams.