|
FZGPUModules 2.0
GPU-accelerated modular compression pipelines
|
#include <perf.h>
Aggregate timing for all stages that run concurrently at one DAG level. elapsed_ms is the critical-path duration — the longest stage at the level — because parallel stages overlap on different CUDA streams.