Title | Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements |
Publication Type | Conference Proceedings |
Year of Publication | 2023 |
Authors | Barry, D., H. Jagode, A. Danalis, and J. Dongarra |
Conference Name | 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Date Published | 2023-08 |
Publisher | IEEE |
Conference Location | St. Petersburg, Florida |
Keywords | GPU power, High Performance Computing, network traffic, PAPI, performance analysis, Performance Counters |
Abstract | Some of the most important categories of performance events count the data traffic between the processing cores and the main memory. However, since these counters are not core-private, applications require elevated privileges to access them. PAPI offers a component that can access this information on IBM systems through the Performance Co-Pilot (PCP); however, doing so adds an indirection layer that involves querying the PCP daemon. This paper performs a quantitative study of the accuracy of the measurements obtained through this component on the Summit supercomputer. We use two linear algebra kernels, a generalized matrix multiply and a modified matrix-vector multiply, as benchmarks, along with a distributed, GPU-accelerated 3D-FFT mini-app (using cuFFT), to compare the measurements obtained through the PAPI PCP component against the expected values across different problem sizes. We also compare our measurements against those from an in-house machine whose architecture is very similar to Summit's, where elevated privileges allow PAPI to access the hardware counters directly (without using PCP), to show that measurements taken via PCP are as accurate as those taken directly. Finally, using both QMCPACK and the 3D-FFT, we demonstrate the diverse hardware activities that can be monitored simultaneously via PAPI hardware components. |
URL | https://ieeexplore.ieee.org/document/10196656 |
DOI | 10.1109/IPDPSW59300.2023.00070 |