Innovative Computing Laboratory

Overview

This project is developing PAPI, which will provide tool designers and application engineers with a consistent interface and methodology for the use of low-level performance counter hardware found across the entire compute system (e.g., CPUs, GPUs, on/off-chip memory, interconnects, I/O system, and energy/power). PAPI will enable users to see, in near real time, the relations between software performance and hardware events across the entire computer system.

Exa-PAPI builds on the latest PAPI project and will be extended with:

  1. Performance counter monitoring capabilities for new and advanced ECP hardware and software technologies.
  2. Fine-grained power management support.
  3. Functionality for performance counter analysis at "task granularity" for task-based runtime systems.
  4. "Software-defined Events" that originate from layers of the ECP software stack that are currently treated as black boxes (e.g., communication libraries, math libraries, and task-based runtime systems).

The objective is to enable monitoring of both types of performance events—hardware- and software-related events—in a uniform way, through one consistent PAPI interface. Third-party tools and application developers will have to handle only a single hook to PAPI in order to access all hardware performance counters in a system, including the new software-defined events.
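The idea of one interface for both kinds of events can be illustrated with a small, self-contained sketch. It is a toy model only and does not use the PAPI API: `EventRegistry`, `FakeCycleCounter`, `ToySolver`, and the event names `hw::cycles` and `sw::toy_solver::iterations` are all hypothetical names invented for this illustration. The "hardware" counter is simulated, and the "software-defined event" is simply an internal variable that a library chooses to expose.

```python
from typing import Callable, Dict, List


class EventRegistry:
    """Toy stand-in for a single monitoring interface (NOT the PAPI API)."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], int]] = {}

    def register(self, name: str, reader: Callable[[], int]) -> None:
        # Hardware and software events are registered the same way:
        # a name plus a function that returns the current count.
        self._readers[name] = reader

    def read(self, names: List[str]) -> List[int]:
        # One call returns values for any mix of event kinds.
        return [self._readers[name]() for name in names]


class FakeCycleCounter:
    """Simulated hardware counter (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.cycles = 0

    def tick(self, n: int) -> None:
        self.cycles += n


class ToySolver:
    """Hypothetical library that exposes an internal counter as an event."""

    def __init__(self, registry: EventRegistry) -> None:
        self.iterations = 0
        registry.register("sw::toy_solver::iterations", lambda: self.iterations)

    def solve(self, steps: int, hw: FakeCycleCounter) -> None:
        for _ in range(steps):
            self.iterations += 1  # software-defined event
            hw.tick(100)          # pretend each step costs 100 cycles


registry = EventRegistry()
hw = FakeCycleCounter()
registry.register("hw::cycles", lambda: hw.cycles)

solver = ToySolver(registry)
solver.solve(5, hw)

# A tool needs only this one read call for both kinds of events.
values = registry.read(["hw::cycles", "sw::toy_solver::iterations"])
print(values)
```

In this sketch the tool never needs to know whether a value came from hardware or from a library's internals, which is the property the paragraph above describes.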

PAPI Releases

NEWS

Announcing PAPI 7.0.0

Just in time for Supercomputing 2022, PAPI 7.0.0 is now available.

This is a major release of PAPI, which offers several new components and features, including "intel_gpu" with monitoring capabilities on Intel GPUs; "sysdetect" (along with a new user API) for detecting details of the available hardware on a given compute system; a significant revision of the "rocm" component for AMD GPUs; and an extension of the "cuda" component to enable performance monitoring on NVIDIA GPUs with compute capability 7.0 and beyond. Furthermore, PAPI 7.0.0 ships with a standalone "libsde" library and a new C++ API for software developers to define software-defined events from within their applications.

For specific and detailed information on changes made for this release, see ChangeLogP700.txt for filenames or keywords of interest and change summaries, or go directly to the PAPI git repository.

Some Major Changes for PAPI 7.0.0 include:

A new "intel_gpu" component with monitoring support for Intel GPUs, including GPU hardware events and memory performance metrics (e.g., bytes read/written/transferred from/to L3). The PAPI "intel_gpu" component offers two collection modes: (1) "Time-based Collection Mode," where metrics can be read at any given time during the execution of kernels, and (2) "Kernel-based Collection Mode," where performance counter data is available once the kernel execution is finished.
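The difference between the two collection modes can be pictured with a toy simulation. This is a conceptual sketch only and does not touch the "intel_gpu" component or any GPU: `run_kernel`, `time_based`, and `kernel_based` are hypothetical helpers, and the "kernel" is a Python generator that adds 64 to a byte counter at each step.

```python
from typing import Iterator, List


def run_kernel(steps: int, counter: List[int]) -> Iterator[None]:
    """Simulated kernel: adds 64 'bytes' to counter[0] per step, then yields."""
    for _ in range(steps):
        counter[0] += 64
        yield


def time_based(steps: int) -> List[int]:
    """Time-based style: the metric is sampled while the kernel is running."""
    counter = [0]
    samples = []
    for _ in run_kernel(steps, counter):
        samples.append(counter[0])  # read in the middle of execution
    return samples


def kernel_based(steps: int) -> int:
    """Kernel-based style: the metric is read only after the kernel finishes."""
    counter = [0]
    for _ in run_kernel(steps, counter):
        pass  # no reads while the kernel is in flight
    return counter[0]


print(time_based(3))
print(kernel_based(3))
```

The time-based variant yields a series of intermediate readings, while the kernel-based variant yields a single total per kernel.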

A new "sysdetect" component for detecting a machine's architectural details, including the hardware's topology, specific aspects about the memory hierarchy, number and type of GPUs and CPUs on a node, thread affinity to NUMA nodes and GPU devices, etc. Additionally, PAPI offers a new API that enables users to get "sysdetect" details from within their application.

A major redesign of the "rocm" component for advanced monitoring features for the latest AMD GPUs. The PAPI "rocm" component is now thread-safe and offers two collection modes: "sampling" and "kernel intercept" mode.

Support for NVIDIA compute capability 7.0 and greater. This implies support for CUPTI's new Profiling and Perfworks APIs. The PAPI CUDA component has been refactored to work equally for NVIDIA compute capabilities below 7.0 and 7.0 or greater.

A significant redesign of the "sde" component into two separate entities: (1) a standalone library "libsde" with a new API for software developers to define software-based metrics from within their applications, and (2) the PAPI "sde" component that enables monitoring of these new software-based events.

A new C++ interface for "libsde," which enables software developers to define software-defined events from within their C++ applications.

New Counter Analysis Toolkit (CAT) benchmarks and refinements of PAPI's CAT data analysis, specifically, the extension of PAPI's CAT with MPI and "distributed memory"-aware benchmarks and analysis to stress all cores per node.

Support for Fugaku's A64FX Arm architecture, including monitoring capabilities for memory bandwidth and other node-wide metrics.

Acknowledgments: This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Peinan Zhang, John Rodgers, Yamada Masahiko, Thomas Richter, and Phil Mucci.

The PAPI release can be downloaded from: https://icl.cs.utk.edu/papi/software.

Papers

Barry, D., A. Danalis, and H. Jagode, “Effortless Monitoring of Arithmetic Intensity with PAPI's Counter Analysis Toolkit,” 13th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Springer International Publishing, September 2020.
Jagode, H., A. Danalis, and D. Genet, “Roadmap for Refactoring Classic PAPI to PAPI++: Part II: Formulation of Roadmap Based on Survey Results,” PAPI++ Working Notes, no. 2, ICL-UT-20-09: Innovative Computing Laboratory, University of Tennessee, July 2020.
Jagode, H., A. Danalis, and J. Dongarra, “Exa-PAPI: The Exascale Performance API with Modern C++,” Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Winkler, F., “Redesigning PAPI’s High-Level API,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-03: University of Tennessee, February 2020.
Jagode, H., A. Danalis, and J. Dongarra, “Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,” PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
Jagode, H., A. Danalis, H. Anzt, and J. Dongarra, “PAPI Software-Defined Events for in-Depth Performance Analysis,” The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
Jagode, H., A. Danalis, and J. Dongarra, “What it Takes to keep PAPI Instrumental for the HPC Community,” 1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
Danalis, A., H. Jagode, T. Herault, P. Luszczek, and J. Dongarra, “Software-Defined Events through PAPI,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00069
Danalis, A., H. Jagode, H. Hanumantharayappa, S. Ragate, and J. Dongarra, “Counter Inspection Toolkit: Making Sense out of Hardware Performance Events,” 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, February 2019. DOI: 10.1007/978-3-030-11987-4_2
Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, “Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018. DOI: 10.1002/cpe.4485
Parker, S., J. Mellor-Crummey, D. H. Ahn, H. Jagode, H. Brunst, S. Shende, A. D. Malony, D. DelSignore, R. Tschuter, R. Castain, et al., “Performance Analysis and Debugging Tools at Scale,” Exascale Scientific Applications: Scalability and Performance Portability: Chapman & Hall / CRC Press, pp. 17-50, November 2017. DOI: 10.1201/b21930
Haidar, A., H. Jagode, A. YarKhan, P. Vaccaro, S. Tomov, and J. Dongarra, “Power-aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Best Paper Finalist, Waltham, MA, IEEE, September 2017. DOI: 10.1109/HPEC.2017.8091085

Presentations

Jagode, H., A. Danalis, and J. Dongarra, Exa-PAPI: The Exascale Performance API with Modern C++, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Danalis, A., H. Jagode, and J. Dongarra, PAPI's new Software-Defined Events for in-depth Performance Analysis, Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
Danalis, A., H. Jagode, and J. Dongarra, Does your tool support PAPI SDEs yet?, Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
Jagode, H., A. Danalis, and J. Dongarra, What it Takes to keep PAPI Instrumental for the HPC Community, Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
Danalis, A., H. Jagode, and J. Dongarra, Is your scheduling good? How would you know?, Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
Danalis, A., H. Jagode, D. Barry, and J. Dongarra, Understanding Native Event Semantics, Knoxville, TN, 9th JLESC Workshop, April 2019.
Jagode, H., A. Danalis, and J. Dongarra, PAPI's New Software-Defined Events for In-Depth Performance Analysis, Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
Danalis, A., H. Jagode, and J. Dongarra, Software-Defined Events through PAPI for In-Depth Analysis of Application Performance, Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
Danalis, A., H. Jagode, and J. Dongarra, PAPI: Counting outside the Box, Barcelona, Spain, 8th JLESC Meeting, April 2018.

ICL Team Members

Daniel Barry, Graduate Research Assistant
Giuseppe Congiu, Research Scientist I
Anthony Danalis, Research Assistant Professor
Jack Dongarra, Research Professor Emeritus
Heike Jagode, Research Assistant Professor

Exascale Computing Project

Exa-PAPI is part of ICL's involvement in the Exascale Computing Project (ECP). The ECP was established with the goals of maximizing the benefits of high-performance computing (HPC) for the United States and accelerating the development of a capable exascale computing ecosystem. Exascale refers to computing systems at least 50 times faster than the nation’s most powerful supercomputers in use today.

The ECP is a collaborative effort of two U.S. Department of Energy organizations – the Office of Science (DOE-SC) and the National Nuclear Security Administration (NNSA).