papiex - Command line/library utility to measure hardware performance counters with PAPI
This version of papiex is no longer supported directly by its
authors, the University of Tennessee or former employees of
SiCortex.
papiex is a performance analysis tool designed to transparently and passively measure
the hardware performance counters of an application using PAPI. It uses Monitor
to effortlessly intercept process/thread creation/destruction. It measures the entire run
of an application. By default this includes all subprocesses. papiex's goal is to be a
Linux substitute for the perfex command found in SGI's SpeedShop. papiex is fairly
simple to build, install, and use. The most up-to-date documentation for papiex is always found in the man page.
Features
- No external dependencies other than Monitor and PAPI.
- Supports papiex_start()/papiex_stop() calipers in user code (see the man page).
- Can report all sorts of memory usage.
- Supports PAPI multiplexing.
- Supports automatic counting of useful available events for the architecture with a single flag (-a).
- Automatically detects threaded executables.
- Works for MPI and threaded-MPI executables.
- Has special support for MPICH, which avoids the need to link papiex to the MPICH library.
- Dumps aggregate statistics, such as mean and max, across threads and tasks.
- Works across variants of fork/exec and handles SIGINT/asserts/aborts properly.
- Can dump out shell arguments for those not wanting to use the papiex driver program.
- Supports counting native events (non-PAPI) and different counting domains.
- Architecture-independent build and papiex-config driver (see the man page).
Download and Installation
- CVS is the best way to get the code.
- Those without CVS access (?!) can find the most recent release (0.99): papiex-0.99.tar.gz
- To build and install papiex, please see the INSTALL file.
- To use papiex, please see the man page.
- For the history of changes, please see the ChangeLog file.
Examples
The best documentation is in the form of examples. These examples ASSUME you
have successfully built AND installed papiex AND that you're in the platform-specific build directory.
First we run emacs and count Total Cycles and Total Instructions, redirecting the output from stderr (the default)
to a file. Next we run the pthreads test case and tell papiex to create files.
[mucci@localhost]$ papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS emacs 2> sample.emacs
[mucci@localhost]$ tests/papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS tests/pthreads
Here's the output: sample.emacs, sample.pthreads.1, sample.pthreads.2, sample.pthreads.3 and sample.memory.
papiex can automatically multiplex and count useful events available on your architecture. This
is similar in intent to perfex -a and hpmstat -a.
[mucci@localhost]$ papiex -a find /usr 2> sample.find
For statistical relevance, you should make sure that the run is reasonably long.
Multithreaded executables are handled seamlessly. papiex creates an output file with the
name cmd.papiex.host.pid.instance.
The user can prefix the output file name with the -pprefix flag. As an example:
[mucci@localhost]$ papiex -pmystats_ ./thrspecific 2>sample.thrspecific
The stderr output contains the aggregate statistics across
all five threads of the executable. Individual per-thread statistics are placed in the directory
mystats_thrspecific.papiex.localhost.localdomain.4444/task_0
Here are the files:
thread_0.summary, thread_1.summary,
thread_2.summary, thread_3.summary,
thread_4.summary.
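As a rough sketch of how one might browse these files, the POSIX shell function below prints every thread_N.summary file found in a task directory. The function name show_thread_summaries is made up for illustration and is not part of papiex; the directory layout is simply the one shown in the example above.

```shell
# Print each per-thread summary file found in a papiex task directory.
# Assumption: files are named thread_N.summary, as in the example above.
# show_thread_summaries is a hypothetical helper, not a papiex command.
show_thread_summaries() {
    dir=$1
    for f in "$dir"/thread_*.summary; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        printf '== %s ==\n' "$f"
        cat "$f"
    done
}

# Example call, using the directory name from the run above:
# show_thread_summaries mystats_thrspecific.papiex.localhost.localdomain.4444/task_0
```

Files are visited in the shell's glob order, so thread_10.summary would be listed before thread_2.summary.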
Now let's consider a more involved example with a threaded-MPI run.
[mucci@localhost]$ mpirun -np 4 papiex -f /tmp bin/mpich2-mpi-thrspecific 2>sample.mpich2-mpi-thrspecific
The -f flag instructs papiex to create all output files under /tmp.
The aggregate statistics across all tasks (which in turn are aggregated across all the threads
for the task) are written to stderr, which this example redirects to sample.mpich2-mpi-thrspecific.
The per-task and per-thread statistics are placed in:
/tmp/mpich2-mpi-thrspecific.papiex.localhost.localdomain.4613
Per-task summaries, which are averaged across all the threads of a task, can be seen under this directory:
task_0.summary,
task_1.summary,
task_2.summary,
task_3.summary.
The directory also contains per-task directories, which contain per-thread numbers as shown in the
previous example.
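To get a quick overview of such an output directory, something like the following POSIX shell sketch could be used. It assumes that the per-task directories are named task_N (as in the single-task example) and sit next to the task_N.summary files; count_thread_summaries is a hypothetical helper name, not something papiex provides.

```shell
# For every task_N.summary in a papiex output directory, report how many
# per-thread summary files the matching task_N/ subdirectory holds.
# Assumptions: task directories are named task_N and thread files are
# named thread_N.summary. count_thread_summaries is a hypothetical helper.
count_thread_summaries() {
    run_dir=$1
    for summary in "$run_dir"/task_*.summary; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$summary" ] || continue
        task=$(basename "$summary" .summary)
        n=0
        for t in "$run_dir/$task"/thread_*.summary; do
            if [ -e "$t" ]; then
                n=$((n + 1))
            fi
        done
        printf '%s: %s thread summaries\n' "$task" "$n"
    done
}

# Example call, using the directory name from the run above:
# count_thread_summaries /tmp/mpich2-mpi-thrspecific.papiex.localhost.localdomain.4613
```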
Finally, let's consider how papiex makes mpiP, a lightweight library for scalable
profiling of MPI calls, easy to use. Normally, mpiP needs to be linked into the target executable.
The papiex driver allows seamless deployment of mpiP on dynamically-linked executables.
Let's see this with an example:
[mucci@localhost]$ mpirun -np 4 papiex -e PAPI_L1_DCM -M bin/mpich2-simple-mpi 2> sample.mpich2-simple-mpi
In the example we instruct papiex to measure L1 data cache misses, and also do
MPI profiling with mpiP. The stderr output can be viewed in
sample.mpich2-simple-mpi. The mpiP output is stored in
mpich2-simple-mpi.mpiP.localhost.localdomain.4862.1.
The PAPI task statistics are stored in: mpich2-simple-mpi.papiex.localhost.localdomain.4862
CVS Access
Currently, the best way to get papiex is to get it directly from CVS. You can access the CVS repository with your browser or use the anonymous CVS pserver. Just hit enter when asked for the password.
% setenv CVSROOT :pserver:anonymous@cvs.eecs.utk.edu:/cvs/homes/ospat
% cvs login
Password:
% cvs co papiex
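The commands above use csh syntax (setenv). For Bourne-style shells such as sh or bash, an equivalent sketch might look like the following. The repository location is copied unchanged from the listing above, and the wrapper function fetch_papiex is a hypothetical name introduced only so that the checkout runs when explicitly requested.

```shell
# Bourne-shell equivalent of the csh "setenv CVSROOT ..." line above.
CVSROOT=:pserver:anonymous@cvs.eecs.utk.edu:/cvs/homes/ospat
export CVSROOT

# Hypothetical wrapper: log in anonymously, then check out the papiex module.
# Press Enter when cvs asks for the password.
fetch_papiex() {
    cvs login && cvs co papiex
}

# Example call:
# fetch_papiex
```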
Testing
The distribution includes a 'make test' phase. The current release has been tested on:
- MIPS64, MIPS32
- i686
- x86_64
- ia64
- PPC64, PPC32
Bug Reports
Bugs should be submitted to the PAPI Mailing List.
Authors
papiex was written by Philip J. Mucci of the Innovative Computing Laboratory and SiCortex Inc. Major contributions and enhancements were made by Tushar Mohan, also of SiCortex Inc.
Copyright
This software is COMPLETELY OPEN SOURCE with an LGPL license. If you incorporate any portion of this software, I would appreciate an acknowledgement in the appropriate places. Should you find papiex useful, please consider making a contribution in the form of hardware, software or plain old cash.