I have a new AMD Magny-Cours processor and am encountering problems using PAPI while pinning my test program to certain NUMA domains. Using numactl, I pin the program to execute on one domain and force it to allocate memory on another, then tell PAPI to measure the CPU-to-DRAM requests for a specific NUMA domain.
This is my application, and its output when I run it with numactl:
$ cat sandbox.c
#include <stdio.h>
#include <papi.h>
#include <stdlib.h> // exit
#include <string.h>
#include <errno.h>

enum
{
    //EVENT = PAPI_TOT_INS, // preset; total instructions
    //EVENT = PAPI_TOT_CYC, // preset; total cycles
    //EVENT = 0x40040062, // native; CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE, all
    //EVENT = 0x40002062, // native; CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE, to n3
    //EVENT = 0x40040068, // native; CPU_COMMAND_LATENCY_TO_TARGET_NODE_0_3_4_7, all
    EVENT = 0x40040061, // native; MEMORY_CONTROLLER_REQUESTS, all
    DATA_SIZE = (128 << 20)
};

#define PAPI_WRAP(x) \
    do { \
        int error = (x); \
        if( error != PAPI_OK ) \
        { \
            fprintf( stderr, "papi error: %s\n", PAPI_strerror(error) ); \
            if( error == PAPI_ESYS ) /* manual says to check errno */ \
                fprintf( stderr, " errno: %s\n", strerror(errno) ); \
            exit(error); \
        } \
    } while( 0 )

typedef int perfctr_eventset_t;
typedef long long perfctr_values_t;
typedef int papi_eventcode_t;

int main( void )
{
    perfctr_eventset_t set = PAPI_NULL;
    perfctr_values_t read[1];

    PAPI_set_debug( PAPI_VERB_ESTOP ); // make papi handle the errors and stop the program
    if( PAPI_library_init( PAPI_VER_CURRENT ) != PAPI_VER_CURRENT )
    {
        fprintf( stderr, "error: library init\n" );
        return -1;
    }
    printf( "papi initialized\n" );

    printf( "testing for event\n" );
    PAPI_WRAP( PAPI_query_event( EVENT ) );
    printf( "library says we're okay\n" );

    PAPI_WRAP( PAPI_create_eventset( &set ) );
    printf( "eventset initialized\n" );

    PAPI_WRAP( PAPI_add_event( set, EVENT ) );
    printf( "event added to set\n" );

    PAPI_WRAP( PAPI_start( set ) ); // clears counter

    unsigned long *data = malloc( DATA_SIZE );
    //memset( data, 0xDEADBEEF, DATA_SIZE );
    unsigned long i;
    for( i = 0; i < (DATA_SIZE/sizeof(unsigned long)); i++ )
        data[ i ] = 0xdeadbeefdeadbeef;
    free(data);

    PAPI_WRAP( PAPI_stop( set, read ) );
    printf( "value of counter: %lld\n", *read );

    PAPI_WRAP( PAPI_cleanup_eventset( set ) );
    PAPI_WRAP( PAPI_destroy_eventset( &set ) );
    return 0;
}
$ gcc -O0 -Wall -ggdb -D_GNU_SOURCE sandbox.c -o sandbox -lpapi
$ numactl --cpubind=1 --membind=2 ./sandbox
papi initialized
testing for event
library says we're okay
eventset initialized
event added to set
PAPI Error: vperfctr_control() returned < 0.
papi error: PAPI_ESYS
errno: Invalid argument
$ numactl --cpubind=0 --membind=3 ./sandbox
papi initialized
testing for event
library says we're okay
eventset initialized
event added to set
value of counter: 239222667
My workstation has 4 NUMA domains, 0-3. Each domain has 6 CPU cores and one memory controller. The program is single-threaded and initializes PAPI with a single event to count. It then allocates some memory, writes to all of it, frees it, and reads the counter. I determined that I can only pin it to CPU cores 0 and 1, which reside on NUMA domains 0 and 3, respectively. The same problem occurs with other NUMA-related events, such as CPU_COMMAND_LATENCY_TO_TARGET_NODE_0_3_4_7 and MEMORY_CONTROLLER_REQUESTS. Binding to NUMA domain 1 or 2 fails consistently with many of these events, no matter which core I select and no matter which NUMA domain's memory is used for allocation. Preset events work as expected, and other non-NUMA events work fine as well.
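To double-check from inside the program which NUMA domain a given core belongs to, something like the following could be used. This is only a minimal sketch, assuming a Linux sysfs layout with /sys/devices/system/node/node<N>/cpulist files; the helper names cpulist_contains and node_of_cpu are made up for illustration and are not part of PAPI or numactl.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if `cpu` appears in a sysfs-style cpulist
 * such as "0,2,4" or "0-5,12-17", 0 if it does not, and -1 if a malformed
 * entry is encountered before a match is found. */
int cpulist_contains(const char *list, int cpu)
{
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p)
            return -1;              /* no digits where a number was expected */
        long hi = lo;
        if (*end == '-') {          /* range entry "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p)
                return -1;
        }
        if (cpu >= lo && cpu <= hi)
            return 1;
        if (*end == ',')
            end++;                  /* step over the separator */
        else if (*end != '\0' && *end != '\n')
            return -1;
        p = end;
    }
    return 0;
}

/* Hypothetical helper: scans node0 .. node(max_nodes-1) in sysfs and returns
 * the first node whose cpulist contains `cpu`, or -1 if none matched
 * (including the case where the sysfs files cannot be opened). */
int node_of_cpu(int cpu, int max_nodes)
{
    for (int node = 0; node < max_nodes; node++) {
        char path[64], line[4096];
        snprintf(path, sizeof path,
                 "/sys/devices/system/node/node%d/cpulist", node);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;               /* node directory missing; try the next */
        int hit = fgets(line, sizeof line, f) != NULL
                  && cpulist_contains(line, cpu) == 1;
        fclose(f);
        if (hit)
            return node;
    }
    return -1;
}
```

Printing node_of_cpu(sched_getcpu(), 4) right before PAPI_start() would show where the process actually landed under numactl. sched_getcpu() is declared in <sched.h> by glibc when _GNU_SOURCE is defined, which the gcc command line above already does.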
Here is more information about my system:
$ papi_version
PAPI Version: 4.1.0.0
$ uname -r
2.6.27
$ getenforce # SELinux
Permissive
$ cat /sys/devices/system/node/node[0-3]/cpulist
0,2,4,6,8,10
12,14,16,18,20,22
13,15,17,19,21,23
1,3,5,7,9,11
$ lsmod | grep perfctr
perfctr 141112 0
$ perfex -i
PerfCtr Info:
abi_version 0x05020501
driver_version 2.6.41
cpu_type 19 (AMD Family 10h)
cpu_features 0x7 (rdpmc,rdtsc,pcint)
cpu_khz 2199988
tsc_to_cpu_mult 1
cpu_nrctrs 4
cpus [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23], total: 24
cpus_forbidden [], total: 0
$ pwd
/tmp/linux-2.6.27
$ grep PERFCTR .config
CONFIG_PERFCTR=m
CONFIG_KPERFCTR=y
# CONFIG_PERFCTR_DEBUG is not set
# CONFIG_PERFCTR_INIT_TESTS is not set
CONFIG_PERFCTR_VIRTUAL=y
# CONFIG_PERFCTR_GLOBAL is not set
CONFIG_PERFCTR_INTERRUPT_SUPPORT=y
CONFIG_PERFCTR_CPUS_FORBIDDEN_MASK=y
ACPI is indeed enabled in my kernel's configuration file, so the SRAT table is available and the kernel should be able to build an accurate map of the physical NUMA layout. I'm not sure what else could be causing this. Counters are available on every core, correct? Not on just two cores in separate domains or something funky?
Any help is appreciated; thank you in advance for suggestions.
-Alex
