Can't read L2_DCA and L2_DCM

Posted:
Fri May 25, 2012 4:56 am
by Korso10
I'm trying to measure PAPI_L2_DCA and PAPI_L2_DCM to calculate the L2 hit ratio. If I measure both counters in the same execution, the DCA value is 0. If I replace PAPI_L2_DCM with another counter (such as PAPI_TOT_CYC), the L2_DCA value looks "normal". I've had no problems with the L1 or L3 counters, and I've tried both the native and non-native interfaces, but the results are the same.
My specs are:
i7 860 (Nehalem)
Ubuntu 10.04 x86_64
Is this a bug? Is there a solution?
Thanks in advance
Re: Can't read L2_DCA and L2_DCM

Posted:
Mon Jun 04, 2012 3:30 pm
by bravegag
Hi,
I'm having the same problem. I'm running on:
Linux bravegag-MacBookPro 3.0.0-21-generic #35-Ubuntu SMP Fri May 25 17:57:41 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Intel Core 2 Duo T9900
PAPI_L1_DCA and PAPI_L1_DCM seem to work fine, though.
Is this a general problem or a known bug? I'm actually doing one full measurement pass for L1 and a separate one for L2, but it still doesn't work: I always get the error "PAPI_add_events(...) error (1)" with no text.
UPDATE: the papi_avail utility shows these are available in my environment:
bravegag@bravegag-MacBookPro:~/code/fastcode_project/build$ papi_avail | grep DCA
PAPI_L1_DCA 0x80000040 Yes No Level 1 data cache accesses
PAPI_L2_DCA 0x80000041 Yes Yes Level 2 data cache accesses
PAPI_L3_DCA 0x80000042 No No Level 3 data cache accesses
bravegag@bravegag-MacBookPro:~/code/fastcode_project/build$ papi_avail | grep DCM
PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses
PAPI_L2_DCM 0x80000002 Yes Yes Level 2 data cache misses
PAPI_L3_DCM 0x80000004 No No Level 3 data cache misses
bravegag@bravegag-MacBookPro:~/code/fastcode_project/build$
TIA,
Best regards,
Giovanni
Re: Can't read L2_DCA and L2_DCM

Posted:
Mon Jun 04, 2012 9:01 pm
by danterpstra
It looks like both of the L2 events for your machine are derived events. That's indicated by the "Yes Yes" in papi_avail. The first "Yes" says the event is available; the second says that it's derived, that is, composed of more than one native event. That means you need at least 4 programmable counters to measure these two events, and you only have 3. You can use papi_avail -e to get detailed information on the native events in each of these PRESET events. You might then be able to measure 3 native events at a time and use the common event as a scaling factor. Sorry this is so hard. Blame Intel.

Re: Can't read L2_DCA and L2_DCM

Posted:
Tue Jun 05, 2012 9:21 am
by Korso10
Thanks for the answer, danterpstra.
I checked on my machine, and in my case only L2_DCM is derived:
- Code: Select all
mpedrero@huracan:~$ papi_avail | grep L2_DC*
PAPI_L2_DCM 0x80000002 Yes Yes Level 2 data cache misses
PAPI_L2_DCH 0x8000003f Yes Yes Level 2 data cache hits
PAPI_L2_DCA 0x80000041 Yes No Level 2 data cache accesses
PAPI_L2_DCR 0x80000044 Yes No Level 2 data cache reads
PAPI_L2_DCW 0x80000047 Yes No Level 2 data cache writes
mpedrero@huracan:~$
And it seems that my processor has 16 counters:
- Code: Select all
mpedrero@huracan:~$ papi_avail
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version : 4.1.3.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz (30)
CPU Revision : 5.000000
CPUID Info : Family: 6 Model: 30 Stepping: 5
CPU Megahertz : 2808.604004
CPU Clock Megahertz : 2808
Hdw Threads per core : 1
Cores per Socket : 4
NUMA Nodes : 1
CPU's per Node : 4
Total CPU's : 4
Number Hardware Counters : 16
Max Multiplex Counters : 512
--------------------------------------------------------------------------------
According to papi_avail -e I only need 3 of them:
- Code: Select all
Event name: PAPI_L2_DCA
Event Code: 0x80000041
Number of Native Events: 1
Short Description: |L2D cache accesses|
Long Description: |Level 2 data cache accesses|
Developer's Notes: ||
Derived Type: |NOT_DERIVED|
Postfix Processing String: ||
Native Code[0]: 0x40002028 |L1D:REPL|
Number of Register Values: 3
Register[ 0]: 0x00000051 |Event Code|
Register[ 1]: 0x00000051 |Event Code|
Register[ 2]: 0x00000001 |Unit Mask|
Native Event Description: |L1D cache, masks:L1 data cache lines allocated|
- Code: Select all
The following correspond to fields in the PAPI_event_info_t structure.
Event name: PAPI_L2_DCM
Event Code: 0x80000002
Number of Native Events: 2
Short Description: |L2D cache misses|
Long Description: |Level 2 data cache misses|
Developer's Notes: ||
Derived Type: |DERIVED_ADD|
Postfix Processing String: ||
Native Code[0]: 0x40010037 |L2_RQSTS:LD_MISS|
Number of Register Values: 5
Register[ 0]: 0x00000024 |Event Code|
Register[ 1]: 0x00000024 |Event Code|
Register[ 2]: 0x00000024 |Event Code|
Register[ 3]: 0x00000024 |Event Code|
Register[ 4]: 0x00000002 |Unit Mask|
Native Event Description: |L2 requests, masks:L2 load misses|
Native Code[1]: 0x40400037 |L2_RQSTS:RFO_MISS|
Number of Register Values: 5
Register[ 0]: 0x00000024 |Event Code|
Register[ 1]: 0x00000024 |Event Code|
Register[ 2]: 0x00000024 |Event Code|
Register[ 3]: 0x00000024 |Event Code|
Register[ 4]: 0x00000008 |Unit Mask|
Native Event Description: |L2 requests, masks:L2 RFO misses|
Aren't the programmable counters the same as the "hardware counters" that papi_avail reports?
I'll try your solution anyway and post the results.
Thank you again
Re: Can't read L2_DCA and L2_DCM

Posted:
Tue Jun 05, 2012 11:09 am
by danterpstra
Sorry. I was looking at DCM and DCH when I said both were derived. You are right; these two events should be countable in 3 counter registers.
However, the Intel i7 series has 4 programmable counters and 3 fixed counters, not 16 as reported by PAPI. I'm not sure why that's happening, unless it's reporting 4 programmable counters for each of 4 cores. Still confusing.
I don't understand why these two events can't be counted together. It appears as if there is no counter assignment conflict.
Have you tried counting each one independently just to make sure they both work?
Re: Can't read L2_DCA and L2_DCM

Posted:
Wed Jun 06, 2012 5:44 am
by Korso10
Yes, I tried a simple matrix-access code, but the results look odd to me. I access the matrix normally, so the hit ratio should be near 100%. In L3 I get a hit ratio of about 99%, but in L2 I get a ratio below 50%. I tested the code on another processor (Intel(R) Xeon(R) CPU X7550) and the results are nearly the same. I expected an L2 hit ratio above 90%, so I'm not sure whether it is measuring correctly.
My code:
- Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <signal.h>
#define F 15000
#define C 10000
#include "papi.h"
#define NUM_EVENTS 2
int Events[NUM_EVENTS] = {PAPI_TOT_CYC, PAPI_L2_DCM};
long long values[NUM_EVENTS];
long long start_usec, end_usec, start_v_usec, end_v_usec, start_cycles, end_cycles;
int EventSet = PAPI_NULL;
int num_counters;
double matrix[F][C];
const PAPI_hw_info_t *hwinfo = NULL;
int main(int argc, char* argv[])
{
    int n,i,j;
    /* Initialize the library and check the version it reports */
    if ((n=PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT) {
        printf("\n Papi ver current (%d) other than %d \n", n,PAPI_VER_CURRENT);
    }
    /* Query hardware information (CPU count and clock) */
    if ((hwinfo = PAPI_get_hardware_info()) == NULL) {
        printf("\n Papi: Error PAPI_get_hardware_info null\n");
    }
    else {
        printf("\n%d CPU at %f Mhz.\n",hwinfo->totalcpus,hwinfo->mhz);
    }
    /* Start counting events. The assignment is parenthesized so that n
       holds the PAPI return code rather than the result of the comparison. */
    if ((n=PAPI_start_counters(Events, NUM_EVENTS)) != PAPI_OK)
        printf("\n Error %d: PAPI_start_counters\n",n);
    // CODE TO MEASURE
    for(i=0;i<F;i++){
        for(j=0;j<C;j++){
            matrix[i][j] = 3.14;
        }
    }
    // END OF CODE TO MEASURE
    /* Stop counting and read the values (same parenthesization as above) */
    if ((n=PAPI_stop_counters(values, NUM_EVENTS)) != PAPI_OK)
        printf("\n Error %d : PAPI_stop_counters\n", n);
    printf("\n PAPI: cycles=%lld l2misses=%lld\n", values[0], values[1]);
    return 0;
}
I compile with
- Code: Select all
gcc -O0 -o papiexample papiexample.c -lpapi
And I obtain:
DCA: 83973
DCM: 53502
Re: Can't read L2_DCA and L2_DCM

Posted:
Mon Jul 02, 2012 12:51 pm
by sanath
Hi korso,
Assuming your OS is Linux, you can use the "perf" utility as a third way to check, without having to modify the source code. Simply run on the command line:
> sudo perf stat -e rXXXX,rYYYY,rZZZZ,... ./<your_prog> <arg1> <arg2> ...
where rXXXX etc. are the hex codes formed from the Umask and EventCode of the relevant cache events for your processor (Core i7, Nehalem). Or have you already done that?
(I posted this reply to your post in the Intel forum too)
Sanath