counting L1 misses for simple example

Posted:
Sat Apr 03, 2010 4:14 am
by kodgireabhijeet
Hello All,
I am working on an Intel Xeon architecture with an L1 cache.
Cache information:
L1 Data Cache:
Total size: 32 KB
Line size: 64 B
Number of Lines: 512
Associativity: 8
I am running a small program to count the number of L1 misses.
The following code accesses the data:
int *temp = (int*) malloc(1024*1024*sizeof(int));
/* start counting */
if ((retval = PAPI_start(EventSet)) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_start", retval);
/* write the first 8192 ints (32 KB) of the buffer */
for (i = 0; i < 8192; i++) {
    temp[i] = 10;
}
/* read the counter values */
if ((retval = PAPI_read(EventSet, &values[0])) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_read", retval);
I am expecting 512 L1 cache misses (8192 ints of 4 bytes each, divided by the 64-byte line size), but it reports around 150 L1 misses, and that number fluctuates on every execution.
Am I missing anything? Any help would be appreciated.
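The expected figure of 512 follows from the cache information quoted above. A minimal sketch of that arithmetic in C, assuming a 4-byte int and a buffer that starts on a cache-line boundary (the helper names are hypothetical and are not part of PAPI):

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of PAPI. */

/* Number of cache lines covered by a sequential sweep over `bytes` bytes,
   assuming the buffer starts on a line boundary. */
size_t lines_touched(size_t bytes, size_t line_size)
{
    return (bytes + line_size - 1) / line_size;
}

/* Number of sets in a set-associative cache. */
size_t cache_sets(size_t total_size, size_t line_size, size_t ways)
{
    return total_size / (line_size * ways);
}
```

With the numbers from the post, lines_touched(8192 * 4, 64) gives 512, which matches the listed line count, and cache_sets(32 * 1024, 64, 8) gives 64 sets. Note that malloc does not guarantee 64-byte alignment, so a real sweep may straddle one extra line.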
Re: counting L1 misses for simple example

Posted:
Mon Apr 05, 2010 5:55 pm
by vweaver1
kodgireabhijeet wrote: I am working on intel xeon architecture with L1 cache.
What type of Intel Xeon is this exactly?
Also, what compiler and compiler options did you use to compile your example?
Re: counting L1 misses for simple example

Posted:
Wed Apr 07, 2010 10:27 pm
by kodgireabhijeet
Here is more information about my machine.
akodgire@timon ~/papi $ uname -a
Linux timon 2.6.32-gentoo-r5 #1 SMP Wed Feb 17 12:55:37 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5365 @ 3.00GHz GenuineIntel GNU/Linux
I am using gcc with only the -g option. I am not using any optimization options.
Let me know if anything else is needed.
Thanks,
Abhijeet
Re: counting L1 misses for simple example

Posted:
Tue Apr 13, 2010 9:09 am
by vweaver1
Hello,
I've been able to reproduce your problem on a Core 2 machine I have here. For some reason the L1 dcache miss counts are always off by a large amount (the run-to-run variation is normal). I suspect this has to do with the advanced prefetching into L1 that modern Core 2 processors do. I tried to verify this by turning off all prefetching, but unfortunately that did not change things. I'm still investigating to see if I can track down the source of the problem.
Re: counting L1 misses for simple example

Posted:
Tue Apr 13, 2010 1:29 pm
by kodgireabhijeet
Hello,
Thanks for looking into it. I have considered the hardware prefetching mechanism, but I couldn't find out how aggressively it prefetches data.
One more interesting observation: I added initialization code just before the PAPI_start call, so the array elements are initialized first and then accessed again while the PAPI counters are active. After the initialization, all the array elements should already be in the cache for the next access, so I expected fewer cache misses when I re-accessed them. Surprisingly, the cache miss count increased almost fivefold. This result left me confused again, and I could not apply the hardware prefetcher theory here.
Code snippet:
int *temp = (int*) malloc(1024*1024*sizeof(int));
for (i = 0; i < 8192; i++) { // Added the initialization of array elements
    temp[i] = 10;
}
if ((retval = PAPI_start(EventSet)) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_start", retval);
for (i = 0; i < 8192; i++) {
    temp[i] = 10;
}
if ((retval = PAPI_read(EventSet, &values[0])) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_read", retval);
Re: counting L1 misses for simple example

Posted:
Mon May 03, 2010 8:24 am
by Dmitry
Do you set the thread affinity during the test?
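For reference, one way to set thread affinity on Linux is sched_setaffinity. The following is only a sketch, assuming glibc on Linux; the function name pin_to_first_allowed_cpu is hypothetical and does not come from the posts above or from PAPI. It would be called before PAPI_start so the measured loop stays on one core.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for cpu_set_t and the CPU_* macros in glibc */
#endif
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the first CPU it is
   currently allowed to run on. Returns the CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* restrict the thread to this single CPU */
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Whether pinning changes the miss counts reported in this thread is not established here; it only removes migration between cores as one possible source of variation.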