PAPI Forum

by **cnavarrete** » Thu Nov 28, 2013 7:50 am

I'm trying to measure the memory bandwidth of an application running on a SandyBridge machine using the CAS counters:

Code:

    for (i=0; i<4; i++) {
        sprintf(sIMCCounter, "snbep_unc_imc%d::UNC_M_CAS_COUNT:RD:e=0:i=0:t=0", i);
        iError = PAPI_add_named_event(*pMaske, sIMCCounter);
        if (iError != PAPI_OK && iError != PAPI_ECNFLCT) {
            // proper error handling
        }
    }
    for (i=0; i<4; i++) {
        sprintf(sIMCCounter, "snbep_unc_imc%d::UNC_M_CAS_COUNT:WR:e=0:i=0:t=0", i);
        iError = PAPI_add_named_event(*pMaske, sIMCCounter);
        if (iError != PAPI_OK && iError != PAPI_ECNFLCT) {
            // proper error handling
        }
    }

According to the Intel documentation, the memory bandwidth can be computed as:

Code:

    Memory Read BW [MBytes/s]  = 1.0E-06*(snbep_unc_imc0::UNC_M_CAS_COUNT:RD + snbep_unc_imc1::UNC_M_CAS_COUNT:RD + snbep_unc_imc2::UNC_M_CAS_COUNT:RD + snbep_unc_imc3::UNC_M_CAS_COUNT:RD)*64.0/time
    Memory Write BW [MBytes/s] = 1.0E-06*(snbep_unc_imc0::UNC_M_CAS_COUNT:WR + snbep_unc_imc1::UNC_M_CAS_COUNT:WR + snbep_unc_imc2::UNC_M_CAS_COUNT:WR + snbep_unc_imc3::UNC_M_CAS_COUNT:WR)*64.0/time
    Memory BW [MBytes/s]       = Memory Read BW [MBytes/s] + Memory Write BW [MBytes/s]

Using this, I get an enormous memory bandwidth that cannot be real.
I checked the results against the ones obtained with likwid:

Code:

    +----------------------------+-------------+
    | Event                      | core 16     |
    +----------------------------+-------------+
    | CAS_COUNT_RD               | 3.17476e+07 |
    | CAS_COUNT_WR               | 4.49026e+08 |
    | CAS_COUNT_RD               | 5.67655e+07 |
    | CAS_COUNT_WR               | 4.48882e+08 |
    | CAS_COUNT_RD               | 5.8196e+07  |
    | CAS_COUNT_WR               | 4.49182e+08 |
    | CAS_COUNT_RD               | 5.76525e+07 |
    | CAS_COUNT_WR               | 4.48991e+08 |
    +----------------------------+-------------+
    | Memory Read BW [MBytes/s]  | 469.748     |
    | Memory Write BW [MBytes/s] | 4128.49     |
    | Memory BW [MBytes/s]       | 4598.24     |
    +----------------------------+-------------+

In the case of PAPI I get:

Code:

    CAS0_R=15637948602, CAS1_R=15652951072, CAS2_R=15655992784, CAS3_R=15619746407
    CAS0_W=15663920078, CAS1_W=15654867324, CAS2_W=15671126429, CAS3_W=15706756602

The strange thing here is that the CAS_R counters are about 300 times bigger than the ones obtained with likwid, and the CAS_W counters are almost 40 times bigger. CAS_R and CAS_W are also in the same range, yielding a "constant" RD_Bandwidth and RW_Bandwidth (it should be in my example).

The code I'm profiling is:

Code:

    for (j=0; j<TIMES; j++) {
        pValues = (long int *) malloc (liSize * sizeof(long int));
        if (!pValues) {
            // Error handling
        }

        // kernel W
        for (i=0; i<liSize; i++)
            pValues[i] = i;

        free((void *) pValues);
        pValues = NULL;
    } //for

Can anybody explain what is happening and how I can measure the memory bandwidth of an application using PAPI?

Thanks in advance!!
Carmen
