I have a Nehalem CPU and would like to count the floating point operations (FLOPs) that my code executes. The code is a nested for loop that contains only double precision operations:
#define INDEX 10

/* PAPI event codes are ints, so no cast is needed below */
int Events[2] = {PAPI_SP_OPS, PAPI_DP_OPS};
long long values[2];

/* Initialize the matrix here */

if (PAPI_start_counters(Events, 2) != PAPI_OK)
    printf("ERROR at init.");

/* Divide every matrix element by 2: INDEX * INDEX = 100 double precision operations */
for (j = 0; j < INDEX; j++)
    for (k = 0; k < INDEX; k++)
        mresult[k][j] = mresult[k][j] / 2;

if (PAPI_stop_counters(values, 2) != PAPI_OK)
    printf("ERROR at end.");

printf("\n \n single precision: %lld double precision: %lld \n \n", values[0], values[1]);
When I compile it with the -O2 flag, I get the following:
single precision: 0 double precision: 100
which is what I expected.
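The 100 is one division per element of the 10 x 10 matrix. As a sanity check that does not depend on PAPI, the same loop nest can be instrumented by hand. This is only a sketch: `count_expected_dp_ops` is a made-up helper name, not part of PAPI, and it counts the operations the source code asks for, not what the hardware counter reports.

```c
#define INDEX 10

/* Hypothetical helper (not part of PAPI): runs the same loop nest as the
   measured code and counts one double precision operation per division. */
long long count_expected_dp_ops(double m[INDEX][INDEX])
{
    long long ops = 0;
    int j, k;

    for (j = 0; j < INDEX; j++)
        for (k = 0; k < INDEX; k++) {
            m[k][j] = m[k][j] / 2;
            ops++;              /* one division per matrix element */
        }
    return ops;
}
```

Called on any INDEX x INDEX matrix of doubles, it returns INDEX * INDEX, which matches the PAPI_DP_OPS value at -O2.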
When I compile it with the -O3 flag, I get the following:
single precision: 150 double precision: 100
I know I should only be looking at the PAPI_DP_OPS value, but I am curious why PAPI_SP_OPS is incremented at all once the -O3 flag causes the for loop to be vectorized.
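For reference, here is my guess at the shape of the vectorized loop, written with SSE2 intrinsics. This is an assumption, not the disassembly of my binary: I have not checked which instructions the compiler actually emits at -O3, and `halve_packed` is a made-up name. A 128-bit SSE register holds two doubles, so the 100 element-wise divisions would become 50 packed divisions. The sketch walks the matrix row by row so that the two doubles in each register are adjacent in memory.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */

#define INDEX 10

/* Hypothetical sketch of a vectorized version: halves every element two
   doubles at a time and returns the number of packed divisions issued. */
int halve_packed(double m[INDEX][INDEX])
{
    const __m128d two = _mm_set1_pd(2.0);
    int packed_ops = 0;
    int j, k;

    for (k = 0; k < INDEX; k++)
        for (j = 0; j < INDEX; j += 2) {          /* INDEX is even, so no scalar remainder */
            __m128d v = _mm_loadu_pd(&m[k][j]);   /* loads m[k][j] and m[k][j+1] */
            v = _mm_div_pd(v, two);               /* one packed double precision divide */
            _mm_storeu_pd(&m[k][j], v);
            packed_ops++;
        }
    return packed_ops;
}
```

This only shows where packed double precision instructions would come from. It does not by itself account for the 150 reported by PAPI_SP_OPS.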
