Hi all,
I am new to PAPI and I am wondering if it is possible to collect counters for an OpenCL kernel running with the Intel Runtime library for CPU.
I am starting the counter collection just before the clEnqueueNDRangeKernel function call and stopping it after the OpenCL queue has been flushed.
At the moment, a simple FLOP count gives me results that are wrong and inconsistent across repetitions.
Has anybody successfully set up PAPI for OpenCL on Intel?
Many thanks.
