Analysis and Optimization of Yee_Bench using Hardware Performance Counters

Ulf Andersson; Phil Mucci


Title	Analysis and Optimization of Yee_Bench using Hardware Performance Counters
Publication Type	Conference Proceedings
Year of Publication	2005
Authors	Andersson, U., and P. Mucci
Conference Name	Proceedings of Parallel Computing 2005 (ParCo)
Date Published	2005-01
Conference Location	Malaga, Spain
Keywords	papi
Abstract	In this paper, we report on our analysis and optimization of a serial Fortran 90 benchmark called Yee_Bench. This benchmark has been run on a variety of architectures and its performance is reasonably well understood. However, on AMD Opteron-based machines, we found unexpected dips in the delivered MFLOPS of the code for a seemingly random set of problem sizes. Through the use of the Opteron’s on-chip hardware performance counters and PapiEx, a PAPI-based tool, we discovered that these drops were directly related to high L1 cache miss rates for these problem sizes. The high miss rates could be attributed to the fact that in the two core regions of the code we have references to three dynamically allocated arrays which compete for the same set in the Opteron’s 2-way set associative cache. We validated this conclusion by accurately predicting those problem sizes that exhibit this problem. We were able to alleviate these performance anomalies using variable intra-array padding to effectively accomplish inter-array padding. We conclude with some comments on the general applicability of this method as well as how one might improve the implementation of the Fortran 90 ALLOCATE intrinsic to handle this case.
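
The set conflict described in the abstract can be illustrated with a minimal sketch. It is not the authors' prediction method. It assumes the Opteron L1 data cache geometry of 64 KiB, 2-way set associative, with 64-byte lines, and it assumes a hypothetical layout in which the three arrays are allocated back to back, which a real allocator need not do. The helper names (cache_set, array_bases, conflicting) are invented for illustration, and padding is modelled simply as extra bytes added to each array.

```python
# Assumed L1 data-cache geometry (AMD Opteron, K8): 64 KiB, 2-way, 64-byte lines.
CACHE_BYTES = 64 * 1024
WAYS = 2
LINE_BYTES = 64
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 512 sets


def cache_set(addr):
    """Index of the cache set that a byte address maps to."""
    return (addr // LINE_BYTES) % NUM_SETS


def array_bases(n_bytes, count=3, start=0, pad=0):
    """Hypothetical layout: arrays placed back to back, each n_bytes plus pad."""
    return [start + i * (n_bytes + pad) for i in range(count)]


def conflicting(bases, ways=WAYS):
    """True if more base addresses share one set than the cache has ways."""
    counts = {}
    for addr in bases:
        s = cache_set(addr)
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values()) > ways


# Three 32 KiB arrays start in the same set; only two lines fit there at once.
print(conflicting(array_bases(32 * 1024)))
# One cache line of padding per array shifts each start to a different set.
print(conflicting(array_bases(32 * 1024, pad=LINE_BYTES)))
```

Under these assumptions, a conflict arises whenever the per-array footprint is a multiple of 32 KiB (cache size divided by associativity), because corresponding elements of all three arrays then map to the same set while only two lines can reside there.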

Project Tags:

papi

File:

icl-utk-256-2005.pdf
