Publications
Accurate Cache and TLB Characterization Using Hardware Counters,”
International Conference on Computational Science (ICCS 2004), Krakow, Poland, Springer, June 2004.
(167.1 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automatic Blocking of QR and LU Factorizations for Locality,”
2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.
(212.77 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters,”
PADTAD Workshop, IPDPS 2003, Nice, France, IEEE, April 2003.
(432.57 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Comparison of Search Heuristics for Empirical Code Optimization,”
The 3rd international Workshop on Automatic Performance Tuning, Tsukuba, Japan, October 2008.
(772.48 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Design of an Auto-tuning I/O Framework on Cray XT5 System,”
Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.
(459.57 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,”
ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA., June 2008.
(403.89 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,”
TeraGrid'11, Salt Lake City, Utah, July 2011.
(275.34 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Collecting Performance Data with PAPI-C,”
Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, Springer Berlin / Heidelberg, pp. 157-173, May 2010.
(4.45 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,”
Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 00 2009.
(451.07 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
PERI Auto-tuning,”
Proc. SciDAC 2008, vol. 125, Seatlle, Washington, Journal of Physics, January 2008.
(873.75 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Self Adapting Numerical Software SANS Effort,”
IBM Journal of Research and Development, vol. 50, no. 2/3, pp. 223-238, January 2006.
(357.53 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Power-aware Computing on GPGPUs
, Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
(2.89 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
ATLAS on the BlueGene/L – Preliminary Results,”
ICL Technical Report, no. ICL-UT-06-10, January 2006.
(46.19 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
(120.7 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
An Effective Empirical Search Method for Automatic Software Tuning,”
ICL Technical Report, no. ICL-UT-05-02, January 2005.
(74.66 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,”
ICL Technical Report, no. ICL-UT-07-02, January 2007.
(123.34 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)