Publications
Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,”
TeraGrid'11, Salt Lake City, Utah, July 2011.
(275.34 KB)
“The Design of an Auto-tuning I/O Framework on Cray XT5 System,”
Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.
(459.57 KB)
“Power-aware Computing on GPGPUs
, Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
(2.89 MB)
Collecting Performance Data with PAPI-C,”
Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, Springer Berlin / Heidelberg, pp. 157-173, May 2010.
(4.45 MB)
“Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,”
Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 00 2009.
(451.07 KB)
“A Comparison of Search Heuristics for Empirical Code Optimization,”
The 3rd international Workshop on Automatic Performance Tuning, Tsukuba, Japan, October 2008.
(772.48 KB)
“The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,”
ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA., June 2008.
(403.89 KB)
“PERI Auto-tuning,”
Proc. SciDAC 2008, vol. 125, Seatlle, Washington, Journal of Physics, January 2008.
(873.75 KB)
“Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
(120.7 KB)
“Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,”
ICL Technical Report, no. ICL-UT-07-02, January 2007.
(123.34 KB)
“ATLAS on the BlueGene/L – Preliminary Results,”
ICL Technical Report, no. ICL-UT-06-10, January 2006.
(46.19 KB)
“Self Adapting Numerical Software SANS Effort,”
IBM Journal of Research and Development, vol. 50, no. 2/3, pp. 223-238, January 2006.
(357.53 KB)
“An Effective Empirical Search Method for Automatic Software Tuning,”
ICL Technical Report, no. ICL-UT-05-02, January 2005.
(74.66 KB)
“Accurate Cache and TLB Characterization Using Hardware Counters,”
International Conference on Computational Science (ICCS 2004), Krakow, Poland, Springer, June 2004.
(167.1 KB)
“Automatic Blocking of QR and LU Factorizations for Locality,”
2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.
(212.77 KB)
“Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters,”
PADTAD Workshop, IPDPS 2003, Nice, France, IEEE, April 2003.
(432.57 KB)
“