Publications
Power-aware Computing on GPGPUs
Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
“Power-aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi,”
2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Best Paper Finalist, Waltham, MA, IEEE, September 2017.
“Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,”
Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.
“Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.
“Power Management and Event Verification in PAPI,”
Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 41-51, 2016.
“Power Aware Computing on GPUs,”
SAAHPC '12 (Best Paper Award), Argonne, IL, July 2012.
“Post-failure recovery of MPI communication capability: Design and rationale,”
International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244-254, January 2013.
“Porting the PLASMA Numerical Library to the OpenMP Standard,”
International Journal of Parallel Programming, June 2016.
“Porting Sparse Linear Algebra to Intel GPUs,”
Euro-Par 2021: Parallel Processing Workshops, vol. 13098, Lisbon, Portugal, Springer International Publishing, pp. 57-68, June 2022.
“Portable Representation of Internet Content Channels in I2-DSI,”
4th Intl. Web Caching Workshop, San Diego, CA, March 1999.
“A Portable Programming Interface for Performance Evaluation on Modern Processors,”
University of Tennessee Computer Science Technical Report, UT-CS-00-444, July 2000.
“A Portable Programming Interface for Performance Evaluation on Modern Processors,”
The International Journal of High Performance Computing Applications, vol. 14, no. 3, pp. 189-204, September 2000.
“Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,”
PPAM 2013, Warsaw, Poland, September 2013.
“POMPEI: Programming with OpenMP4 for Exascale Investigations,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-09: University of Tennessee, December 2017.
“Polynomial Acceleration of Optimised Multi-grid Smoothers; Basic Theory,”
ICL Technical Report, vol. 156, no. ICL-UT-02-03, January 2002.
“PMIx: Process Management for Exascale Environments,”
Proceedings of the 24th European MPI Users' Group Meeting, New York, NY, USA, ACM, pp. 14:1–14:10, 2017.
“PMIx: Process Management for Exascale Environments,”
Parallel Computing, vol. 79, pp. 9–29, January 2018.
“The PlayStation 3 for High Performance Scientific Computing,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-608, January 2008.
“The PlayStation 3 for High Performance Scientific Computing,”
Computing in Science and Engineering, pp. 80-83, January 2008.
“PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,”
ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019.
The PLASMA Library on CORAL Systems and Beyond (Poster)
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“PLASMA 17.1 Functionality Report,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-10: University of Tennessee, June 2017.
“PLASMA 17 Performance Report,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-11: University of Tennessee, June 2017.
“Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery,”
22nd European MPI Users' Group Meeting, Bordeaux, France, ACM, September 2015.
“PERI Auto-tuning,”
Proc. SciDAC 2008, vol. 125, Seattle, Washington, Journal of Physics, January 2008.
“Performance Tuning SLATE,”
SLATE Working Notes, no. 14, ICL-UT-20-01: Innovative Computing Laboratory, University of Tennessee, January 2020.
“Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,”
International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
“Performance Profiling Overhead Compensation for MPI Programs,”
In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.
“Performance Profiling and Analysis of DoD Applications using PAPI and TAU,”
Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.
“Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,”
IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.
“A Performance Oriented Migration Framework for the Grid,”
Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, Tokyo, Japan, pp. 130-137, May 2003.
“Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs,”
Concurrency and Computation: Practice and Experience, vol. 28, issue 12, pp. 3447-3465, May 2016.
“Performance Optimization and Modeling of Blocked Sparse Kernels,”
ICL Technical Report, no. ICL-UT-04-05, 2004.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Dept. Technical Report CS-89-85, 2007.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Department Technical Report, no. CS-89-85, January 2000.
“Performance of Various Computers Using Standard Linear Equations Software,”
University of Tennessee Computer Science Technical Report, no. CS-89-85, February 2013.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, UT-CS-89-85, 2010.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, CS-89-85, January 2008.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Department Technical Report, CS-89-85, January 2004.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85: University of Tennessee, June 2014.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85, 2011.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, vol. –89-95, January 2006.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85, January 2001.
“Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
“Performance of Asynchronous Optimized Schwarz with One-sided Communication,”
Parallel Computing, vol. 86, pp. 66-81, August 2019.
“Performance Modeling for Self Adapting Collective Communications for MPI,”
LACSI Symposium 2001, Santa Fe, NM, October 2001.
“A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures,”
The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.
“Performance Instrumentation and Measurement for Terascale Systems,”
ICCS 2003 Terascale Workshop, Melbourne, Australia, Springer, Berlin, Heidelberg, June 2003.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Second International Workshop on OpenMP, Reims, France, January 2006.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315: Springer Berlin / Heidelberg, 2008.