Publications
“Power Management and Event Verification in PAPI,”
Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 41-51, 2016.
What it Takes to keep PAPI Instrumental for the HPC Community,
Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
PULSE: PAPI Unifying Layer for Software-Defined Events (Poster),
Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
“Accelerating NWChem Coupled Cluster through dataflow-based Execution,”
11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.
“A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,”
ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
International Journal of High Performance Computing Applications (to appear), 2010.
“Software-Defined Events (SDEs) in MAGMA-Sparse,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.
“Custom assignment of MPI ranks for parallel multi-dimensional FFTs: Evaluation of BG/P versus BG/L,”
Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA-08), Sydney, Australia, IEEE Computer Society, pp. 271-283, January 2008.
“Dataflow Programming Paradigms for Computational Chemistry Methods,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-01, Knoxville, TN, University of Tennessee, May 2017.
PAPI's New Software-Defined Events for In-Depth Performance Analysis,
Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
“PAPI Software-Defined Events for in-Depth Performance Analysis,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
“Accelerating NWChem Coupled Cluster through dataflow-based Execution,”
The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540-551, July 2018.
“Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,”
PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
Lecture Notes in Computer Science: High Performance Computing,
vol. 12761: Springer International Publishing, 2021.
“Evaluation of Dataflow Programming Models for Electronic Structure Theory,”
Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018.
“What it Takes to keep PAPI Instrumental for the HPC Community,”
1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
“I/O Performance Analysis for the Petascale Simulation Code FLASH,”
ISC'09, Hamburg, Germany, June 2009.
“Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,”
The International Journal of High Performance Computing Applications, pp. 1–13, January 2017.
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-01, April 2009.
“Roadmap for Refactoring Classic PAPI to PAPI++: Part II: Formulation of Roadmap Based on Survey Results,”
PAPI++ Working Notes, no. 2, ICL-UT-20-09: Innovative Computing Laboratory, University of Tennessee, July 2020.
“Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems,”
IEEE Access, 2021.
“Intelligent Service Trading and Brokering for Distributed Network Services in GridSolve,”
VECPAR 2010, 9th International Meeting on High Performance Computing for Computational Science, Berkeley, CA, June 2010.
“Predicting MPI Collective Communication Performance Using Machine Learning,”
2020 IEEE International Conference on Cluster Computing (CLUSTER), Kobe, Japan, IEEE, September 2020.
“A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures,”
Symposium for Application Accelerators in High Performance Computing (SAAHPC'11), Knoxville, TN, July 2011.
“An international survey on MPI users,”
Parallel Computing, vol. 108, December 2021.
“Overhead of Using Spare Nodes,”
The International Journal of High Performance Computing Applications, February 2020.
“System Software for Many-Core and Multi-Core Architectures,”
Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, Singapore, Springer Singapore, pp. 59–75, 2019.
A Report of the MPI International Survey (Poster),
Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
“Dynamic Task Discovery in PaRSEC: A data-flow task-based Runtime,”
ScalA17, Denver, ACM, September 2017.
“P1673R3: A Free Function Linear Algebra Interface Based on the BLAS,”
ISO JTC1 SC22 WG21, no. P1673R3: ISO, April 2021.
Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers,
Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.
Earth Virtualization Engines - A Technical Perspective,
September 2023.
“A New Approach to MPI Collective Communication Implementations,”
Distributed and Parallel Systems: Springer US, pp. 45-54, 2007.
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing,
arXiv, January 2024.
“Distributed Probabilistic Model-Building Genetic Algorithm,”
Lecture Notes in Computer Science, vol. 2723: Springer-Verlag, Heidelberg, pp. 1015-1028, January 2003.
“Truss Structural Optimization Using NetSolve System,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Energy Minimization of Protein Tertiary Structure by Parallel Simulated Annealing using Genetic Crossover,”
Special Issue on Biological Applications of Genetic and Evolutionary Computation (submitted), March 2003.
“Optimization Problem Solving System using Grid RPC,”
3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, Tokyo, Japan, March 2003.
“Optimization of Injection Schedule of Diesel Engine Using GridRPC,”
Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 189-197, January 2003.
“A Simple Installation and Administration Tool for Large-scaled PC Cluster System,”
ClusterWorld Conference and Expo, San Jose, CA, March 2003.
“Static Scheduling for ScaLAPACK on the Grid Using Genetic Algorithm,”
Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 3-10, January 2003.
“Optimization System Using Grid RPC,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,”
Parallel Computing, vol. 52, pp. 22-41, February 2016.
“Toward a New Metric for Ranking High Performance Computing Systems,”
SAND2013-4744, June 2013.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315: Springer Berlin / Heidelberg, 2008.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Second International Workshop on OpenMP, Reims, France, January 2006.
“Event-based Measurement and Analysis of One-sided Communication,”
In Proceedings of the European Conference on Parallel Computing (Euro-Par), Lisbon, Portugal, Springer, August 2005.
“Checkpointing Strategies for Shared High-Performance Computing Platforms,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
“Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure,”
35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.