Publications
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
Submitted to Concurrency and Computation: Practice and Experience, November 2010.
“A Data Flow Divide and Conquer Algorithm for Multicore Architecture,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
“Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers,”
ScalA17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, ACM.
“Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,”
5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014.
“A Generic Approach to Scheduling and Checkpointing Workflows,”
The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.
“Energy-Aware Strategies for Reliability-Oriented Real-Time Task Allocation on Heterogeneous Platforms,”
49th International Conference on Parallel Processing (ICPP 2020), Edmonton, AB, Canada, ACM Press, 2020.
“A Generic Approach to Scheduling and Checkpointing Workflows,”
International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1255-1274, November 2019.
“A Generic Approach to Scheduling and Checkpointing Workflows,”
Int. Journal of High Performance Computing Applications, vol. 33, no. 6, pp. 1255-1274, 2019.
“Checkpointing Workflows for Fail-Stop Errors,”
IEEE Cluster, Honolulu, Hawaii, IEEE, September 2017.
“Improved Energy-Aware Strategies for Periodic Real-Time Tasks under Reliability Constraints,”
40th IEEE Real-Time Systems Symposium (RTSS 2019), York, UK, IEEE Press, February 2020.
“Checkpointing Workflows for Fail-Stop Errors,”
IEEE Transactions on Computers, vol. 67, issue 8, pp. 1105–1120, August 2018.
“Interactive Grid-Access Using Gridsolve and Giggle,”
Computing and Informatics, vol. 27, no. 2, pp. 233-248, ISSN 1335-9150, 2008.
“Performance Analysis and Modeling of Task-Based Runtimes,”
PhD Dissertation, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, May 2016.
“Search Space Pruning Constraints Visualization,”
VISSOFT'14: 2nd IEEE Working Conference on Software Visualization, Victoria, BC, Canada, IEEE, September 2014.
“Visualizing Execution Traces with Task Dependencies,”
2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
“The Semantic Conference Organizer,”
Statistical Data Mining and Knowledge Discovery: CRC Press, 2003.
ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research
Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
ASCR@40: Highlights and Impacts of ASCR’s Programs
US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
“A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,”
SIAM Journal on Scientific Computing, vol. 16, no. 2, pp. 284-311, October 2002.
“A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,”
SIAM Journal on Scientific Computing, vol. 24, no. 1, pp. 284-311, January 2003.
“Composition of Algorithmic Building Blocks in Template Task Graphs,”
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM), Dallas, TX, USA, IEEE, January 2023, 2022.
“Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,”
2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018.
“Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,”
ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
“Checkpointing Strategies for Shared High-Performance Computing Platforms,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
“Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
“Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.
“Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure,”
35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.
“Event-based Measurement and Analysis of One-sided Communication,”
In Proceedings of the European Conference on Parallel Computing (Euro-Par), Lisbon, Portugal, Springer, August 2005.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Second International Workshop on OpenMP, Reims, France, January 2006.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315: Springer Berlin / Heidelberg, 2008.
“Toward a New Metric for Ranking High Performance Computing Systems,”
SAND2013-4744, June 2013.
“Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,”
Parallel Computing, vol. 52, pp. 22-41, February 2016.
“Truss Structural Optimization Using NetSolve System,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Optimization of Injection Schedule of Diesel Engine Using GridRPC,”
Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 189-197, January 2003.
“Energy Minimization of Protein Tertiary Structure by Parallel Simulated Annealing using Genetic Crossover,”
Special Issue on Biological Applications of Genetic and Evolutionary Computation (submitted), March 2003.
“Optimization Problem Solving System using Grid RPC,”
3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, Tokyo, Japan, March 2003.
“A Simple Installation and Administration Tool for Large-scaled PC Cluster System,”
ClusterWorld Conference and Expo, San Jose, CA, March 2003.
“Static Scheduling for ScaLAPACK on the Grid Using Genetic Algorithm,”
Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 3-10, January 2003.
“Distributed Probabilistic Model-Building Genetic Algorithm,”
Lecture Notes in Computer Science, vol. 2723: Springer-Verlag, Heidelberg, pp. 1015-1028, January 2003.
“Optimization System Using Grid RPC,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
arXiv, January 2024.
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.
Earth Virtualization Engines - A Technical Perspective
September 2023.
“A New Approach to MPI Collective Communication Implementations,”
Distributed and Parallel Systems: Springer US, pp. 45-54, 2007.
“P1673R3: A Free Function Linear Algebra Interface Based on the BLAS,”
ISO JTC1 SC22 WG21, no. P1673R3: ISO, April 2021.
Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers
Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
“Dynamic Task Discovery in PaRSEC: A Data-Flow Task-Based Runtime,”
ScalA17, Denver, ACM, September 2017.
A Report of the MPI International Survey (Poster)
Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
“An international survey on MPI users,”
Parallel Computing, vol. 108, December 2021.
“Overhead of Using Spare Nodes,”
The International Journal of High Performance Computing Applications, February 2020.