Publications
Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems,”
Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 2004.
(548.38 KB)
“Exploring New Architectures in Accelerating CFD for Air Force Applications,”
Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
(492.86 KB)
“Exploiting Fine-Grain Parallelism in Recursive LU Factorization,”
Proceedings of PARCO'11, no. ICL-UT-11-04, Gent, Belgium, April 2011.
“Experiments with Strassen's Algorithm: From Sequential to Parallel,”
18th IASTED International Conference on Parallel and Distributed Computing and Systems PDCS 2006 (submitted), Dallas, Texas, January 2006.
(514.33 KB)
“Experiments with Scheduling Using Simulated Annealing in a Grid Environment,”
Grid Computing - GRID 2002, Third International Workshop, vol. 2536, Baltimore, MD, Springer, pp. 232-242, November 2002.
(66.91 KB)
“An Evaluation of User-Level Failure Mitigation Support in MPI,”
Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.
“Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,”
6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.
(114.73 KB)
“Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture,”
The 2nd International Conference on Cloud and Green Computing (submitted), Xiangtan, Hunan, China, November 2012.
(329.5 KB)
“Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems,”
26th ACM International Conference on Supercomputing (ICS 2012), San Servolo Island, Venice, Italy, ACM, June 2012.
(5.88 MB)
“Efficient Pattern Search in Large Traces through Successive Refinement,”
Proceedings of Euro-Par 2004, Pisa, Italy, Springer-Verlag, August 2004.
(177.46 KB)
“Efficiency of General Krylov Methods on GPUs – An Experimental Study,”
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.
“Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems,”
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09), Portland, OR, November 2009.
(502.49 KB)
“Dynamic Task Discovery in PaRSEC- A data-flow task-based Runtime,”
ScalA17, Denver, ACM, September 2017.
(1.15 MB)
“Domain Overlap for Iterative Sparse Triangular Solves on GPUs,”
Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.
“Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,”
Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.
(202.87 KB)
“Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,”
Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.
(202.87 KB)
“Developing an Architecture to Support the Implementation and Development of Scientific Computing Applications,”
to appear in Proceedings of Working Conference 8: Software Architecture for Scientific Computing Applications, Ottawa, Canada, October 2000.
(176.25 KB)
“The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,”
International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018.
(487.88 KB)
“Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,”
International Conference on Computational Science, Poland, Springer Verlag, June 2004.
(88.31 KB)
“Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,”
International Conference on Computational Science, Poland, Springer Verlag, June 2004.
(88.31 KB)
“Deploying Parallel Numerical Library Routines to Cluster Computing in a Self Adapting Fashion,”
Parallel Computing: Advances and Current Issues:Proceedings of the International Conference ParCo2001, London, England, Imperial College Press, January 2002.
(381.89 KB)
“Dense Linear Algebra Solvers for Multicore with GPU Accelerators,”
Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010.
(1 MB)
“Deep Gaussian process with multitask and transfer learning for performance optimization,”
2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7, September 2022.
“DAGuE: A Generic Distributed DAG Engine for High Performance Computing,”
Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00 2011.
(830.85 KB)
“CPU-GPU Hybrid Bidiagonal Reduction With Soft Error Resilience,”
ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Montpellier, France, November 2013.
(238.58 KB)
“Correlated Set Coordination in Fault Tolerant Message Logging Protocols,”
Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.
(486.68 KB)
“A Comparison of Search Heuristics for Empirical Code Optimization,”
The 3rd international Workshop on Automatic Performance Tuning, Tsukuba, Japan, October 2008.
(772.48 KB)
“Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,”
Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.
(172.86 KB)
“Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,”
Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.
(172.86 KB)
“Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware,”
2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear), 00 2009.
(515.63 KB)
“A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures,”
Symposium for Application Accelerators in High Performance Computing (SAAHPC'11), Knoxville, TN, July 2011.
(329.68 KB)
“A Class of Communication-Avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines,”
Proc. of the International Conference on Computational Science (ICCS), vol. 9, pp. 17-26, June 2012.
“A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,”
18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.
(289.32 KB)
“Can Hardware Performance Counters Produce Expected, Deterministic Results?,”
3rd Workshop on Functionality of Hardware Performance Monitoring, Atlanta, GA, December 2010.
(392.71 KB)
“BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“Bi-objective Scheduling Algorithms for Optimizing Makespan and Reliability on Heterogeneous Systems,”
19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) (submitted), San Diego, CA, June 2007.
(223.82 KB)
“Binomial Graph: A Scalable and Fault- Tolerant Logical Network Topology,”
Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007.
(480.47 KB)
“Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,”
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016.
“Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,”
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017.
(552.62 KB)
“Automatically Tuned Collective Communications,”
Proceedings of SuperComputing 2000 (SC'2000), Dallas, TX, November 2000.
(232.69 KB)
“Automatic Translation of Fortran to JVM Bytecode,”
Joint ACM Java Grande - ISCOPE 2001 Conference (submitted), Stanford University, California, June 2001.
(185.8 KB)
“Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,”
In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.
(227.13 KB)
“Applying Aspect-Oriented Programming Concepts to a Component-based Programming Model,”
IPDPS 2003, Workshop on NSF-Next Generation Software, Nice, France, March 2003.
(66.99 KB)
“Anatomy of a Globally Recursive Embedded LINPACK Benchmark,”
2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.
(204.74 KB)
“Algorithm-Based Fault Tolerance for Dense Matrix Factorization,”
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.
(865.79 KB)
“Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,”
IPDPS 2006, 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, January 2006.
(266.54 KB)
“An Algebra for Cross-Experiment Performance Analysis,”
2004 International Conference on Parallel Processing (ICCP-04), Montreal, Quebec, Canada, August 2004.
(166.12 KB)
“Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.
(1.57 MB)
“Active Logistical State Management in the GridSolve/L,”
4th International Symposium on Cluster Computing and the Grid (CCGrid 2004)(submitted), Chicago, Illinois, January 2004.
(123.69 KB)
““8th International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (LNCS),”
PPAM 2009 Proceedings, vol. 6067, Wroclaw, Poland, Springer, September 2010.