Publications
“The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,”
International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018.
(487.88 KB)
“Developing an Architecture to Support the Implementation and Development of Scientific Computing Applications,”
to appear in Proceedings of Working Conference 8: Software Architecture for Scientific Computing Applications, Ottawa, Canada, October 2000.
(176.25 KB)
“Domain Overlap for Iterative Sparse Triangular Solves on GPUs,”
Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.
“Efficiency of General Krylov Methods on GPUs – An Experimental Study,”
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.
“Enabling Full Service Surrogates Using the Portable Channel Representation,”
Tenth International World Wide Web Conference Proceedings (to appear), Hong Kong, May 2001.
(267.23 KB)
“Exploring New Architectures in Accelerating CFD for Air Force Applications,”
Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
(492.86 KB)
“Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems,”
Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 2004.
(548.38 KB)
“Fault Tolerant Communication Library and Applications for High Performance Computing,”
Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented), Santa Fe, NM, October 2003.
(146.05 KB)
“Fault Tolerant High Performance Computing by a Coding Approach,”
Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, January 2005.
(209.37 KB)
“A Fault-Tolerant Communication Library for Grid Environments,”
17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science, San Francisco, June 2003.
(377.14 KB)
“A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,”
ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.
(3.96 MB)
“Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 152-160, September 2022.
“Metacomputing Support for the SARA3D Structural Acoustics Application,”
Department of Defense Users' Group Conference (to appear), Biloxi, Mississippi, June 2001.
(64.58 KB)
“Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,”
SciDAC 2009, Journal of Physics: Conference Series, vol. 180 (2009) 012039, San Diego, California, IOP Publishing, July 2009.
(906.39 KB)
“The NetSolve Environment: Progressing Towards the Seamless Grid,”
2000 International Conference on Parallel Processing (ICPP-2000), Toronto, Canada, August 2000.
(148.85 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
(236.02 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 2009.
(119.37 KB)
“Optimal Routing in Binomial Graph Networks,”
The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.
“Performance Analysis of MPI Collective Operations,”
4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 2005.
(1018.28 KB)
“Porting Sparse Linear Algebra to Intel GPUs,”
Euro-Par 2021: Parallel Processing Workshops, vol. 13098, Lisbon, Portugal, Springer International Publishing, pp. 57-68, June 2022.
“Prediction of Optimal Solvers for Sparse Linear Systems Using Deep Learning,”
2022 SIAM Conference on Parallel Processing for Scientific Computing (PP), Philadelphia, PA, Society for Industrial and Applied Mathematics, pp. 14-24.
“QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,”
24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.
(261.55 KB)
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,”
Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.
“Request Sequencing: Optimizing Communication for the Grid,”
Lecture Notes in Computer Science: Proceedings of 6th International Euro-Par Conference 2000, Parallel Processing, (Germany: Springer Verlag 2000), vol. 1900, pp. 1213-1222, January 2000.
(165.92 KB)
“Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Press, November 2022.
“Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,”
Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.
(145.84 KB)
“Scalability Issues in FFT Computation,”
International Conference on Parallel Computing Technologies: Springer, pp. 279–287, 2021.
“Scalable Fault Tolerant MPI: Extending the Recovery Algorithm,”
Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 67, September 2005.
(144.86 KB)
“Seamless Access to Adaptive Solver Algorithms,”
Proceedings of 16th IMACS World Congress 2000 on Scientific Computing, Applications Mathematics and Simulation, Lausanne, Switzerland, August 2000.
(151.42 KB)
“Secure Remote Access to Numerical Software and Computational Hardware,”
Proceedings of the DoD HPC Users Group Conference (HPCUG) 2000, Albuquerque, NM, June 2000.
(172.6 KB)
“Self-Healing in Binomial Graph Networks,”
2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007.
(322.39 KB)
“Self-Healing Network for Scalable Fault Tolerant Runtime Environments,”
DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006.
(162.83 KB)
“A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering,”
Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS 2006), no. ICL-UT-05-06, Rhodes Island, Greece, IEEE Computer Society, April 2006.
(1.02 MB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
24th IEEE International Parallel and Distributed Processing Symposium (submitted), 2010.
(313.98 KB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
“Toward a Framework for Preparing and Executing Adaptive Grid Programs,”
International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops, Fort Lauderdale, FL, pp. 0171, April 2002.
(64.5 KB)
“Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,”
Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.
(229.2 KB)
“User Level Failure Mitigation in MPI,”
Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.
(136.15 KB)
“Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning,”
International Conference on Computational Science (ICCS 2017), vol. 108, Zurich, Switzerland, Procedia Computer Science, pp. 1783-1792, June 2017.
(512.57 KB)
“Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,”
Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.
(764.02 KB)
“Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964-976, April 2022.
“Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,”
International Journal of High Performance Computing Applications, 2015.
“Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,”
Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.
(341.54 KB)