Publications
Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,”
Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.
(115.75 KB)
“A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling,”
The International Conference on Computational Science 2009 (ICCS 2009), vol. 5544, Baton Rouge, LA, pp. 195-204, May 2009.
(228.45 KB)
“A Scalable Framework for Heterogeneous GPU-Based Clusters,”
The 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012), Pittsburgh, PA, USA, ACM, June 2012.
(3.39 MB)
“Scalable Fault Tolerant MPI: Extending the Recovery Algorithm,”
Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples) , Italy, Springer-Verlag Berlin, pp. 67, September 2005.
(144.86 KB)
“A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters,”
Proceedings of SuperComputing 2000 (SC'00), Dallas, TX, November 2000.
(178.15 KB)
“A Scalable Approach to MPI Application Performance Analysis,”
In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.
(988.58 KB)
“Scalability Issues in FFT Computation,”
International Conference on Parallel Computing Technologies: Springer, pp. 279–287, 2021.
“On Scalability for MPI Runtime Systems,”
International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEEE, pp. 187-195, September 2011.
(898.76 KB)
“Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,”
Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.
(145.84 KB)
“Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,”
Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.
(145.84 KB)
“Revisiting Credit Distribution Algorithms for Distributed Termination Detection,”
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): IEEE, pp. 611–620, 2021.
“Request Sequencing: Optimizing Communication for the Grid,”
Lecture Notes in Computer Science: Proceedings of 6th International Euro-Par Conference 2000, Parallel Processing, (Germany: Springer Verlag 2000), pp. V1900,1213-1222, January 2000.
(165.92 KB)
“Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve,”
International Conference on Grid and Cooperative Computing (GCC 2008) (submitted), Shenzhen, China, October 2008.
(1.64 MB)
“Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,”
Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.
“Redesigning the Message Logging Model for High Performance,”
International Supercomputer Conference (ISC 2008), Dresden, Germany, January 2008.
(622.1 KB)
“Recursive approach in sparse matrix LU factorization,”
Proceedings of 1st SGI Users Conference, Cracow, Poland (ACC Cyfronet UMM, 2000), pp. 409-418, January 2000.
(176.14 KB)
““Recent Advances in the Message Passing Interface, Lecture Notes in Computer Science (LNCS),”
EuroMPI 2010 Proceedings, vol. 6305, Stuttgart, Germany, Springer, September 2010.
Recent Advances in Parallel Virtual Machine and Message Passing Interface,”
Lecture Notes in Computer Science: Proceedings of 7th European PVM/MPI Users' Group Meeting 2000, (Hungary: Springer Verlag), pp. V1908, January 2000.
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,”
24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.
(261.55 KB)
“Proposal of MPI operation level Checkpoint/Rollback and one implementation,”
Proceedings of IEEE CCGrid 2006: IEEE Computer Society, January 2006.
(277.27 KB)
“Programming the LU Factorization for a Multicore System with Accelerators,”
Proceedings of VECPAR’12, Kobe, Japan, April 2012.
(414.33 KB)
“Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,”
International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.
(1.27 MB)
“Process Distance-aware Adaptive MPI Collective Communications,”
IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, 00 2011.
“Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,”
Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.
(290.27 KB)
“Power Management and Event Verification in PAPI,”
Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Dresden, Germany, Springer International Publishing, pp. pp. 41-51, 2016.
(565.14 KB)
“Performance Modeling for Self Adapting Collective Communications for MPI,”
LACSI Symposium 2001, Santa Fe, NM, October 2001.
(105.49 KB)
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Second International Workshop on OpenMP, Reims, France, January 2006.
(350.9 KB)
“Performance evaluation of eigensolvers in nano-structure computations,”
IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.
(120.61 KB)
“Performance Evaluation for Petascale Quantum Simulation Tools,”
Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.
“Performance evaluation for petascale quantum simulation tools,”
Proceedings of CUG09, Atlanta, GA, May 2009.
(1.09 MB)
“Performance Analysis of MPI Collective Operations,”
4th International Workshop on Performance Modeling, Evaluation, and Optmization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 2005.
(1018.28 KB)
“A Pattern-Based Approach to Automated Application Performance Analysis,”
Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.
(3.47 MB)
“Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,”
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.
(636.01 KB)
“Overview of GridRPC: A Remote Procedure Call API for Grid Computing,”
Proceedings of the Third International Workshop on Grid Computing, pp. 274-278, January 2002.
(221.82 KB)
“Overlapping Computation and Communication for Advection on a Hybrid Parallel Computer,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,”
ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.
(630.63 KB)
“Optimizing Performance and Reliability in Distributed Computing Systems Through Wide Spectrum Storage,”
Proceedings of the IPDPS 2003, NGS Workshop, Nice, France, pp. 209, January 2003.
“Optimization Problem Solving System using Grid RPC,”
3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, Tokyo, Japan, March 2003.
(71.6 KB)
“Optimization of Injection Schedule of Diesel Engine Using GridRPC,”
Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 189-197, January 2003.
(520.96 KB)
“Optimal Routing in Binomial Graph Networks,”
The International Conference on Parallel and Distributed Computing, applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.
“OpenCL Evaluation for Numerical Linear Algebra Library Development,”
Symposium on Application Accelerators in High-Performance Computing (SAAHPC '10), Knoxville, TN, July 2010.
(2.69 MB)
“One-Sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators,”
The International Conference on Computational Science (ICCS), June 2012.
“Numerically Stable Real Number Codes Based on Random Matrices,”
The International Conference on Computational Science, Atlanta, GA, LNCS 3514, Springer-Verlag, January 2005.
(166.2 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 00 2009.
(119.37 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
(236.02 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
(236.02 KB)
“A New Recursive Implementation of Sparse Cholesky Factorization,”
Proceedings of 16th IMACS World Congress 2000 on Scientific Computing, Applications Mathematics and Simulation, Lausanne, Switzerland, August 2000.
“Network-Enabled Server Systems: Deploying Scientific Simulations on the Grid,”
2001 High Performance Computing Symposium (HPC'01), part of the Advance Simulation Technologies Conference, Seattle, Washington, April 2001.
(175.23 KB)
“The NetSolve Environment: Progressing Towards the Seamless Grid,”
2000 International Conference on Parallel Processing (ICPP-2000), Toronto, Canada, August 2000.
(148.85 KB)
“