Publications
Self-Healing Network for Scalable Fault Tolerant Runtime Environments,”
DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006.
(162.83 KB)
“A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering,”
Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS 2006), no. ICL-UT-05-06, Rhodes Island, Greece, IEEE Computer Society, April 2006.
(1.02 MB)
“Binomial Graph: A Scalable and Fault- Tolerant Logical Network Topology,”
Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007.
(480.47 KB)
“Decision Trees and MPI Collective Algorithm Selection Problem,”
Euro-Par 2007, Rennes, France, Springer, pp. 105–115, August 2007.
(552.94 KB)
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 00 2007.
(308.39 KB)
“Optimal Routing in Binomial Graph Networks,”
The International Conference on Parallel and Distributed Computing, applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.
“Performance Analysis of MPI Collective Operations,”
Cluster computing, vol. 10, no. 2: Springer Netherlands, pp. 127-143, June 2007.
(1018.28 KB)
“Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,”
Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.
“Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,”
Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.
(145.84 KB)
“Self-Healing in Binomial Graph Networks,”
2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007.
(322.39 KB)
““,”
8th International Conference on Computational Science (ICCS), Proceedings Parts I, II, and III, Lecture Notes in Computer Science, vol. 5101, Krakow, Poland, Springer Berlin, January 2008.
DARPA's HPCS Program: History, Models, Tools, Languages,”
in Advances in Computers, vol. 72: Elsevier, January 2008.
(3.61 MB)
“Exploring New Architectures in Accelerating CFD for Air Force Applications,”
Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
(492.86 KB)
“Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008.
(500.99 KB)
“High Performance GridRPC Middleware,”
Recent developments in Grid Technology and Applications: Nova Science Publishers, 00 2008.
(923.06 KB)
“Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,”
Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.
(229.2 KB)
“Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware,”
2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear), 00 2009.
(515.63 KB)
““Computational Science – ICCS 2009, Proceedings of the 9th International Conference,”
Lecture Notes in Computer Science: Theoretical Computer Science and General Issues, vol. -, no. 5544-5545, Baton Rouge, LA, May 2009.
“Computational Science – ICCS 2009, Proceedings of the 9th International Conference,”
Lecture Notes in Computer Science: Theoretical Computer Science and General Issues, vol. -, no. 5544-5545, Baton Rouge, LA, May 2009.
Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,”
Submitted to Transaction on Parallel and Distributed Systems, December 2009.
(464.23 KB)
“A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,”
ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.
(3.96 MB)
“A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,”
ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.
(3.96 MB)
“Impact of Quad-core Cray XT4 System and Software Stack on Scientific Computation,”
Euro-Par 2009, Lecture Notes in Computer Science, vol. 5704/2009, Delft, The Netherlands, Springer Berlin / Heidelberg, pp. 334-344, August 2009.
(312.74 KB)
“The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community,”
International Journal of High Performance Computing Applications (to appear), July 2009.
(203.04 KB)
“Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,”
SciDAC 2009, Journal of Physics: Conference Series, vol. 180(2009)012039, San Diego, California, IOP Publishing, July 2009.
(906.39 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
(236.02 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
(236.02 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 00 2009.
(119.37 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects
, Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
(3.53 MB)
Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor,”
Parallel Computing, vol. 35, pp. 138-150, 00 2009.
(591.16 KB)
“Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,”
Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.
(464.23 KB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 00 2009.
“Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
: 2010 Symposium on Application Accelerators in. High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
(499.51 KB)
Autotuning Dense Linear Algebra Libraries on GPUs
, Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
(579.44 KB)
Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,”
LAPACK Working Note, no. 230, 00 2010.
(334.48 KB)
“Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,”
LAPACK Working Note, no. 230, 00 2010.
(334.48 KB)
“MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project,”
Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.
““Proceedings of the International Conference on Computational Science,”
ICCS 2010, Amsterdam, Elsevier, May 2010.
QCG-OMPI: MPI Applications on Grids,”
Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010.
(1.48 MB)
“QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,”
24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.
(261.55 KB)
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators
, Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
(3.86 MB)
Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators
, Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
(3.86 MB)
Self-Healing Network for Scalable Fault-Tolerant Runtime Environments,”
Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.
(1.54 MB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
24th IEEE International Parallel and Distributed Processing Symposium (submitted), 00 2010.
(313.98 KB)
“Towards a Complexity Analysis of Sparse Hybrid Linear Solvers,”
PARA 2010, Reykjavik, Iceland, June 2010.
“Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method.,”
The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
, no. UT-CS-11-689, December 2011.
(608.95 KB)