Publications
Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243), March 2011.
(1.65 MB)
“Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,”
TeraGrid'11, Salt Lake City, Utah, July 2011.
(275.34 KB)
“Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.
(397.45 KB)
“BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
, no. UT-CS-11-689, December 2011.
(608.95 KB)
A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.
(1.08 MB)
“Changes in Dense Linear Algebra Kernels - Decades Long Perspective,”
in Solving the Schrodinger Equation: Has everything been tried? (to appear): Imperial College Press, 00 2011.
“A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures,”
Symposium for Application Accelerators in High Performance Computing (SAAHPC'11), Knoxville, TN, July 2011.
(329.68 KB)
“Correlated Set Coordination in Fault Tolerant Message Logging Protocols,”
Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.
(486.68 KB)
“DAGuE: A Generic Distributed DAG Engine for High Performance Computing,”
Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00 2011.
(830.85 KB)
“The Design of an Auto-tuning I/O Framework on Cray XT5 System,”
Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.
(459.57 KB)
“Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-668, (also Lawn 250), June 2011.
(5.93 MB)
“Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,”
International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 00 2011.
(467.18 KB)
“Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,”
6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.
(114.73 KB)
“Exploiting Fine-Grain Parallelism in Recursive LU Factorization,”
Proceedings of PARCO'11, no. ICL-UT-11-04, Gent, Belgium, April 2011.
“Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,”
Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.
(1.26 MB)
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
University of Tennessee Computer Science Technical Report UT-CS-11-690 (also Lawn 260), December 2011.
(662.98 KB)
“Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
University of Tennessee Computer Science Technical Report (also Lawn 257), no. UT-CS-11-684, October 2011.
(405.71 KB)
“High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-673, (also Lawn 247), May 2011.
(424.93 KB)
“High Performance Dense Linear System Solver with Soft Error Resilience,”
IEEE Cluster 2011, Austin, TX, September 2011.
(1.27 MB)
“High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures,”
Proceedings of MTAGS11, Seattle, WA, November 2011.
(879.49 KB)
“High-Performance High-Resolution Semi-Lagrangian Tracer Transport on a Sphere,”
Journal of Computational Physics, vol. 230, issue 17, pp. 6778-6799, July 2011.
(1.68 MB)
“A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs,”
in GPU Computing Gems, Jade Edition, vol. 2: Elsevier, pp. 473-484, 00 2011.
“Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,”
18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.
“The International Exascale Software Project Roadmap,”
International Journal of High Performance Computing, vol. 25, no. 1, pp. 3-60, January 2011.
(719.74 KB)
“Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community,”
IEEE Computing in Science & Engineering, vol. 13, issue 5, pp. 90-95, August 2011.
(932.57 KB)
“Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs,”
Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September 2011.
“LU Factorization for Accelerator-Based Systems,”
IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.
(234.86 KB)
“MAGMA - LAPACK for GPUs
, Atlanta, GA, Keeneland GPU Tutorial, April 2011.
(742.14 KB)
MAGMA - LAPACK for HPC on Heterogeneous Architectures
, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
(20.43 MB)
Matrix Algebra on GPU and Multicore Architectures
, Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
(49.27 MB)
OMPIO: A Modular Software Architecture for MPI I/O,”
18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.
“An open-source tool-chain for performance analysis,”
Parallel Tools Workshop, Dresden, Germany, September 2011.
(622.1 KB)
“Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,”
ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.
(630.63 KB)
“Overlapping Computation and Communication for Advection on a Hybrid Parallel Computer,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“Parallel algebraic domain decomposition solver for the solution of augmented systems.,”
Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April, 00 2011.
“Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,”
International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011.
(1.41 MB)
“Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,”
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.
(636.01 KB)
“Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,”
University of Tennessee Computer Science Technical Report, UT-CS-11-677, (also Lawn254), August 2011.
(636.01 KB)
“A parallel tiled solver for dense symmetric indefinite systems on multicore architectures,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-11-07, October 2011.
(544.2 KB)
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85, 00 2011.
(6.42 MB)
“Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,”
IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.
(290.98 KB)
“Power-aware Computing on GPGPUs
, Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
(2.89 MB)
Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,”
International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.
(479.49 KB)
“Process Distance-aware Adaptive MPI Collective Communications,”
IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, 00 2011.
“Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,”
International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.
(1.27 MB)
“QCG-OMPI: MPI Applications on Grids.,”
Future Generation Computer Systems, vol. 27, no. 4, pp. 435-369, January 2011.
(1.48 MB)
“QUARK Users' Guide: QUeueing And Runtime for Kernels,”
University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-02, 00 2011.
(247.12 KB)
“Reducing the Amount of Pivoting in Symmetric Indefinite Systems,”
University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-06, Knoxville, TN, Submitted to PPAM 2011, May 2011.
(145.76 KB)
“On Scalability for MPI Runtime Systems,”
International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEEE, pp. 187-195, September 2011.
(898.76 KB)
“