Publications
“Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,”
SciDAC 2009, Journal of Physics: Conference Series, vol. 180 (2009) 012039, San Diego, California, IOP Publishing, July 2009.
“MPI-aware Compiler Optimizations for Improving Communication-Computation Overlap,”
Proceedings of the 23rd annual International Conference on Supercomputing (ICS '09), Yorktown Heights, NY, USA, ACM, pp. 316-325, June 2009.
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 2009.
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
“Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project,”
Portland, Oregon, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
“Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor,”
Parallel Computing, vol. 35, pp. 138-150, 2009.
“Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems (to appear), May 2009.
“Parallel Dense Linear Algebra Software in the Multicore Era,”
in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 2009.
“Parallel Programming in MATLAB,”
The International Journal of High Performance Computing Applications, vol. 23, no. 3, pp. 277-283, July 2009.
“Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,”
Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 2009.
“Performance evaluation for petascale quantum simulation tools,”
Proceedings of CUG09, Atlanta, GA, May 2009.
“The Problem with the Linpack Benchmark Matrix Generator,”
International Journal of High Performance Computing Applications, vol. 23, no. 1, pp. 5-14, 2009.
“QR Factorization for the CELL Processor,”
Scientific Programming (to appear), 2009.
“Reasons for a Pessimistic or Optimistic Message Logging Protocol in MPI Uncoordinated Failure Recovery,”
CLUSTER '09, New Orleans, IEEE, August 2009.
“Recent Trends in High Performance Computing,”
in Birth of Numerical Analysis (to appear), 2009.
“Recording the Control Flow of Parallel Applications to Determine Iterative and Phase-Based Behavior,”
Future Generation Computing Systems, vol. 26, pp. 162-166, 2009.
“Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,”
ACM TOMS (to appear), 2009.
“Reliability and Performance Modeling and Analysis for Grid Computing,”
in Handbook of Research on Scalable Computing Technologies (to appear): IGI Global, pp. 219-245, 2009.
“A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling,”
The International Conference on Computational Science 2009 (ICCS 2009), vol. 5544, Baton Rouge, LA, pp. 195-204, May 2009.
“Scheduling Linear Algebra Operations on Multicore Processors,”
University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (Also LAPACK Working Note 213), 2009.
“Scheduling Linear Algebra Operations on Multicore Processors,”
Concurrency Practice and Experience (to appear), 2009.
“Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,”
Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-01, April 2009.
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.
“VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance,”
SC’09 The International Conference for High Performance Computing, Networking, Storage and Analysis (to appear), Portland, OR, 2009.
“8th International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (LNCS),”
PPAM 2009 Proceedings, vol. 6067, Wroclaw, Poland, Springer, September 2010.
“Accelerating GPU Kernels for Dense Linear Algebra,”
Proc. of VECPAR'10, Berkeley, CA, June 2010.
“Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers,”
2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
“Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,”
Parallel Computing, vol. 36, no. 12, pp. 645-654, 2010.
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
Submitted to Concurrency and Computation: Practice and Experience, November 2010.
“Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,”
Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.
“Autotuning Dense Linear Algebra Libraries on GPUs,”
Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
“BLAS for GPUs,”
Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.
“Can Hardware Performance Counters Produce Expected, Deterministic Results?,”
3rd Workshop on Functionality of Hardware Performance Monitoring, Atlanta, GA, December 2010.
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing (to appear), 2010.
“Collecting Performance Data with PAPI-C,”
Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, Springer Berlin / Heidelberg, pp. 157-173, May 2010.
“Constructing Resilient Communication Infrastructure for Runtime Environments in Advances in Parallel Computing,”
Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, vol. 19, pp. 441-451, 2010.
“DAGuE: A generic distributed DAG engine for high performance computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.
“Dense Linear Algebra for Hybrid GPU-based Systems,”
Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.
“Dense Linear Algebra Solvers for Multicore with GPU Accelerators,”
Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.
“Dense Linear Algebra Solvers for Multicore with GPU Accelerators,”
Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010.
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.
“Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems,”
SIAM Journal on Scientific Computing (submitted), August 2010.
“Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,”
Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.
“Empirical Performance Tuning of Dense Linear Algebra Software,”
in Performance Tuning of Scientific Applications (to appear), 2010.
“EZTrace: a generic framework for performance analysis,”
ICL Technical Report, no. ICL-UT-11-01, December 2010.