Publications
“Self Adaptability in Grid Computing,”
Concurrency: Practice and Experience (submitted), March 2003.
(258.89 KB)
“A Metascheduler For The Grid,”
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), Edinburgh, Scotland, IEEE Computer Society, pp. 343-351, July 2002.
(99.53 KB)
“GrADSolve - RPC for High Performance Computing on the Grid,”
Lecture Notes in Computer Science, Proceedings of the 9th International Euro-Par Conference, vol. 2790, Klagenfurt, Austria, Springer-Verlag, Berlin, pp. 394-403, January 2003.
DOI: 10.1007/978-3-540-45209-6_58 (125.96 KB)
“A Performance Oriented Migration Framework for the Grid,”
Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, Tokyo, Japan, pp. 130-137, May 2003.
(113.6 KB)
“Towards an Accurate Model for Collective Communications,”
International Journal of High Performance Computing Applications, Special Issue: Automatic Performance Tuning, vol. 18, no. 1, pp. 159-167, January 2004.
(250.73 KB)
“GrADSolve - A Grid-based RPC System for Remote Invocation of Parallel Software,”
Journal of Parallel and Distributed Computing (submitted), March 2003.
(241.3 KB)
“Automatically Tuned Collective Communications,”
Proceedings of SuperComputing 2000 (SC'2000), Dallas, TX, November 2000.
(232.69 KB)
“Self Adaptivity in Grid Computing,”
Concurrency and Computation: Practice and Experience, Special Issue: Grid Performance, vol. 17, no. 2-4, pp. 235-257, 2005.
(394.66 KB)
“SRS - A Framework for Developing Malleable and Migratable Parallel Software,”
Parallel Processing Letters, vol. 13, no. 2, pp. 291-312, June 2003.
(211.6 KB)
“Towards an Accurate Model for Collective Communications,”
ICL Technical Report, no. ICL-UT-05-03, January 2005.
(250.73 KB)
“Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures,”
7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013.
(102.51 KB)
“Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI,”
Proceedings of International Conference on Computational Science, ICCS 2010 (to appear), Amsterdam, The Netherlands, Elsevier, June 2010.
(125.01 KB)
“Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring,”
2019 European Conference on Parallel Processing (Euro-Par 2019), Göttingen, Germany, Springer, August 2019.
DOI: 10.1007/978-3-030-29400-7_4 (1.07 MB)
“Providing performance portable numerics for Intel GPUs,”
Concurrency and Computation: Practice and Experience, vol. 17, October 2022.
DOI: 10.1002/cpe.7400 (3.16 MB)
“Using Quantized Integer in LU Factorization with Partial Pivoting (Poster),”
Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
(6.65 MB)
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
DOI: 10.1016/j.future.2023.07.024
“Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices,”
ICL Technical Report, no. ICL-UT-21-05, August 2021.
(3.93 MB)
“Sparse Linear Algebra on AMD and NVIDIA GPUs—The Race is On,”
ISC High Performance: Springer, June 2020.
DOI: 10.1007/978-3-030-50743-5_16 (5.63 MB)
“Recent Advances in the Message Passing Interface: 19th European MPI Users' Group Meeting, EuroMPI 2012,”
Lecture Notes in Computer Science, vol. 7490, Vienna, Austria, 2012.
“FFT-ECP Fast Fourier Transform,”
Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
(1.51 MB)
“Performance Evaluation for Petascale Quantum Simulation Tools,”
Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.
“The Future of Computing: Software Libraries,”
Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
(6.76 MB)
“Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,”
Parallel Computing, vol. 36, no. 12, pp. 645-654, 2010.
(1.39 MB)
“Dense Linear Algebra Solvers for Multicore with GPU Accelerators,”
Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.
(956.68 KB)
“Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures,”
International Journal of Computational Science and Engineering (to appear), January 2005.
(428.21 KB)
“Dense Linear Algebra for Hybrid GPU-based Systems,”
Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.
“MAGMA - LAPACK for GPUs,”
Atlanta, GA, Keeneland GPU Tutorial, April 2011.
(742.14 KB)
“Dense Linear Algebra Solvers for Multicore with GPU Accelerators,”
Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010.
DOI: 10.1109/IPDPSW.2010.5470941 (1 MB)
“Using MAGMA with PGI Fortran,”
PGI Insider, November 2010.
(176.67 KB)
“MAGMA: A Breakthrough in Solvers for Eigenvalue Problems,”
San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
(9.23 MB)
“FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,”
ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.
(9.71 MB)
“Evaluation and Design of FFT for Distributed Accelerated Systems,”
ECP WBS 2.3.3.09 Milestone Report, no. FFT-ECP ST-MS-10-1216: Innovative Computing Laboratory, University of Tennessee, October 2018.
(7.53 MB)
“MAGMA - LAPACK for HPC on Heterogeneous Architectures,”
Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
(20.43 MB)
“Integrating Deep Learning in Domain Science at Exascale (MagmaDNN),”
Virtual, DOD HPCMP seminar, December 2020.
(11.12 MB)
“FFT-ECP Implementation Optimizations and Features Phase,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.
(4.14 MB)
“Accelerating Linear Algebra with MAGMA,”
Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.
(35.27 MB)
“Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures,”
International Journal of Computational Science and Engineering, vol. 2, no. 3/4, pp. 205-212, 2006.
(428.21 KB)
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
University of Tennessee Computer Science Technical Report, UT-CS-08-632 (also LAPACK Working Note 210), January 2008.
(606.41 KB)
“Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,”
Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.
(172.86 KB)
“Design and Implementation for FFT-ECP on Distributed Accelerated Systems,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.
(3.19 MB)
“MAGMA: Evolution and Revolution,”
Knoxville, TN, ICL Lunch Talk Seminar, July 2021.
(8.88 MB)
“Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers,”
2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
(499.51 KB)
“MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs,”
San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
(11.12 MB)
“Performance evaluation for petascale quantum simulation tools,”
Proceedings of CUG09, Atlanta, GA, May 2009.
(1.09 MB)
“Matrix Algebra on GPU and Multicore Architectures,”
Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
(49.27 MB)
“Optimizing Krylov Subspace Solvers on Graphics Processing Units,”
Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(536.32 KB)
“CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps,”
Zenodo, October 2019.
DOI: 10.5281/zenodo.3477618 (8.31 MB)
“Linear Algebra Preparation for Emergent Neural Network Architectures: MAGMA, BLAS, and Batched GPU Computing,”
Virtual, LAPENNA Workshop, November 2021.
(17.8 MB)
“Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors),”
Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
(15.41 MB)
“Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.
(2.37 MB)
“