Publications
Export 971 results:
Filters: Author is Jack Dongarra [Clear All Filters]
MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.
(929.79 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,”
2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.
(678.86 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA - LAPACK for HPC on Heterogeneous Architectures
, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
(20.43 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
(6.4 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi
, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
(2.03 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,”
The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.
“MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs
: University of Tennessee, January 2019.
(7.84 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs
, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
(5.06 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,”
ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
(1.37 MB)
(8.72 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA-sparse Interface Design Whitepaper,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.
(1.28 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Marketplace for High-Performance Computers,”
Parallel Computing, vol. 25, no. 13-14, pp. 1517-1545, October 2002.
(285.78 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Massively Parallel Automated Software Tuning,”
48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.
(911.88 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines
, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.
(2.55 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Matrices Over Runtime Systems at Exascale,”
Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
“Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,”
Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.
(1.3 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,”
International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
(480.73 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Matrix Product on Heterogeneous Master Worker Platforms,”
2008 PPoPP Conference, Salt Lake City, Utah, January 2008.
“MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR)
, Washington, DC, NSF PI Meeting, Poster, April 2018.
(2.4 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Measuring Computer Performance: A Practioner's Guide,”
SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 00 2001.
(558.9 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements
, St. Petersburg, FL, 28th HIPS Workshop, May 2023.
(3.99 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,”
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.
(1.81 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Message Passing Software Systems,”
Encyclopedia of Electrical and Engineering, Supplement 1: John Wiley & Sons, Inc., 00 2000.
(289.38 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Metascheduler For The Grid,”
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), Edinburgh, Scotland, IEEE Computer Society, pp. 343-351, July 2002.
(99.53 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MIAMI: A Framework for Application Performance Diagnosis ,”
IPASS-2014, Monterey, CA, IEEE, March 2014.
(1010.75 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Middleware for the Use of Storage in Communication,”
Parallel Computing, vol. 28, no. 12, pp. 1773-1788, August 2002.
(87.97 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,”
ICL Technical Report, no. ICL-UT-22-04, May 2022.
(706.14 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,”
International Journal of High Performance Computer Applications (to appear), August 2007.
(157.4 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices,”
ICL Technical Report, no. ICL-UT-21-05, August 2021.
(3.93 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-precision Block Gram Schmidt Orthogonalization,”
6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.
(235.69 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,”
SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.
(374.8 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,”
Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.
(2.24 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,”
2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.
(301.01 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,”
VECPAR 2014 (Best Paper), Eugene, OR, June 2014.
(438.54 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.
(1.03 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,”
First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.
(1.24 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,”
Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.
(5.06 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,”
Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014.
(1.86 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors,”
University of Tennessee Computer Science Technical Report, no. UT-CS-06-583, January 2006.
(652.93 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms,”
ICL Technical Report, no. ICL-UT-21-04: University of Tennessee, August 2021.
(493.17 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MPI Collective Algorithm Selection and Quadtree Encoding,”
ICL Technical Report, no. ICL-UT-06-11, 00 2006.
(308.39 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MPI Collective Algorithm Selection and Quadtree Encoding,”
Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 00 2007.
(308.39 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MPI Collective Algorithm Selection and Quadtree Encoding,”
Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.
(308.39 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
MPI - The Complete Reference, Volume 1: The MPI Core
, Second, Cambridge, MA, USA, MIT Press, pp. 426, August 1998.
Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.
(497.64 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,”
Euro-Par 2013, Aachen, Germany, Springer, August 2013.
(431.84 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Multithreading for synchronization tolerance in matrix factorization,”
Journal of Physics: Conference Series, SciDAC 2007, vol. 78, no. 2007, January 2007.
(577.73 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 00 2013.
(536.28 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures,”
Journal of Physics: Conference Series, issue 16, pp. 277-282, June 2005.
(476.64 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community,”
D-Lib Magazine, January 1998.
(56.15 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
NetBuild,”
University of Tennessee Computer Science Technical Report, no. UT-CS-O1-461, January 2001.
(17.71 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)