Publications
Filters: Author is Dongarra, Jack
“Algorithm-Based Fault Tolerance for Fail-Stop Failures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 2009.
“Algorithmic Issues on Heterogeneous Computing Platforms,”
Parallel Processing Letters, vol. 9, no. 2, pp. 197-213, January 1999.
“Algorithmic Redistribution Methods for Block Cyclic Decompositions,”
IEEE Transactions on Parallel and Distributed Computing, vol. 10, no. 12, pp. 201-220, October 2002.
“On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,”
University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Parallel Computing, vol. 81, pp. 1–21, January 2019.
“Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018.
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
Submitted to Concurrency and Computation: Practice and Experience, November 2010.
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-666 (also LAWN 243), March 2011.
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
“Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,”
Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.
“Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,”
IEEE Cluster 2009, New Orleans, August 2009.
“Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,”
University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.
“Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.
“Anatomy of a Globally Recursive Embedded LINPACK Benchmark,”
2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.
“Applying Aspect-Oriented Programming Concepts to a Component-based Programming Model,”
IPDPS 2003, Workshop on NSF-Next Generation Software, Nice, France, March 2003.
“ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research,”
Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
“ASCR@40: Highlights and Impacts of ASCR’s Programs,”
US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
“Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,”
Parallel Computing, vol. 52, pp. 22-41, February 2016.
“Assessing the impact of ABFT and Checkpoint composite strategies,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
“Assessing the Impact of ABFT and Checkpoint Composite Strategies,”
16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
“An Asynchronous Algorithm on NetSolve Global Computing System,”
PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.
“An Asynchronous Algorithm on NetSolve Global Computing System,”
Future Generation Computer Systems, vol. 22, issue 3, pp. 279-290, February 2006.
“Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs,”
International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.
“Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,”
SIAM News, vol. 32, no. 6, January 1999.
“ATLAS on the BlueGene/L – Preliminary Results,”
ICL Technical Report, no. ICL-UT-06-10, January 2006.
“Automated Empirical Optimization of Software and the ATLAS Project,”
Parallel Computing, vol. 27, no. 1-2, pp. 3-25, January 2001.
“Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-448, September 2000.
“Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
“Automatic analysis of inefficiency patterns in parallel applications,”
Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.
“Automatic Analysis of Inefficiency Patterns in Parallel Applications,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.
“Automatic Blocking of QR and LU Factorizations for Locality,”
2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.
“Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,”
In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.
“Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load,”
EuroPar 2002, Paderborn, Germany, August 2002.
“Automatic Translation of Fortran to JVM Bytecode,”
Joint ACM Java Grande - ISCOPE 2001 Conference (submitted), Stanford University, California, June 2001.
“Automatic Translation of Fortran to JVM Bytecode,”
Concurrency and Computation: Practice and Experience, vol. 15, no. 3-5, pp. 202-207, 2003.
“Automatically Tuned Collective Communications,”
Proceedings of SuperComputing 2000 (SC'2000), Dallas, TX, November 2000.
“Automatically Tuned Linear Algebra Software,”
1998 ACM/IEEE conference on Supercomputing (SC '98), Orlando, FL, IEEE Computer Society, November 1998.
“Automating the Large-Scale Collection and Analysis of Performance,”
5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, Texas, May 2004.
“Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,”
Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017.
“Autotuning Dense Linear Algebra Libraries on GPUs,”
Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), Basel, Switzerland, June 2010.
“Autotuning GEMM Kernels for the Fermi GPU,”
IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.
“Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671 (also LAWN 245), April 2011.
“Autotuning in High-Performance Computing Applications,”
Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.
“Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,”
Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.
“Autotuning Techniques for Performance-Portable Point Set Registration in 3D,”
Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.