Publications
Filters: Author is Jack Dongarra
“Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization,”
IEEE High Performance Extreme Computing Conference (HPEC’18), Waltham, MA, IEEE, September 2018.
“Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,”
ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.
“Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices,”
International Conference on Computational Science (ICCS 2017), Zurich, Switzerland, Procedia Computer Science, June 2017.
DOI: 10.1016/j.procs.2017.05.237
“Out of Memory SVD Solver for Big Data,”
2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Waltham, MA, IEEE, September 2017.
PAPI: Counting outside the Box,
Barcelona, Spain, 8th JLESC Meeting, April 2018.
“PAPI Software-Defined Events for in-Depth Performance Analysis,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
PAPI's New Software-Defined Events for In-Depth Performance Analysis,
Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
PAPI's new Software-Defined Events for in-depth Performance Analysis,
Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
“PAQR: Pivoting Avoiding QR factorization,”
ICL Technical Report, no. ICL-UT-22-06, June 2022.
“Parallel BLAS Performance Report,”
SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.
“Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,”
University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208), August 2008.
“Parallel Dense Linear Algebra Software in the Multicore Era,”
in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 2009.
“Parallel Norms Performance Report,”
SLATE Working Notes, no. 06, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.
“Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,”
Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020.
DOI: 10.1007/978-3-030-43222-5
“Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,”
Lecture Notes in Computer Science, 1, no. 12043: Springer International Publishing, pp. 581, March 2020.
DOI: 10.1007/978-3-030-43229-4
“Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,”
Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015.
DOI: 10.14529/jsfi1504
“Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,”
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.
“Parallel Tiled QR Factorization for Multicore Architectures,”
University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-598 (also LAPACK Working Note 190), 2007.
“A parallel tiled solver for dense symmetric indefinite systems on multicore architectures,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-11-07, October 2011.
“ParILUT - A New Parallel Threshold ILU,”
SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.
DOI: 10.1137/16M1079506
“ParILUT – A Parallel Threshold ILU for GPUs,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPS.2019.00033
“PaRSEC: Exploiting Heterogeneity to Enhance Scalability,”
IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.
DOI: 10.1109/MCSE.2013.98
“PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,”
2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.