Publications
31 results for author Jakub Kurzak
“Scheduling Linear Algebra Operations on Multicore Processors,”
Concurrency Practice and Experience (to appear), 2009.
(716.18 KB)
“Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 2013.
(536.28 KB)
“Massively Parallel Automated Software Tuning,”
48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.
(911.88 KB)
“The PlayStation 3 for High Performance Scientific Computing,”
Computing in Science and Engineering, pp. 80-83, January 2008.
(2.45 MB)
“Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671 (also LAWN 245), April 2011.
(397.45 KB)
“Design and Implementation of the PULSAR Programming System for Large Scale Computing,”
Supercomputing Frontiers and Innovations, vol. 4, issue 1, 2017.
(764.96 KB)
“Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead,”
University of Tennessee Computer Science Tech Report, UT-CS-06-581, LAPACK Working Note #178, January 2006.
(304.4 KB)
“QR Factorization for the CELL Processor,”
Scientific Programming (to appear), 2009.
(234.02 KB)
“LU Factorization with Partial Pivoting for a Multicore System with Accelerators,”
IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013.
(1.08 MB)
“SLATE Working Note 12: Implementing Matrix Inversions,”
SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019.
(1.95 MB)
“QR Factorization for the CELL Processor,”
University of Tennessee Computer Science Technical Report, UT-CS-08-616 (also LAPACK Working Note 201), May 2008.
(194.95 KB)
“QR Factorization for the CELL Processor,”
Scientific Programming, vol. 17, no. 1-2, pp. 31-42, 2010.
(194.95 KB)
“Designing SLATE: Software for Linear Algebra Targeting Exascale,”
SLATE Working Notes, no. 03, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.
(2.8 MB)
“Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-580, LAPACK Working Note #177, September 2006.
(506.18 KB)
“Scheduling Linear Algebra Operations on Multicore Processors,”
University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (also LAPACK Working Note 213), 2009.
(716.18 KB)
“Virtual Systolic Array for QR Decomposition,”
15th Workshop on Advances in Parallel and Distributed Computational Models, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), Boston, MA, IEEE, May 2013.
(749.84 KB)
“Linear Systems Performance Report,”
SLATE Working Notes, no. 08, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.
(1.64 MB)
“The PlayStation 3 for High Performance Scientific Computing,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-608, January 2008.
(2.45 MB)
“Scheduling Dense Linear Algebra Operations on Multicore Processors,”
Concurrency and Computation: Practice and Experience, vol. 22, no. 1, pp. 15-44, January 2010.
(1.23 MB)
“Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,”
IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015.
“Fully Dynamic Scheduler for Numerical Computing on Multicore Processors,”
University of Tennessee Computer Science Department Technical Report, UT-CS-09-643 (also LAPACK Working Note 220), 2009.
(488.24 KB)
“An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,”
Applied Parallel and Scientific Computing, vol. 7133, pp. 248-257, 2012.
(623.5 KB)
“Parallel Norms Performance Report,”
SLATE Working Notes, no. 06, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.
(1.13 MB)
“Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 10, pp. 1371-1385, July 2007.
(453.78 KB)
“Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems,”
PPAM 2009, Poland, September 2009.
“Preliminary Results of Autotuning GEMM Kernels for the NVIDIA Kepler Architecture,”
LAWN 267, 2012.
(1.14 MB)
“Autotuning GEMM Kernels for the Fermi GPU,”
IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.
(742.5 KB)
“Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 9, pp. 1-11, January 2008.
(751.57 KB)
“Programming the LU Factorization for a Multicore System with Accelerators,”
Proceedings of VECPAR’12, Kobe, Japan, April 2012.
(414.33 KB)
“Parallel BLAS Performance Report,”
SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.
(4.39 MB)
“Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,”
UT Computer Science Technical Report (Also LAPACK Working Note 184), no. UT-CS-07-596, January 2007.
(751.57 KB)