Publications
Export 192 results:
Filters: Author is Piotr Luszczek [Clear All Filters]
Accelerating Restarted GMRES with Mixed Precision Arithmetic,”
IEEE Transactions on Parallel and Distributed Systems, June 2021.
DOI: 10.1109/TPDS.2021.3090757 (572.4 KB)
“Accelerating Scientific Computations with Mixed Precision Algorithms,”
Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.
DOI: 10.1016/j.cpc.2008.11.005 (402.69 KB)
“Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,”
University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.
(618.53 KB)
“Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,”
Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
DOI: 10.1002/cpe.3110 (1.96 MB)
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
(5.91 MB)
“Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,”
Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017.
DOI: 10.1109/IPDPSW.2017.18
“Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,”
Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.
DOI: 10.1109/JPROC.2018.2868961 (2.53 MB)
“Autotuning Techniques for Performance-Portable Point Set Registration in 3D,”
Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.
DOI: 10.14529/jsfi180404 (720.15 KB)
“Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification
, July 2018.
(483.05 KB)
Bringing High Performance Computing to Big Data Algorithms,”
Handbook of Big Data Technologies: Springer, 2017.
DOI: 10.1007/978-3-319-49340-4 (1.22 MB)
“C++ API for Batch BLAS,”
SLATE Working Notes, no. 04, ICL-UT-17-12: University of Tennessee, December 2017.
(1.89 MB)
“C++ API for BLAS and LAPACK,”
SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.
(1.12 MB)
“The Case for Directive Programming for Accelerator Autotuner Optimization,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
(341.52 KB)
“CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)
: arXiv, February 2024.
Clover: Computational Libraries Optimized via Exascale Research
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(872 KB)
A Collection of White Papers from the BDEC2 Workshop in Poznan, Poland,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-10: University of Tennessee, Knoxville, May 2019.
(5.82 MB)
“Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,”
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020.
DOI: 10.1109/IPDPSW50202.2020.00127 (1.33 MB)
“Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,”
Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.
(1.01 MB)
“Design and Implementation of the PULSAR Programming System for Large Scale Computing,”
Supercomputing Frontiers and Innovations, vol. 4, issue 1, 2017.
DOI: 10.14529/jsfi170101 (764.96 KB)
“Designing SLATE: Software for Linear Algebra Targeting Exascale,”
SLATE Working Notes, no. 03, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.
(2.8 MB)
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
(366.26 KB)
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.
(400.75 KB)
“Docker Container based PaaS Cloud Computing Comprehensive Benchmarks using LAPACK,”
Computer Modeling and Intelligent Systems CMIS-2020, Zaporizhzhoa, March 2020.
(451.33 KB)
“