Publications
“Evaluation of Dataflow Programming Models for Electronic Structure Theory,”
Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018.
DOI: 10.1002/cpe.4490 (1.69 MB)
“Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, May 2017.
DOI: 10.1109/IPDPS.2017.46 (328.15 KB)
“A Data Flow Divide and Conquer Algorithm for Multicore Architecture,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
(535.44 KB)
“Hierarchical DAG scheduling for Hybrid Distributed Systems,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
(1.11 MB)
“Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,”
Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.
DOI: 10.1016/j.jpdc.2015.06.007 (5.06 MB)
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.
DOI: 10.1002/cpe.3306 (783.45 KB)
“Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,”
Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
DOI: 10.1002/cpe.3110 (1.96 MB)
“Designing LU-QR Hybrid Solvers for Performance and Stability,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
DOI: 10.1109/IPDPS.2014.108 (4.2 MB)
“Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes,”
23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(807.33 KB)
“Designing LU-QR hybrid solvers for performance and stability,”
University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, October 2013.
(4.11 MB)
“Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,”
Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.
(1.43 MB)
“Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC,”
LAWN 277, no. UT-CS-13-709, May 2013.
(298.63 KB)
“Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 2013.
(536.28 KB)
“PaRSEC: Exploiting Heterogeneity to Enhance Scalability,”
IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.
DOI: 10.1109/MCSE.2013.98 (2.16 MB)
“On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,”
University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.
(358.98 KB)
“Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
IPDPS 2012, the 26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, IEEE Computer Society Press, May 2012.
(405.71 KB)
“Programming the LU Factorization for a Multicore System with Accelerators,”
Proceedings of VECPAR’12, Kobe, Japan, April 2012.
(414.33 KB)
“Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,”
University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.
(618.53 KB)
“Exploiting Fine-Grain Parallelism in Recursive LU Factorization,”
Proceedings of PARCO'11, no. ICL-UT-11-04, Gent, Belgium, April 2011.
“Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,”
Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.
(1.26 MB)
“Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
University of Tennessee Computer Science Technical Report (also LAWN 257), no. UT-CS-11-684, October 2011.
(405.71 KB)
“High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures,”
Proceedings of MTAGS11, Seattle, WA, November 2011.
(879.49 KB)
“LU Factorization for Accelerator-Based Systems,”
IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.
(234.86 KB)
“An open-source tool-chain for performance analysis,”
Parallel Tools Workshop, Dresden, Germany, September 2011.
(622.1 KB)
“Towards a Parallel Tile LDL Factorization for Multicore Architectures,”
ICL Technical Report, no. ICL-UT-11-03, Seattle, WA, April 2011.
(425.45 KB)
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
(366.26 KB)
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.
(400.75 KB)
“EZTrace: a generic framework for performance analysis,”
ICL Technical Report, no. ICL-UT-11-01, December 2010.
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“Using MAGMA with PGI Fortran,”
PGI Insider, November 2010.
(176.67 KB)