Publications
A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.
DOI: 10.1002/cpe.3306 (783.45 KB)
“Towards a High-Performance Tensor Algebra Package for Accelerators
, Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
(1.76 MB)
Visualizing Execution Traces with Task Dependencies,”
2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
(927.5 KB)
“Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,”
Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
DOI: 10.1002/cpe.3110 (1.96 MB)
“Communication-Avoiding Symmetric-Indefinite Factorization,”
SIAM Journal on Matrix Analysis and Application, vol. 35, issue 4, pp. 1364-1406, July 2014.
(593.18 KB)
“Design for a Soft Error Resilient Dynamic Task-based Runtime,”
ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.
(2.61 MB)
“Designing LU-QR Hybrid Solvers for Performance and Stability,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
DOI: 10.1109/IPDPS.2014.108 (4.2 MB)
“Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,”
Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.
(490.08 KB)
“An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems,”
Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014.
DOI: 10.1016/j.parco.2013.12.003 (1.42 MB)
“Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,”
VECPAR 2014 (Best Paper), Eugene, OR, June 2014.
(438.54 KB)
“Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.
DOI: 10.1109/CLUSTER.2014.6968672 (3.45 MB)
“PTG: An Abstraction for Unhindered Parallelism,”
International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.
(480.05 KB)
“Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,”
VECPAR 2014, Eugene, OR, June 2014.
(430.56 KB)
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
(260.23 KB)
“