Publications
Integrating Deep Learning in Domain Sciences at Exascale,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-10: University of Tennessee, August 2020.
(1.09 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
On block-asynchronous execution on GPUs,”
LAPACK Working Note, no. 291, November 2016.
(1.05 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
(188.51 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)
, virtual, DOD HPCMP seminar, December 2020.
(11.12 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,”
Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
(273.53 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Updating Incomplete Factorization Preconditioners for Model Order Reduction,”
Numerical Algorithms, vol. 73, issue 3, no. 3, pp. 611–630, February 2016.
(565.34 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Performance of Asynchronous Optimized Schwarz with One-sided Communication,”
Parallel Computing, vol. 86, pp. 66-81, August 2019.
(3.09 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
ParILUT - A New Parallel Threshold ILU,”
SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.
(19.26 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Domain Overlap for Iterative Sparse Triangular Solves on GPUs,”
Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.
“Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,”
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016.
“Random-Order Alternating Schwarz for Sparse Triangular Solves,”
2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
(1.53 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
ParILUT – A Parallel Threshold ILU for GPUs,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
(505.95 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Iterative Sparse Triangular Solves for Preconditioning,”
EuroPar 2015, Vienna, Austria, Springer Berlin, August 2015.
(322.36 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Integrating Deep Learning in Domain Sciences at Exascale,”
2020 Smoky Mountains Computational Sciences and Engineering Conference (SMC 2020), August 2020.
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.
(188.51 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs,”
International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
“