Publications
“On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures,”
The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.
(708.62 KB)
“Diagnosis and Optimization of Application Prefetching Performance,”
Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.
DOI: 10.1145/2464996.2465014 (827.31 KB)
“Direct Determination of Optimal Real-Space Orbitals for Correlated Electronic Structure of Molecules,”
Journal of Chemical Theory and Computation, vol. 19, issue 20, pp. 7230 - 7241, October 2023.
DOI: 10.1021/acs.jctc.3c00732
“Disaster Survival Guide in Petascale Computing: An Algorithmic Approach,”
in Petascale Computing: Algorithms and Applications (to appear): Chapman & Hall - CRC Press, 2007.
(260.18 KB)
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
(366.26 KB)
“O(N) distributed direct factorization of structured dense matrices using runtime systems,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, August 2023.
DOI: 10.1145/3605573.3605606
“Distributed Probabilistic Model-Building Genetic Algorithm,”
Lecture Notes in Computer Science, vol. 2723: Springer-Verlag, Heidelberg, pp. 1015-1028, January 2003.
(288.91 KB)
“Distributed Storage in RIB,”
ICL Tech Report, no. ICL-UT-03-01, March 2003.
(213.02 KB)
“Distributed Termination Detection for HPC Task-Based Environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.
“Distributed-Memory Lattice H-Matrix Factorization,”
The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019.
DOI: 10.1177/1094342019861139 (1.14 MB)
“Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure,”
35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.
(400.75 KB)
“Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems,”
SIAM Journal on Scientific Computing, vol. 34, issue 2, pp. C70-C82, April 2012.
“Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems,”
SIAM Journal on Scientific Computing (submitted), August 2010.
“Do moldable applications perform better on failure-prone HPC platforms?,”
11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018.
(360.72 KB)
“Docker Container based PaaS Cloud Computing Comprehensive Benchmarks using LAPACK,”
Computer Modeling and Intelligent Systems CMIS-2020, Zaporizhzhia, March 2020.
(451.33 KB)
“Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,”
Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.
(202.87 KB)
“Does your tool support PAPI SDEs yet?,”
13th Scalable Tools Workshop, Tahoe City, CA, July 2019.
(3.09 MB)
“Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
“Domain Overlap for Iterative Sparse Triangular Solves on GPUs,”
Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.
DOI: 10.1007/978-3-319-40528-5_24
“DTE: PaRSEC Enabled Libraries and Applications,”
2021 Exascale Computing Project Annual Meeting, April 2021.
(3.24 MB)
“DTE: PaRSEC Enabled Libraries and Applications (Poster),”
2020 Exascale Computing Project Annual Meeting, Houston, TX, February 2020.
(979.27 KB)
“DTE: PaRSEC Systems and Interfaces (Poster),”
2020 Exascale Computing Project Annual Meeting, Houston, TX, February 2020.
(840.54 KB)
“Dynamic DAG scheduling under memory constraints for shared-memory platforms,”
Int. J. of Networking and Computing, vol. 11, no. 1, pp. 27-49, 2021.
(574.64 KB)
“Dynamic Process Management for Pipelined Applications,”
Proceedings of DoD HPCMP UGC 2005 (to appear), Nashville, TN, IEEE, January 2005.
“Dynamic Task Discovery in PaRSEC: A data-flow task-based Runtime,”
ScalA17, Denver, ACM, September 2017.
DOI: 10.1145/3148226.3148233 (1.15 MB)
“Dynamic Task Execution on Shared and Distributed Memory Architectures,”
2012.
(3.29 MB)
“Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems,”
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09), Portland, OR, November 2009.
(502.49 KB)
“Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,”
Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.
(490.08 KB)
“Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-CS-13-713, July 2013.
(659.77 KB)
“EARL - API Documentation,”
ICL Technical Report, no. ICL-UT-04-03, October 2004.
(111.36 KB)
“Earth Virtualization Engines - A Technical Perspective,”
September 2023.
“Economical Quasi-Newton Unitary Optimization of Electronic Orbitals,”
Physical Chemistry Chemical Physics, December 2023, 2024.
DOI: 10.1039/D3CP05557D
“An Effective Empirical Search Method for Automatic Software Tuning,”
ICL Technical Report, no. ICL-UT-05-02, January 2005.
(74.66 KB)
“Efficiency of General Krylov Methods on GPUs – An Experimental Study,”
The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016.
DOI: 10.1109/IPDPSW.2016.45 (285.28 KB)
“Efficiency of General Krylov Methods on GPUs – An Experimental Study,”
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.
DOI: 10.1109/IPDPSW.2016.45
“Efficient Checkpoint/Verification Patterns,”
International Journal on High Performance Computing Applications, July 2015.
DOI: 10.1177/1094342015594531 (392.76 KB)
“Efficient checkpoint/verification patterns for silent error detection,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.
(397.75 KB)
“Efficient Communications in Training Large Scale Neural Networks,”
ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.
(1.41 MB)
“An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems,”
Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014.
DOI: 10.1016/j.parco.2013.12.003 (1.42 MB)
“An efficient distributed randomized solver with application to large dense linear systems,”
ICL Technical Report, no. ICL-UT-12-02, July 2012.
(626.26 KB)
“Efficient Eigensolver Algorithms on Accelerator Based Architectures,”
2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
(6.98 MB)
“Efficient exascale discretizations: High-order finite element methods,”
The International Journal of High Performance Computing Applications, pp. 10943420211020803, 2021.
DOI: 10.1177/10943420211020803
“Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
(1.09 MB)
“Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures,”
7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013.
(102.51 KB)
“Efficient Pattern Search in Large Traces through Successive Refinement,”
Proceedings of Euro-Par 2004, Pisa, Italy, Springer-Verlag, August 2004.
(177.46 KB)
“Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-668, (also LAWN 250), June 2011.
(5.93 MB)
“Effortless Monitoring of Arithmetic Intensity with PAPI’s Counter Analysis Toolkit,”
Tools for High Performance Computing 2018/2019: Springer, pp. 195–218, 2021.
DOI: 10.1007/978-3-030-66057-4_11
“Effortless Monitoring of Arithmetic Intensity with PAPI's Counter Analysis Toolkit,”
13th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Springer International Publishing, September 2020.
(738.47 KB)
“Elastic deep learning through resilient collective operations,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3626080
“