Publications
O(N) distributed direct factorization of structured dense matrices using runtime systems,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, August 2023.
DOI: 10.1145/3605573.3605606
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.
DOI: 10.1145/3605573.3605642
“MPI Continuations And How To Invoke Them,”
Sustained Simulation Performance 2021, Cham, Springer International Publishing, pp. 67 - 83, February 2023.
DOI: 10.1007/978-3-031-18046-010.1007/978-3-031-18046-0_5
“Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964 - 976, April 2022.
DOI: 10.1109/TPDS.2021.3084071
“Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms,”
International Journal of Networking and Computing, vol. 12, issue 1, pp. 26 - 46, January 2022.
DOI: 10.15803/ijnc.12.1_26
“Composition of Algorithmic Building Blocks in Template Task Graphs,”
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM), Dallas, TX, USA, IEEE, January 2023, 2022.
DOI: 10.1109/PAW-ATM56565.2022.00008 (1015.99 KB)
“Evaluating Data Redistribution in PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 8, pp. 1856-1872, August 2022.
DOI: 10.1109/TPDS.2021.3131657 (3.19 MB)
“Evaluations of molecular modeling and machine learning for predictive capabilities in binding of lanthanum and actinium with carboxylic acids,”
Journal of Radioanalytical and Nuclear Chemistry, December 2022.
DOI: 10.1007/s10967-022-08620-7
“A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022.
DOI: 10.1109/IPDPS53621.2022.00047 (1.03 MB)
“Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment,”
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.
DOI: 10.1109/IPDPS53621.2022.00086
“Implicit Actions and Non-blocking Failure Recovery with MPI,”
2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), Dallas, TX, USA, IEEE, January 2023, 2022.
DOI: 10.1109/FTXS56515.2022.00009
“Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach,”
2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), Heidelberg, Germany, September 2022.
“Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 152-160, September 2022.
DOI: 10.1109/CLUSTER51413.2022.00029
“Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,”
ICL Technical Report, no. ICL-UT-22-04, May 2022.
(706.14 KB)
“Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), Heidelberg, Germany, IEEE, September 2022.
DOI: 10.1109/CLUSTER51413.2022.00026
“Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Press, November 2022.
“Using long vector extensions for MPI reductions,”
Parallel Computing, vol. 109, pp. 102871, March 2022.
DOI: 10.1016/j.parco.2021.102871
“Accelerating FFT towards Exascale Computing
: NVIDIA GPU Technology Conference (GTC2021), 2021.
(27.23 MB)
Callback-based completion notification using MPI Continuations,”
Parallel Computing, vol. 21238566, issue 0225, pp. 102793, May Jan.
DOI: 10.1016/j.parco.2021.102793
“Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure,”
35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.
“DTE: PaRSEC Enabled Libraries and Applications
: 2021 Exascale Computing Project Annual Meeting, April 2021.
(3.24 MB)
An international survey on MPI users,”
Parallel Computing, vol. 108, December 2021.
DOI: 10.1016/j.parco.2021.102853 (1.49 MB)
“An Introduction to High Performance Computing and Its Intersection with Advances in Modeling Rare Earth Elements and Actinides,”
Rare Earth Elements and Actinides: Progress in Computational Science Applications, vol. 1388, Washington, DC, American Chemical Society, pp. 3-53, October 2021.
DOI: 10.1021/bk-2021-1388.ch001
“Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems,”
35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.
(1.08 MB)
“Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication,”
EuroMPI'21, Garching, Munich Germany, 2021.
(835.27 KB)
“Revisiting Credit Distribution Algorithms for Distributed Termination Detection,”
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): IEEE, pp. 611–620, 2021.
DOI: 10.1109/IPDPSW52791.2021.00095
“Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,”
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020.
DOI: 10.1109/IPDPSW50202.2020.00127 (1.33 MB)
“DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, IEEE, May 2020.
DOI: 10.1109/CCGrid49817.2020.00-76 (424.19 KB)
“DTE: PaRSEC Enabled Libraries and Applications (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(979.27 KB)
DTE: PaRSEC Systems and Interfaces (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(840.54 KB)
Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,”
Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020.
DOI: 10.1145/3394277.3401846 (2.71 MB)
“Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,”
Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020.
DOI: 10.1016/j.future.2020.01.026 (2.06 MB)
“FFT-Based Gradient Sparsification for the Distributed Training of Deep Neural Networks,”
9th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, ACM, June 2020.
DOI: 10.1145/3369583.3392681 (4.72 MB)
“Flexible Data Redistribution in a Task-Based Runtime System,”
IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.
DOI: 10.1109/CLUSTER49012.2020.00032 (354.8 KB)
“HAN: A Hierarchical AutotuNed Collective Communication Framework,”
IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.
(764.05 KB)
“Overhead of Using Spare Nodes,”
The International Journal of High Performance Computing Applications, February 2020.
DOI: 10.1177%2F1094342020901885 (2.15 MB)
“Predicting MPI Collective Communication Performance Using Machine Learning,”
2020 IEEE International Conference on Cluster Computing (CLUSTER), Kobe, Japan, IEEE, September 2020.
DOI: 10.1109/CLUSTER49012.2020.00036 (619.68 KB)
“A Report of the MPI International Survey (Poster)
, Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance,”
International Conference for High Performance Computing Networking, Storage, and Analysis (SC20): ACM, November 2020.
(644.92 KB)
“The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,”
2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020.
DOI: 10.1109/ESPM251964.2020.00011 (139.6 KB)
“Using Advanced Vector Extensions AVX-512 for MPI Reduction,”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
DOI: 10.1145/3416315.3416316 (634.45 KB)
“Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster)
, Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
(708.68 KB)
Using Arm Scalable Vector Extension to Optimize Open MPI,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.
DOI: 10.1109/CCGrid49817.2020.00-71 (359.95 KB)
“Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,”
Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), November 2019.
(440.7 KB)
“Checkpointing Strategies for Shared High-Performance Computing Platforms,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
(490.5 KB)
“Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,”
Parallel Computing, vol. 85, pp. 1–12, July 2019.
DOI: 10.1016/j.parco.2019.02.002 (865.18 KB)
“Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,”
PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.
(4.51 MB)
“Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,”
ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
(260.69 KB)
“Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs,”
IEEE Cluster, Albuquerque, NM, IEEE, September 2019.
(220.84 KB)
“Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation,”
Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019.
(1.6 MB)
“