Publications
Elastic deep learning through resilient collective operations,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
“Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach,”
2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), Heidelberg, Germany, September 2022.
“