Publications
“The International Exascale Software Project Roadmap,”
The International Journal of High Performance Computing Applications, vol. 25, no. 1, pp. 3–60, January 2011.
DOI: 10.1177/1094342010391989 (719.74 KB)
“Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,”
The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.
DOI: 10.1177/1094342018778123 (1.29 MB)
“Coping with Silent and Fail-Stop Errors at Scale by Combining Replication and Checkpointing,”
Journal of Parallel and Distributed Computing, vol. 122, pp. 209–225, December 2018.
DOI: 10.1016/j.jpdc.2018.08.002 (837 KB)
“A Collection of Presentations from the BDEC2 Workshop in Kobe, Japan,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-09: University of Tennessee, Knoxville, February 2019.
(58.85 MB)
“A Collection of White Papers from the BDEC2 Workshop in San Diego, CA,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-13: University of Tennessee, October 2019.
(8.25 MB)
“Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring,”
2019 European Conference on Parallel Processing (Euro-Par 2019), Göttingen, Germany, Springer, August 2019.
DOI: 10.1007/978-3-030-29400-7_4 (1.07 MB)
“DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, IEEE, May 2020.
DOI: 10.1109/CCGrid49817.2020.00-76 (424.19 KB)
“