Publications
Co-Scheduling HPC Workloads on Cache-Partitioned CMP Platforms,”
Cluster 2018, Belfast, UK, IEEE Computer Society Press, September 2018.
(423.75 KB)
“Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs,”
22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.
(696.21 KB)
“A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures,”
The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.
(868.44 KB)
“Replication is More Efficient Than You Think,”
The IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis (SC19), Denver, CO, ACM Press, November 2019.
(975.69 KB)
“When to checkpoint at the end of a fixed-length reservation?,”
Fault Tolerance for HPC at eXtreme Scales (FTXS) Workshop, Denver, United States, August 2023.
“Checkpointing à la Young/Daly: An Overview,”
IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, Noida, India, ACM Press, pp. 701-710, August 2022.
DOI: 10.1145/3549206 (639.77 KB)
“Max-Stretch Minimization on an Edge-Cloud Platform,”
IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.
(4.94 MB)
“Combining Checkpointing and Replication for Reliable Execution of Linear Workflows with Fail-Stop and Silent Errors,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27.
(754.6 KB)
“Coping with Silent and Fail-Stop Errors at Scale by Combining Replication and Checkpointing,”
Journal of Parallel and Distributed Computing, vol. 122, pp. 209–225, December 2018.
DOI: 10.1016/j.jpdc.2018.08.002 (837 KB)
“Co-Scheduling Amdhal Applications on Cache-Partitioned Systems,”
International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 123–138, January 2018.
DOI: 10.1177/1094342017710806 (672.52 KB)
“Co-Scheduling HPC Workloads on Cache-Partitioned CMP Platforms,”
International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1221-1239, November 2019.
DOI: 10.1177/1094342019846956 (930.28 KB)
“Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,”
Journal of Computational Science, vol. 28, pp. 398–415, September 2018.
“Resilient scheduling heuristics for rigid parallel jobs,”
Int. J. of Networking and Computing, vol. 11, no. 1, pp. 2-26, 2021.
(8.67 MB)
“Revisiting I/O bandwidth-sharing strategies for HPC applications,”
INRIA Research Report, no. RR-9502: INRIA, March 2023.
“