Publications

Export 18 results:
Filters: Author is Anne Benoit  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
B
Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, Optimal Checkpointing Period with replicated execution on heterogeneous platforms,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, June 2017.  (1.02 MB)
Benoit, A., R. Elghazi, and Y. Robert, Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.  (4.94 MB)
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.  (603.58 KB)
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,” Journal of Computational Science, vol. 28, pp. 398–415, September 2018.
Benoit, A., L. Pottier, and Y. Robert, Resilient Co-Scheduling of Malleable Applications,” International Journal of High Performance Computing Applications (IJHPCA), May 2017.  (1.62 MB)
Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, Resilient scheduling heuristics for rigid parallel jobs,” Int. J. of Networking and Computing, vol. 11, no. 1, pp. 2-26, 2021.  (8.67 MB)
Benoit, A., Y. Robert, and S. K. Raina, Efficient checkpoint/verification patterns for silent error detection,” Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.  (397.75 KB)
Benoit, A., A. Cavelan, F. Cappello, P. Raghavan, Y. Robert, and H. Sun, Coping with Silent and Fail-Stop Errors at Scale by Combining Replication and Checkpointing,” Journal of Parallel and Distributed Computing, vol. 122, pp. 209–225, December 2018.  (837 KB)
Benoit, A., A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, Towards Optimal Multi-Level Checkpointing,” IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017.  (1.39 MB)
Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs,” 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.  (696.21 KB)
Benoit, A., S. Perarnau, L. Pottier, and Y. Robert, A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures,” The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.  (868.44 KB)
Benoit, A., T. Herault, L. Perotin, Y. Robert, and F. Vivien, Revisiting I/O bandwidth-sharing strategies for HPC applications,” INRIA Research Report, no. RR-9502: INRIA, March 2023.
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016.  (573.71 KB)
Benoit, A., T. Herault, V. Le Fèvre, and Y. Robert, Replication is More Efficient Than You Think,” The IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis (SC19), Denver, CO, ACM Press, November 2019.  (975.69 KB)
Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, June 2017.  (865.68 KB)
Benoit, A., Y. Du, T. Herault, L. Marchal, G. Pallez, L. Perotin, Y. Robert, H. Sun, and F. Vivien, Checkpointing à la Young/Daly: An Overview,” IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, Noida, India, ACM Press, pp. 701-710, August 2022.  (639.77 KB)
Benoit, A., S. K. Raina, and Y. Robert, Efficient Checkpoint/Verification Patterns,” International Journal on High Performance Computing Applications, July 2015.  (392.76 KB)
Benoit, A., A. Cavelan, F. M. Ciorba, V. Le Fèvre, and Y. Robert, Combining Checkpointing and Replication for Reliable Execution of Linear Workflows with Fail-Stop and Silent Errors,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27.  (754.6 KB)