Publications

Export 9 results:
Filters: Author is Aurelien Cavelan  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
I
Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, June 2017. DOI: 10.1145/3086157.3086162  (865.68 KB)
M
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,” Journal of Computational Science, vol. 28, pp. 398–415, September 2018.
O
Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, Optimal Checkpointing Period with replicated execution on heterogeneous platforms,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, June 2017. DOI: 10.1145/3086157.3086165  (1.02 MB)
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016. DOI: 10.1109/IPDPS.2016.39  (603.58 KB)
R
Fang, A., A. Cavelan, Y. Robert, and A. Chien, Resilience for Stencil Computations with Latent Errors,” International Conference on Parallel Processing (ICPP), Bristol, UK, IEEE Computer Society Press, August 2017.  (1.19 MB)
T
Benoit, A., A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, Towards Optimal Multi-Level Checkpointing,” IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017. DOI: 10.1109/TC.2016.2643660  (1.39 MB)