Publications
Showing records 1 - 10 of 18
Herault, T., Bouteiller, A., Bosilca, G., Gamell, M., Teranishi, K., Parashar, M., Dongarra, J. "Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,"
University of Tennessee Computer Science Technical Report,
ICL-UT-15-01,
April, 2015.
|
|
Benoit, A., Robert, Y., Raina, S.K. "Efficient checkpoint/verification patterns for silent error detection,"
University of Tennessee Computer Science Technical Report,
ICL-UT-14-03,
May, 2014.
|
|
Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, and Jack Dongarra "Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy,"
ACM Transactions on Parallel Computing,
Phillip B. Gibbons, ed.
ACM,
New York, NY, USA (to appear),
2014.
|
|
Yulu Jia, George Bosilca, Piotr Luszczek, and Jack Dongarra "Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance,"
International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013,
Denver, CO, November, 2013.
|
|
Yulu Jia, Piotr Luszczek, George Bosilca, Jack Dongarra "CPU-GPU Hybrid Bidiagonal Reduction With Soft Error Resilience,"
ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems,
November, 2013.
|
|
Bosilca, G., Bouteiller, A., Herault, T., Robert, Y., Dongarra, J. "Assessing the impact of ABFT and Checkpoint composite strategies,"
University of Tennessee Computer Science Technical Report,
ICL-UT-13-03,
September, 2013.
|
|
Jia, Y., Luszczek, P., Dongarra, J. "Transient Error Resilient Hessenberg Reduction on GPU-based Hybrid Architectures,"
University of Tennessee Computer Science Technical Report, UT-CS-13-712 (lawn279),
June, 2013.
|
|
Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J.J. "An evaluation of User-Level Failure Mitigation support in MPI,"
Computing,
Springer,
Vienna, DOI 10.1007/s00607-013-0331-3,
1-14,
May, 2013.
|
|
Jack Dongarra, Thomas Herault and Yves Robert "Revisiting the Double Checkpointing Algorithm,"
15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium,
Boston, MA, January, 2013.
|
|
Bland, W., Du, P., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,"
18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award),
Christos Kaklamanis, Theodore Papatheodorou and Paul Spirakis eds.
Springer-Verlag,
Rhodes, Greece, August 27-31, 2012.
|
|