George Bosilca
Innovative Computing Laboratory, University of Tennessee
(865) 974-6321 email: bosilca@eecs.utk.edu
Education and Training:
University of Paris XI Orsay, France | Math and Computer Science | B.S. 1999
University of Paris XI Orsay, France | Computer Science | Ph.D. 2003
University of Tennessee, ICL | Parallel Computing | Post Doc 2004-2005
Research and Professional Experience:
Research Asst. Professor, Innovative Computing Laboratory, University of Tennessee (2007 – present)
Adjunct Assistant Professor, University of Tennessee (2004 – present)
Research Scientist, Innovative Computing Laboratory, University of Tennessee (2005 – 2007)
Sr. Research Assoc., Innovative Computing Laboratory, University of Tennessee (2004 – 2005)
Research Assoc., Innovative Computing Laboratory, University of Tennessee (2003 – 2004)
Synergistic Activities:
• Technical lead, release manager and active member of the Open MPI development team.
• Active member of the MPI Forum.
• Technical lead for the AtomS, System Noise and STCI Software Packages; technical lead for the Fault Tolerant FT-MPI Library Development; and technical lead for MPICH-V.
• Architect and Technical Lead for DAGuE / DPLASMA.
Collaborators and Co-editors:
Emmanuel Agullo (INRIA, France), Brad Benton (IBM), Franck Cappello (INRIA Futur, France), Ralph Castain (LANL), D. Cronk (University of Tennessee), J. Dongarra (University of Tennessee), Terry Dontje (Sun/Oracle), G. Fagg (Microsoft), Patrick Geoffray (Myricom), Brice Goglin (INRIA, France), Rich Graham (ORNL), Thomas Herault (INRIA Futur, France), Yutaka Ishikawa (University of Tokyo), Emmanuel Jeannot (INRIA, France), Andrew Lumsdaine (Indiana University), Christine Morin (INRIA, France), Yves Robert (ENS Lyon, France), Jeff Squyres (Cisco), Stan Tomov (University of Tennessee)
Graduate and Postdoctoral Advisors and Advisees:
Graduate Students (past 5 years):
Daniel Andrzejewski, Thara Angskun, Wesley Bland, Kartheek V. Bodanki, Camille Coti, Jelena Pjesivac-Grbovic, Kusolchu Krerkchai, Narapat Saengpatsa, Gwang Son, Teng Ma, Wei Wu, Anthony Canino, Peter Gaultney, Peng Du
Postdoctoral Associates (past 5 years):
Stephanie Moreaud, Anthony Danalis, Aurelien Bouteiller, Pierre Lemarinier, Yuan Tang
Thesis Advisor:
Dr. Franck Cappello, INRIA Futur, University of Paris XI Orsay and INRIA-Illinois Joint Laboratory on PetaScale Computing.
Publications:
Hoque, R., Herault, T., Bosilca, G., Dongarra, J. "Dynamic Task Discovery in PaRSEC: A data-flow task-based Runtime," ScalA17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ACM, Denver, CO, November 13, 2017
@inproceedings{icl:958,
author = {Hoque, R. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Dynamic Task Discovery in PaRSEC: A data-flow task-based Runtime},
booktitle = {ScalA17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Denver, CO},
month = {November},
year = {2017}
}
Eberius, D., Patinyasakdikul, T., Bosilca, G. "Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information," Proceedings of the 24th European MPI Users' Group Meeting, Peña, A., Balaji, P., Gropp, W., Thakur, R. eds. ACM, Chicago, IL, Article No. 7, September 25-28, 2017
@inproceedings{icl:957,
author = {Eberius, D. and Patinyasakdikul, T. and Bosilca, G.},
title = {Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information},
booktitle = {Proceedings of the 24th European MPI Users' Group Meeting},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {Article No. 7},
address = {Chicago, IL},
month = {September},
year = {2017}
}
Herault, T., Bouteiller, A., Bosilca, G., Gamell, M., Teranishi, K., Parashar, M., Dongarra, J. "Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems," Supercomputing, Austin, TX, November, 2015
@inproceedings{icl:883,
author = {Herault, T. and Bouteiller, A. and Bosilca, G. and Gamell, M. and Teranishi, K. and Parashar, M. and Dongarra, J.},
title = {Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems},
booktitle = {Supercomputing},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Austin, TX},
month = {November},
year = {2015}
}
Danalis, A., Jagode, H., Bosilca, G., Dongarra, J. "PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution," 2015 IEEE International Conference on Cluster Computing, IEEE, Chicago, IL, pp. 304-313, September 8-11, 2015
@inproceedings{icl:908,
author = {Danalis, A. and Jagode, H. and Bosilca, G. and Dongarra, J.},
title = {PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution},
booktitle = {2015 IEEE International Conference on Cluster Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {304--313},
address = {Chicago, IL},
month = {September},
year = {2015}
}
Wu, W., Bouteiller, A., Bosilca, G., Faverge, M., Dongarra, J. "Hierarchical DAG scheduling for Hybrid Distributed Systems," 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, Hyderabad, India, May, 2015
@inproceedings{icl:837,
author = {Wu, W. and Bouteiller, A. and Bosilca, G. and Faverge, M. and Dongarra, J.},
title = {Hierarchical DAG scheduling for Hybrid Distributed Systems},
booktitle = {29th IEEE International Parallel \& Distributed Processing Symposium (IPDPS)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Hyderabad, India},
month = {May},
year = {2015}
}
Cao, C., Bosilca, G., Herault, T., Dongarra, J. "Design for a Soft Error Resilient Dynamic Task-based Runtime," 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, Hyderabad, India, May, 2015
@inproceedings{icl:836,
author = {Cao, C. and Bosilca, G. and Herault, T. and Dongarra, J.},
title = {Design for a Soft Error Resilient Dynamic Task-based Runtime},
booktitle = {29th IEEE International Parallel \& Distributed Processing Symposium (IPDPS)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Hyderabad, India},
month = {May},
year = {2015}
}
Herault, T., Bouteiller, A., Bosilca, G., Gamell, M., Teranishi, K., Parashar, M., Dongarra, J. "Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof," University of Tennessee Computer Science Technical Report, ICL-UT-15-01, April, 2015
@techreport{icl:865,
author = {Herault, T. and Bouteiller, A. and Bosilca, G. and Gamell, M. and Teranishi, K. and Parashar, M. and Dongarra, J.},
title = {Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof},
booktitle = {University of Tennessee Computer Science Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {April},
year = {2015}
}
Danalis, A., Bosilca, G., Bouteiller, A., Herault, T., Dongarra, J. "PTG: an abstraction for unhindered parallelism," Proceedings of the International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), IEEE Press, New Orleans, Louisiana, November 17, 2014
@inproceedings{icl:864,
author = {Danalis, A. and Bosilca, G. and Bouteiller, A. and Herault, T. and Dongarra, J.},
title = {PTG: an abstraction for unhindered parallelism},
booktitle = {Proceedings of the International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {New Orleans, Louisiana},
month = {Nov},
year = {2014}
}
Baboulin, M., Becker, D., Bosilca, G., Danalis, A., Dongarra, J. "An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems," Parallel Computing, Costas Bekas, Ananth Grama, Olaf Schenk and Yousef Saad eds. 7th Workshop on Parallel Matrix Algorithms and Applications, Vol. 40, Issue 7, 213-223, July, 2014
@article{icl:820,
author = {Baboulin, M. and Becker, D. and Bosilca, G. and Danalis, A. and Dongarra, J.},
title = {An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems},
journal = {Parallel Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {40},
number = {7},
pages = {213--223},
note = {7th Workshop on Parallel Matrix Algorithms and Applications},
month = {July},
year = {2014}
}
Bosilca, G., Bouteiller, A., Herault, T., Robert, Y., Dongarra, J. "Assessing the Impact of ABFT and Checkpoint Composite Strategies," 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, IEEE, Phoenix, AZ, May, 2014
@inproceedings{icl:780,
author = {Bosilca, G. and Bouteiller, A. and Herault, T. and Robert, Y. and Dongarra, J.},
title = {Assessing the Impact of ABFT and Checkpoint Composite Strategies},
booktitle = {16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Phoenix, AZ},
month = {May},
year = {2014}
}
Lacoste, X., Faverge, M., Ramet, P., Thibault, S., Bosilca, G. "Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes," 23rd International Heterogeneity in Computing Workshop, IPDPS 2014, IEEE, Phoenix, AZ, May, 2014
@inproceedings{icl:781,
author = {Lacoste, X. and Faverge, M. and Ramet, P. and Thibault, S. and Bosilca, G.},
title = {Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes},
booktitle = {23rd International Heterogeneity in Computing Workshop, IPDPS 2014},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Phoenix, AZ},
month = {May},
year = {2014}
}
Bosilca, G., Bouteiller, A., Brunet, E., Cappello, F., Dongarra, J., Guermouche, A., Herault, T., Robert, Y., Vivien, F., Zaidouni, D. "Unified Model for Assessing Checkpointing Protocols at Extreme-Scale," Concurrency and Computation: Practice and Experience, John Wiley & Sons, Ltd., November, 2013
@article{icl:785,
author = {Bosilca, G. and Bouteiller, A. and Brunet, E. and Cappello, F. and Dongarra, J. and Guermouche, A. and Herault, T. and Robert, Y. and Vivien, F. and Zaidouni, D.},
title = {Unified Model for Assessing Checkpointing Protocols at Extreme-Scale},
journal = {Concurrency and Computation: Practice and Experience},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {November},
year = {2013}
}
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Herault, T., Dongarra, J. "PaRSEC: Exploiting Heterogeneity to Enhance Scalability," IEEE Computing in Science and Engineering, Vol. 15, No. 6, 36-45, November, 2013
@article{icl:786,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Faverge, M. and Herault, T. and Dongarra, J.},
title = {PaRSEC: Exploiting Heterogeneity to Enhance Scalability},
journal = {IEEE Computing in Science and Engineering},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {15},
number = {6},
pages = {36--45},
month = {November},
year = {2013}
}
Bosilca, G., Bouteiller, A., Herault, T., Robert, Y., Dongarra, J. "Assessing the impact of ABFT and Checkpoint composite strategies," University of Tennessee Computer Science Technical Report, ICL-UT-13-03, September, 2013
@techreport{icl:757,
author = {Bosilca, G. and Bouteiller, A. and Herault, T. and Robert, Y. and Dongarra, J.},
title = {Assessing the impact of {ABFT} and Checkpoint composite strategies},
booktitle = {University of Tennessee Computer Science Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {September},
year = {2013}
}
Bland, W., Du, P., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI," Concurrency and Computation: Practice and Experience, July, 2013
@article{icl:755,
author = {Bland, W. and Du, P. and Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI},
journal = {Concurrency and Computation: Practice and Experience},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {July},
year = {2013}
}
Bland, W., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "Post-failure recovery of MPI communication capability: Design and Rationale," International Journal of High Performance Computing Applications, June, 2013
@article{icl:756,
author = {Bland, W. and Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Post-failure recovery of MPI communication capability: Design and Rationale},
journal = {International Journal of High Performance Computing Applications},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {June},
year = {2013}
}
Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J.J. "An evaluation of User-Level Failure Mitigation support in MPI," Computing, Springer, Vienna, DOI 10.1007/s00607-013-0331-3, 1-14, May, 2013
@article{icl:744,
author = {Bland, W. and Bouteiller, A. and Herault, T. and Hursey, J. and Bosilca, G. and Dongarra, J.J.},
title = {An evaluation of User-Level Failure Mitigation support in MPI},
journal = {Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
doi = {10.1007/s00607-013-0331-3},
pages = {1--14},
address = {Vienna},
month = {May},
year = {2013}
}
Ma, T., Bosilca, G., Bouteiller, A., Dongarra, J. "Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms," Journal of Parallel and Distributed Computing, accepted, January, 2013
@article{icl:734,
author = {Ma, T. and Bosilca, G. and Bouteiller, A. and Dongarra, J.},
title = {Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms},
journal = {Journal of Parallel and Distributed Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
note = {accepted},
month = {January},
year = {2013}
}
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Kurzak, J., Luszczek, P., Tomov, S., Dongarra, J. "Scalable Dense Linear Algebra on Heterogeneous Hardware," HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, IOS Press, 2013
@incollection{icl:758,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Herault, T. and Kurzak, J. and Luszczek, P. and Tomov, S. and Dongarra, J.},
title = {Scalable Dense Linear Algebra on Heterogeneous Hardware},
booktitle = {HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2013}
}
Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "Correlated Set Coordination in Fault Tolerant Message Logging Protocols," Concurrency and Computation: Practice and Experience, Vol. 25, No. 4, pp. 572-585, 2013
@article{icl:787,
author = {Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Correlated Set Coordination in Fault Tolerant Message Logging Protocols},
journal = {Concurrency and Computation: Practice and Experience},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {25},
number = {4},
pages = {572--585},
year = {2013}
}
Agullo, E., Bosilca, G., Castagnède, C., Dongarra, J., Ltaief, H., Tomov, S. "Matrices Over Runtime Systems at Exascale," Supercomputing '12 (poster), Salt Lake City, Utah, November, 2012
@inproceedings{icl:730,
author = {Agullo, E. and Bosilca, G. and Castagnède, C. and Dongarra, J. and Ltaief, H. and Tomov, S.},
title = {Matrices Over Runtime Systems at Exascale},
booktitle = {Supercomputing '12 (poster)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Salt Lake City, Utah},
month = {November},
year = {2012}
}
Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J. "An Evaluation of User-Level Failure Mitigation Support in MPI," Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Springer, Vienna, Austria, September 23-26, 2012
@inproceedings{icl:680,
author = {Bland, W. and Bouteiller, A. and Herault, T. and Hursey, J. and Bosilca, G. and Dongarra, J.},
title = {An Evaluation of User-Level Failure Mitigation Support in MPI},
booktitle = {Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Vienna, Austria},
month = {September},
year = {2012}
}
Bosilca, G., Dongarra, J., Ltaief, H. "Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems," Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September, 2012
@inproceedings{icl:710,
author = {Bosilca, G. and Dongarra, J. and Ltaief, H.},
title = {Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems},
booktitle = {Third International Conference on Energy-Aware High Performance Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Hamburg, Germany},
month = {September},
year = {2012}
}
Bland, W., Du, P., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI," 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Christos Kaklamanis, Theodore Papatheodorou and Paul Spirakis eds. Springer-Verlag, Rhodes, Greece, August 27-31, 2012
@inproceedings{icl:679,
author = {Bland, W. and Du, P. and Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI},
booktitle = {18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Rhodes, Greece},
month = {August},
year = {2012}
}
Baboulin, M., Becker, D., Bosilca, G., Danalis, A., Dongarra, J. "An efficient distributed randomized solver with application to large dense linear systems," ICL Technical Report, ICL-UT-12-02, July 11, 2012
@techreport{icl:683,
author = {Baboulin, M. and Becker, D. and Bosilca, G. and Danalis, A. and Dongarra, J.},
title = {An efficient distributed randomized solver with application to large dense linear systems},
booktitle = {ICL Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {July},
year = {2012}
}
Bosilca, G., Bouteiller, A., Brunet, E., Cappello, F., Dongarra, J., Guermouche, A., Herault, T., Robert, Y., Vivien, F., Zaidouni, D. "Unified Model for Assessing Checkpointing Protocols at Extreme-Scale," University of Tennessee Computer Science Technical Report (also LAWN 269), UT-CS-12-697, June, 2012
@techreport{icl:716,
author = {Bosilca, G. and Bouteiller, A. and Brunet, E. and Cappello, F. and Dongarra, J. and Guermouche, A. and Herault, T. and Robert, Y. and Vivien, F. and Zaidouni, D.},
title = {Unified Model for Assessing Checkpointing Protocols at Extreme-Scale},
booktitle = {University of Tennessee Computer Science Technical Report (also LAWN 269)},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {June},
year = {2012}
}
Ma, T., Bosilca, G., Bouteiller, A., Dongarra, J. "HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters," IPDPS 2012 (Best Paper), Shanghai, China, May, 2012
@inproceedings{icl:700,
author = {Ma, T. and Bosilca, G. and Bouteiller, A. and Dongarra, J.},
title = {HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters},
booktitle = {IPDPS 2012 (Best Paper)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Shanghai, China},
month = {May},
year = {2012}
}
Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "Correlated Set Coordination in Fault Tolerant Message Logging Protocols," Concurrency and Computation: Practice and Experience (accepted), March, 2012
@article{icl:720,
author = {Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Correlated Set Coordination in Fault Tolerant Message Logging Protocols},
journal = {Concurrency and Computation: Practice and Experience},
note = {accepted},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {March},
year = {2012}
}
Du, P., Bouteiller, A., Bosilca, G., Herault, T., Dongarra, J. "Algorithm-Based Fault Tolerance for Dense Matrix Factorization," Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, J. Ramanujam, P. Sadayappan eds. ACM, New Orleans, LA, USA, 225-234, February 25-29, 2012
@inproceedings{icl:672,
author = {Du, P. and Bouteiller, A. and Bosilca, G. and Herault, T. and Dongarra, J.},
title = {Algorithm-Based Fault Tolerance for Dense Matrix Factorization},
booktitle = {Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {225-234},
address = {New Orleans, LA, USA},
month = {February},
year = {2012}
}
Bland, W., Bosilca, G., Bouteiller, A., Herault, T., Dongarra, J. "A Proposal for User-Level Failure Mitigation in the MPI-3 Standard," University of Tennessee Electrical Engineering and Computer Science Technical Report, ut-cs-12-693, February 24, 2012
@techreport{icl:667,
author = {Bland, W. and Bosilca, G. and Bouteiller, A. and Herault, T. and Dongarra, J.},
title = {A Proposal for User-Level Failure Mitigation in the MPI-3 Standard},
booktitle = {University of Tennessee Electrical Engineering and Computer Science Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {February},
year = {2012}
}
Danalis, A., Bouteiller, A., Bosilca, G., Dongarra, J., Herault, T. "From Serial Loops to Parallel Execution on Distributed Systems," PPoPP 2012 (submitted), New Orleans, LA, February, 2012
@inproceedings{icl:699,
author = {Danalis, A. and Bouteiller, A. and Bosilca, G. and Dongarra, J. and Herault, T.},
title = {From Serial Loops to Parallel Execution on Distributed Systems},
booktitle = {PPoPP 2012 (submitted)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {New Orleans, LA},
month = {February},
year = {2012}
}
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Luszczek, P., Dongarra, J. "Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach," Scalable Computing and Communications: Theory and Practice, Khan, S., Wang, L., Zomaya, A. eds. John Wiley & Sons, 699-735, March, 2013
@incollection{icl:698,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Herault, T. and Luszczek, P. and Dongarra, J.},
title = {Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach},
booktitle = {Scalable Computing and Communications: Theory and Practice},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {699-735},
month = {March},
year = {2012}
}
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J. "DAGuE: A generic distributed DAG Engine for High Performance Computing," Parallel Computing, T. Hoefler eds. Elsevier, Vol. 38, No. 1-2, 27-51, 2012
@article{icl:670,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Herault, T. and Lemarinier, P. and Dongarra, J.},
title = {DAGuE: A generic distributed DAG Engine for High Performance Computing},
journal = {Parallel Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {38},
number = {1-2},
pages = {27--51},
year = {2012}
}
Bland, W., Du, P., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI," University of Tennessee Computer Science Technical Report, ut-cs-12-702, 2012
@techreport{icl:724,
author = {Bland, W. and Du, P. and Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI},
booktitle = {University of Tennessee Computer Science Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2012}
}
Bosilca, G., Herault, T., Rezmerita, A., Dongarra, J. "On Scalability for MPI Runtime Systems," International Conference on Cluster Computing (CLUSTER), IEEE, Austin, TX, USA, 187-195, September 26-30, 2011
@inproceedings{icl:671,
author = {Bosilca, G. and Herault, T. and Rezmerita, A. and Dongarra, J.},
title = {On Scalability for MPI Runtime Systems},
booktitle = {International Conference on Cluster Computing (CLUSTER)},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {187--195},
address = {Austin, TX, USA},
month = {September},
year = {2011}
}
Bosilca, G., Herault, T., Lemarinier, P., Rezmerita, A., Dongarra, J. "Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure," Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, Jack Dongarra eds. Springer, Santorini, Greece, LNCS 6960, 342-344, September 18-21, 2011
@inproceedings{icl:674,
author = {Bosilca, G. and Herault, T. and Lemarinier, P. and Rezmerita, A. and Dongarra, J.},
title = {Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure},
booktitle = {Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011},
institution = {Innovative Computing Laboratory, University of Tennessee},
series = {LNCS},
volume = {6960},
pages = {342--344},
address = {Santorini, Greece},
month = {September},
year = {2011}
}
Ma, T., Bouteiller, A., Bosilca, G., Dongarra, J. "Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW," 18th EuroMPI, Cotronis, Y., Danalis, A., Nikolopoulos, D., Dongarra, J. eds. Springer, Santorini, Greece, pp. 247-254, September, 2011
@inproceedings{icl:646,
author = {Ma, T. and Bouteiller, A. and Bosilca, G. and Dongarra, J.},
title = {Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW},
booktitle = {18th EuroMPI},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {247--254},
address = {Santorini, Greece},
month = {September},
year = {2011}
}
Chaarawi, M., Gabriel, E., Keller, R., Graham, R., Bosilca, G., Dongarra, J. "OMPIO: A Modular Software Architecture for MPI I/O," 18th EuroMPI, Cotronis, Y., Danalis, A., Nikolopoulos, D., Dongarra, J. eds. Springer, Santorini, Greece, pp. 81-89, September, 2011
@inproceedings{icl:647,
author = {Chaarawi, M. and Gabriel, E. and Keller, R. and Graham, R. and Bosilca, G. and Dongarra, J.},
title = {OMPIO: A Modular Software Architecture for MPI I/O},
booktitle = {18th EuroMPI},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {81--89},
address = {Santorini, Greece},
month = {September},
year = {2011}
}
Ma, T., Bosilca, G., Bouteiller, A., Goglin, B., Squyres, J., Dongarra, J. "Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs," Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September, 2011
@inproceedings{icl:649,
author = {Ma, T. and Bosilca, G. and Bouteiller, A. and Goglin, B. and Squyres, J. and Dongarra, J.},
title = {Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs},
booktitle = {Int'l Conference on Parallel Processing (ICPP '11)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Taipei, Taiwan},
month = {September},
year = {2011}
}
Bosilca, G., Herault, T., Rezmerita, A., Dongarra, J. "On Scalability for MPI Runtime Systems," Proceedings of the 2011 IEEE International Conference on Cluster Computing, IEEE Computer Society, Austin, TX, 187-195, September, 2011
@inproceedings{icl:765,
author = {Bosilca, G. and Herault, T. and Rezmerita, A. and Dongarra, J.},
title = {On Scalability for MPI Runtime Systems},
booktitle = {Proceedings of the 2011 IEEE International Conference on Cluster Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {187--195},
address = {Austin, TX},
month = {September},
year = {2011}
}
Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "Correlated Set Coordination in Fault Tolerant Message Logging Protocols," Proceedings of 17th International Conference, Euro-Par 2011, Part II, Emmanuel Jeannot, Raymond Namyst, Jean Roman eds. Springer, Bordeaux, France, LNCS Vol. 6853, 51-64, August 29 - September 2, 2011
@inproceedings{icl:673,
author = {Bouteiller, A. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Correlated Set Coordination in Fault Tolerant Message Logging Protocols},
booktitle = {Proceedings of 17th International Conference, Euro-Par 2011, Part II},
institution = {Innovative Computing Laboratory, University of Tennessee},
series = {LNCS},
volume = {6853},
pages = {51--64},
address = {Bordeaux, France},
month = {August},
year = {2011}
}
Du, P., Bouteiller, A., Bosilca, G., Herault, T., Dongarra, J. "Algorithm-based Fault Tolerance for Dense Matrix Factorizations," University of Tennessee Computer Science Technical Report, Knoxville, TN, UT-CS-11-676, August 5, 2011
@techreport{icl:626,
author = {Du, P. and Bouteiller, A. and Bosilca, G. and Herault, T. and Dongarra, J.},
title = {Algorithm-based Fault Tolerance for Dense Matrix Factorizations},
booktitle = {University of Tennessee Computer Science Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Knoxville, TN},
month = {August},
year = {2011}
}
Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Saengpatsa, N., Tomov, S., Dongarra, J. "Performance Portability of a GPU Enabled Factorization with the DAGuE Framework," IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 24, 2011
@inproceedings{icl:636,
author = {Bosilca, G. and Bouteiller, A. and Herault, T. and Lemarinier, P. and Saengpatsa, N. and Tomov, S. and Dongarra, J.},
title = {Performance Portability of a GPU Enabled Factorization with the DAGuE Framework},
booktitle = {IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC)},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {June},
year = {2011}
}
Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Saengpatsa, N., Tomov, S., Dongarra, J. "A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems," IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 16-20, 2011
@inproceedings{icl:593,
author = {Bosilca, G. and Bouteiller, A. and Herault, T. and Lemarinier, P. and Saengpatsa, N. and Tomov, S. and Dongarra, J.},
title = {A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems},
booktitle = {IEEE International Parallel and Distributed Processing Symposium (submitted)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Anchorage, AK},
month = {May},
year = {2011}
}
Bosilca, G., Herault, T., Rezmerita, A., Dongarra, J. "On Scalability for MPI Runtime Systems," University of Tennessee Computer Science Technical Report, Knoxville, TN, ICL-UT-11-05, May 1, 2011
@techreport{icl:612,
author = {Bosilca, G. and Herault, T. and Rezmerita, A. and Dongarra, J.},
title = {On Scalability for MPI Runtime Systems},
booktitle = {University of Tennessee Computer Science Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Knoxville, TN},
month = {May},
year = {2011}
}
Ma, T., Herault, T., Bosilca, G., Dongarra, J. "Process Distance-aware Adaptive MPI Collective Communications," IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, September, 2011
@inproceedings{icl:648,
author = {Ma, T. and Herault, T. and Bosilca, G. and Dongarra, J.},
title = {Process Distance-aware Adaptive MPI Collective Communications},
booktitle = {IEEE Int'l Conference on Cluster Computing (Cluster 2011)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Austin, Texas},
month = {September},
year = {2011}
}
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J. "DAGuE: A Generic Distributed DAG Engine for High Performance Computing," Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), IEEE, Anchorage, Alaska, USA, 1151-1158, 16-20 May, 2011
@inproceedings{icl:675,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Herault, T. and Lemarinier, P. and Dongarra, J.},
title = {DAGuE: A Generic Distributed DAG Engine for High Performance Computing},
booktitle = {Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops)},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {1151-1158},
address = {Anchorage, Alaska, USA},
year = {2011}
}
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J. "Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA," Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), IEEE, Anchorage, Alaska, USA, 1432-1441, 16-20 May, 2011
@inproceedings{icl:676,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Faverge, M. and Haidar, A. and Herault, T. and Kurzak, J. and Langou, J. and Lemarinier, P. and Ltaief, H. and Luszczek, P. and YarKhan, A. and Dongarra, J.},
title = {Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA},
booktitle = {Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops)},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {1432-1441},
address = {Anchorage, Alaska, USA},
year = {2011}
}
Ma, T., Bosilca, G., Bouteiller, A., Goglin, B., Squyres, J., Dongarra, J. "Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs," University of Tennessee Computer Science Technical Report, UT-CS-10-663, November, 2010
@techreport{icl:597,
author = {Ma, T. and Bosilca, G. and Bouteiller, A. and Goglin, B. and Squyres, J. and Dongarra, J.},
title = {Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs},
booktitle = {University of Tennessee Computer Science Technical Report, UT-CS-10-663},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {November},
year = {2010}
}
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J. "Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA," University of Tennessee Computer Science Technical Report, UT-CS-10-660, Sept. 15, 2010
@techreport{icl:563,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Faverge, M. and Haidar, A. and Herault, T. and Kurzak, J. and Langou, J. and Lemarinier, P. and Ltaief, H. and Luszczek, P. and YarKhan, A. and Dongarra, J.},
title = {Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA},
booktitle = {University of Tennessee Computer Science Technical Report, UT-CS-10-660},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {September},
year = {2010}
}
Ma, T., Bouteiller, A., Bosilca, G., Dongarra, J. "Locality and Topology aware Intra-node Communication Among Multicore CPUs," Proceedings of the 17th EuroMPI conference, LNCS, Stuttgart, Germany, September, 2010
@inproceedings{icl:535,
author = {Ma, T. and Bouteiller, A. and Bosilca, G. and Dongarra, J.},
title = {Locality and Topology aware Intra-node Communication Among Multicore CPUs},
booktitle = {Proceedings of the 17th EuroMPI conference},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Stuttgart, Germany},
month = {September},
year = {2010}
}
Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Dongarra, J. "Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols," Proceedings of EuroMPI 2010, Jack Dongarra, Michael Resch, Rainer Keller, Edgar Gabriel, eds. Springer, Stuttgart, Germany, September, 2010
@inproceedings{icl:534,
author = {Bosilca, G. and Bouteiller, A. and Herault, T. and Lemarinier, P. and Dongarra, J.},
title = {Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols},
booktitle = {Proceedings of EuroMPI 2010},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Stuttgart, Germany},
month = {September},
year = {2010}
}
Bouteiller, A., Bosilca, G., Dongarra, J. "Redesigning the Message Logging Model for High Performance," Concurrency and Computation: Practice and Experience (online version), June 27, 2010
@article{icl:565,
author = {Bouteiller, A. and Bosilca, G. and Dongarra, J.},
title = {Redesigning the Message Logging Model for High Performance},
booktitle = {Concurrency and Computation: Practice and Experience (online version)},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {June},
year = {2010}
}
Turchenko, V., Grandinetti, L., Bosilca, G., Dongarra, J. "Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI," Proceedings of International Conference on Computational Science, ICCS 2010 (to appear), Elsevier, Amsterdam, The Netherlands, June, 2010
@inproceedings{icl:527,
author = {Turchenko, V. and Grandinetti, L. and Bosilca, G. and Dongarra, J.},
title = {Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI},
booktitle = {Proceedings of International Conference on Computational Science, ICCS 2010 (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Amsterdam, The Netherlands},
month = {June},
year = {2010}
}
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J. "DAGuE: A generic distributed DAG engine for high performance computing," Innovative Computing Laboratory Technical Report, ICL-UT-10-01, April 11, 2010
@techreport{icl:528,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Herault, T. and Lemarinier, P. and Dongarra, J.},
title = {DAGuE: A generic distributed DAG engine for high performance computing},
booktitle = {Innovative Computing Laboratory Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {April},
year = {2010}
}
Angskun, T., Fagg, G., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Self-Healing Network for Scalable Fault-Tolerant Runtime Environments," Future Generation Computer Systems, Vol. 26, Number 3, pp. 479-485, March, 2010
@article{icl:567,
author = {Angskun, T. and Fagg, G. and Bosilca, G. and Pjesivac-Grbovic, J. and Dongarra, J.},
title = {Self-Healing Network for Scalable Fault-Tolerant Runtime Environments},
booktitle = {Future Generation Computer Systems},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {Vol. 26, Number 3},
pages = {pp. 479-485},
month = {March},
year = {2010}
}
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J. "Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project," Innovative Computing Laboratory Technical Report, ICL-UT-10-02, 2010
@techreport{icl:529,
author = {Bosilca, G. and Bouteiller, A. and Danalis, A. and Faverge, M. and Haidar, A. and Herault, T. and Kurzak, J. and Langou, J. and Lemarinier, P. and Ltaief, H. and Luszczek, P. and YarKhan, A. and Dongarra, J.},
title = {Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project},
booktitle = {Innovative Computing Laboratory Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2010}
}
Bosilca, G., Coti, C., Herault, T., Lemarinier, P., Dongarra, J. "Constructing Resilient Communication Infrastructure for Runtime Environments," in Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, Chapman, B., Desprez, F., Joubert, G., Lichnewsky, A., Peters, F., Priol, T. eds., Volume 19, pp. 441-451, 2010
@article{icl:555,
author = {Bosilca, G. and Coti, C. and Herault, T. and Lemarinier, P. and Dongarra, J.},
title = {Constructing Resilient Communication Infrastructure for Runtime Environments},
booktitle = {in Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {Volume 19},
pages = {pp. 441-451},
year = {2010}
}
Lemarinier, P., Bosilca, G., Coti, C., Herault, T., Dongarra, J. "Constructing Resilient Communication Infrastructure for Runtime Environments," ParCo 2009, Lyon, France, September, 2009
@article{icl:517,
author = {Lemarinier, P. and Bosilca, G. and Coti, C. and Herault, T. and Dongarra, J.},
title = {Constructing Resilient Communication Infrastructure for Runtime Environments},
booktitle = {ParCo 2009},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Lyon, France},
month = {September},
year = {2009}
}
Bouteiller, A., Ropars, T., Bosilca, G., Morin, C., Dongarra, J. "Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure recovery," Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on, IEEE, New Orleans, LA, 1-9, August, 2009
@inproceedings{icl:863,
author = {Bouteiller, A. and Ropars, T. and Bosilca, G. and Morin, C. and Dongarra, J.},
title = {Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure recovery},
booktitle = {Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {1-9},
address = {New Orleans, LA},
month = {August},
year = {2009}
}
Bosilca, G., Coti, C., Herault, T., Lemarinier, P., Dongarra, J. "Constructing resilient communication infrastructure for runtime environments," Innovative Computing Laboratory Technical Report, ICL-UT-09-02, July 31, 2009
@techreport{icl:484,
author = {Bosilca, G. and Coti, C. and Herault, T. and Lemarinier, P. and Dongarra, J.},
title = {Constructing resilient communication infrastructure for runtime environments},
booktitle = {Innovative Computing Laboratory Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {July},
year = {2009}
}
Dongarra, J., Bosilca, G., Delmas, R., Langou, J. "Algorithmic Based Fault Tolerance Applied to High Performance Computing," Journal of Parallel and Distributed Computing, Volume 69, pp. 410-416, 2009
@article{icl:490,
author = {Dongarra, J. and Bosilca, G. and Delmas, R. and Langou, J.},
title = {Algorithmic Based Fault Tolerance Applied to High Performance Computing},
booktitle = {Journal of Parallel and Distributed Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {Volume 69},
pages = {pp. 410-416},
year = {2009}
}
Bosilca, G., Delmas, R., Dongarra, J., Langou, J. "Algorithmic Based Fault Tolerance Applied to High Performance Computing," University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), June 19, 2008
@techreport{icl:426,
author = {Bosilca, G. and Delmas, R. and Dongarra, J. and Langou, J.},
title = {Algorithmic Based Fault Tolerance Applied to High Performance Computing},
booktitle = {University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205)},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {June},
year = {2008}
}
Bouteiller, A., Bosilca, G., Dongarra, J. "Redesigning the Message Logging Model for High Performance," International Supercomputer Conference (ISC 2008), Dresden, Germany, June 17, 2008
@inproceedings{icl:456,
author = {Bouteiller, A. and Bosilca, G. and Dongarra, J.},
title = {Redesigning the Message Logging Model for High Performance},
booktitle = {International Supercomputer Conference (ISC 2008)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Dresden, Germany},
month = {June},
year = {2008}
}
Angskun, T., Bosilca, G., Vander Zanden, B., Dongarra, J. "Optimal Routing in Binomial Graph Networks," The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), IEEE Computer Society, Adelaide, Australia, December 3-6, 2007
@inproceedings{icl:374,
author = {Angskun, T. and Bosilca, G. and Vander Zanden, B. and Dongarra, J.},
title = {Optimal Routing in Binomial Graph Networks},
booktitle = {The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Adelaide, Australia},
month = {December},
year = {2007}
}
Angskun, T., Bosilca, G., Dongarra, J. "Self-Healing in Binomial Graph Networks," 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November, 2007
@inproceedings{icl:380,
author = {Angskun, T. and Bosilca, G. and Dongarra, J.},
title = {Self-Healing in Binomial Graph Networks},
booktitle = {2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Vilamoura, Algarve, Portugal},
month = {November},
year = {2007}
}
Bouteiller, A., Bosilca, G., Dongarra, J. "Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging," Accepted for Euro PVM/MPI 2007, Springer, September, 2007
@article{icl:353,
author = {Bouteiller, A. and Bosilca, G. and Dongarra, J.},
title = {Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging},
booktitle = {Accepted for Euro PVM/MPI 2007},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {September},
year = {2007}
}
Graham, R., Brightwell, R., Barrett, B., Bosilca, G., Pjesivac-Grbovic, J. "An Evaluation of Open MPI's Matching Transport Layer on the Cray XT," EuroPVM/MPI 2007, September, 2007
@article{icl:359,
author = {Graham, R. and Brightwell, R. and Barrett, B. and Bosilca, G. and Pjesivac-Grbovic, J.},
title = {An Evaluation of Open MPI's Matching Transport Layer on the Cray XT},
booktitle = {EuroPVM/MPI 2007},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {September},
year = {2007}
}
Angskun, T., Bosilca, G., Dongarra, J. "Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology," Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Springer, Niagara Falls, Canada, August 29-30, 2007
@inproceedings{icl:355,
author = {Angskun, T. and Bosilca, G. and Dongarra, J.},
title = {Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology},
booktitle = {Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Niagara Falls, Canada},
month = {August},
year = {2007}
}
Pjesivac-Grbovic, J., Bosilca, G., Fagg, G., Angskun, T., Dongarra, J. "Decision Trees and MPI Collective Algorithm Selection Problem," Euro-Par 2007, Springer, Rennes, France, 105--115, August, 2007
@article{icl:357,
author = {Pjesivac-Grbovic, J. and Bosilca, G. and Fagg, G. and Angskun, T. and Dongarra, J.},
title = {Decision Trees and MPI Collective Algorithm Selection Problem},
booktitle = {Euro-Par 2007},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {105--115},
address = {Rennes, France},
month = {August},
year = {2007}
}
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., Dongarra, J. "Performance Analysis of MPI Collective Operations," Cluster Computing, Springer Netherlands, Volume 10, Number 2, 127-143, June, 2007
@article{icl:358,
author = {Pjesivac-Grbovic, J. and Angskun, T. and Bosilca, G. and Fagg, G. and Gabriel, E. and Dongarra, J.},
title = {Performance Analysis of MPI Collective Operations},
booktitle = {Cluster Computing},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {Volume 10, Number 2},
pages = {127-143},
month = {June},
year = {2007}
}
Angskun, T., Bosilca, G., Fagg, G., Pjesivac-Grbovic, J., Dongarra, J. "Reliability Analysis of Self-Healing Network using Discrete-Event Simulation," Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), IEEE Computer Society, 437-444, May, 2007
@inproceedings{icl:354,
author = {Angskun, T. and Bosilca, G. and Fagg, G. and Pjesivac-Grbovic, J. and Dongarra, J.},
title = {Reliability Analysis of Self-Healing Network using Discrete-Event Simulation},
booktitle = {Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)},
institution = {Innovative Computing Laboratory, University of Tennessee},
pages = {437-444},
month = {May},
year = {2007}
}
Graham, R., Bosilca, G., Pjesivac-Grbovic, J. "A Comparison of Application Performance Using Open MPI and Cray MPI," Cray User Group, CUG 2007, May, 2007
@article{icl:360,
author = {Graham, R. and Bosilca, G. and Pjesivac-Grbovic, J.},
title = {A Comparison of Application Performance Using Open MPI and Cray MPI},
booktitle = {Cray User Group, CUG 2007},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {May},
year = {2007}
}
Langou, J., Chen, Z., Bosilca, G., Dongarra, J. "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment," SIAM SISC (to appear), May, 2007
@article{icl:397,
author = {Langou, J. and Chen, Z. and Bosilca, G. and Dongarra, J.},
title = {Recovery Patterns for Iterative Methods in a Parallel Unstable Environment},
booktitle = {SIAM SISC (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {May},
year = {2007}
}
Buttari, A., Luszczek, P., Kurzak, J., Dongarra, J., Bosilca, G. "SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3," University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595, April 17, 2007
@techreport{icl:364,
author = {Buttari, A. and Luszczek, P. and Kurzak, J. and Dongarra, J. and Bosilca, G.},
title = {SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3},
booktitle = {University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {April},
year = {2007}
}
Dongarra, J., Chen, Z., Bosilca, G., Langou, J. "Disaster Survival Guide in Petascale Computing: An Algorithmic Approach," in Petascale Computing: Algorithms and Applications (to appear), Chapman & Hall - CRC Press, 2007
@article{icl:366,
author = {Dongarra, J. and Chen, Z. and Bosilca, G. and Langou, J.},
title = {Disaster Survival Guide in Petascale Computing: An Algorithmic Approach},
booktitle = {in Petascale Computing: Algorithms and Applications (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2007}
}
Pjesivac-Grbovic, J., Bosilca, G., Fagg, G., Angskun, T., Dongarra, J. "MPI Collective Algorithm Selection and Quadtree Encoding," Parallel Computing (Special Edition: EuroPVM/MPI 2006), Elsevier, 2007
@article{icl:356,
author = {Pjesivac-Grbovic, J. and Bosilca, G. and Fagg, G. and Angskun, T. and Dongarra, J.},
title = {MPI Collective Algorithm Selection and Quadtree Encoding},
booktitle = {Parallel Computing (Special Edition: EuroPVM/MPI 2006)},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2007}
}
Pjesivac-Grbovic, J., Fagg, G., Angskun, T., Bosilca, G., Dongarra, J. "MPI Collective Algorithm Selection and Quadtree Encoding," Lecture Notes in Computer Science, Springer Berlin / Heidelberg, ICL-UT-06-13, Vol. 4192, Number 2006, pp. 40-48, September, 2006
@article{icl:323,
author = {Pjesivac-Grbovic, J. and Fagg, G. and Angskun, T. and Bosilca, G. and Dongarra, J.},
title = {MPI Collective Algorithm Selection and Quadtree Encoding},
booktitle = {Lecture Notes in Computer Science},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {Vol. 4192, Number 2006},
pages = {pp. 40-48},
month = {September},
year = {2006}
}
Fagg, G., Pjesivac-Grbovic, J., Bosilca, G., Angskun, T., Dongarra, J. "Flexible collective communication tuning architecture applied to Open MPI," 2006 Euro PVM/MPI (submitted), Bonn, Germany, September, 2006
@article{icl:315,
author = {Fagg, G. and Pjesivac-Grbovic, J. and Bosilca, G. and Angskun, T. and Dongarra, J.},
title = {Flexible collective communication tuning architecture applied to Open MPI},
booktitle = {2006 Euro PVM/MPI (submitted)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Bonn, Germany},
month = {September},
year = {2006}
}
Angskun, T., Fagg, G., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Self-Healing Network for Scalable Fault Tolerant Runtime Environments," DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, September 21-23, 2006
@inproceedings{icl:330,
author = {Angskun, T. and Fagg, G. and Bosilca, G. and Pjesivac-Grbovic, J. and Dongarra, J.},
title = {Self-Healing Network for Scalable Fault Tolerant Runtime Environments},
booktitle = {DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Innsbruck, Austria},
month = {September},
year = {2006}
}
Bosilca, G., Chen, Z., Dongarra, J., Eijkhout, V., Fagg, G., Fuentes, E., Langou, J., Luszczek, P., Pjesivac-Grbovic, J., Seymour, K., You, H., Vadhiyar, S. "Self Adapting Numerical Software SANS Effort," IBM Journal of Research and Development, Volume 50, number 2/3, pp. 223-238, 2006
@article{icl:332,
author = {Bosilca, G. and Chen, Z. and Dongarra, J. and Eijkhout, V. and Fagg, G. and Fuentes, E. and Langou, J. and Luszczek, P. and Pjesivac-Grbovic, J. and Seymour, K. and You, H. and Vadhiyar, S.},
title = {Self Adapting Numerical Software SANS Effort},
booktitle = {IBM Journal of Research and Development},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {Volume 50, number 2/3},
pages = {pp. 223-238},
year = {2006}
}
Pjesivac-Grbovic, J., Fagg, G., Angskun, T., Bosilca, G., Dongarra, J. "MPI Collective Algorithm Selection and Quadtree Encoding," ICL Technical Report, ICL-UT-06-11, 2006
@techreport{icl:314,
author = {Pjesivac-Grbovic, J. and Fagg, G. and Angskun, T. and Bosilca, G. and Dongarra, J.},
title = {MPI Collective Algorithm Selection and Quadtree Encoding},
booktitle = {ICL Technical Report},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2006}
}
Angskun, T., Fagg, G., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Scalable Fault Tolerant Protocol for Parallel Runtime Environments," 2006 Euro PVM/MPI, Bonn, Germany, ICL-UT-06-12, 2006
@article{icl:316,
author = {Angskun, T. and Fagg, G. and Bosilca, G. and Pjesivac-Grbovic, J. and Dongarra, J.},
title = {Scalable Fault Tolerant Protocol for Parallel Runtime Environments},
booktitle = {2006 Euro PVM/MPI},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Bonn, Germany},
year = {2006}
}
Fagg, G., Angskun, T., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Scalable Fault Tolerant MPI: Extending the Recovery Algorithm," Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, Di Martino, B. et al. eds. Springer-Verlag Berlin, Sorrento (Naples), Italy, LNCS 3666, pp. 67, September 18-21, 2005
@inproceedings{icl:279,
author = {Fagg, G. and Angskun, T. and Bosilca, G. and Pjesivac-Grbovic, J. and Dongarra, J.},
title = {Scalable Fault Tolerant MPI: Extending the Recovery Algorithm},
booktitle = {Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {LNCS 3666},
pages = {pp. 67},
address = {Sorrento (Naples), Italy},
month = {September},
year = {2005}
}
Bosilca, G., Dongarra, J., Fagg, G., Langou, J. "Hash Functions for Datatype Signatures in MPI," Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, Di Martino, B. et al. eds. Springer-Verlag Berlin, Sorrento (Naples), Italy, LNCS 3666, pp. 76-83, September 18-21, 2005
@inproceedings{icl:280,
author = {Bosilca, G. and Dongarra, J. and Fagg, G. and Langou, J.},
title = {Hash Functions for Datatype Signatures in MPI},
booktitle = {Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI},
institution = {Innovative Computing Laboratory, University of Tennessee},
volume = {LNCS 3666},
pages = {pp. 76-83},
address = {Sorrento (Naples), Italy},
month = {September},
year = {2005}
}
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., Dongarra, J. "Performance Analysis of MPI Collective Operations," 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 4-8, 2005
@inproceedings{icl:249,
author = {Pjesivac-Grbovic, J. and Angskun, T. and Bosilca, G. and Fagg, G. and Gabriel, E. and Dongarra, J.},
title = {Performance Analysis of MPI Collective Operations},
booktitle = {4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Denver, Colorado},
month = {April},
year = {2005}
}
Chen, Z., Fagg, G., Gabriel, E., Langou, J., Angskun, T., Bosilca, G., Dongarra, J. "Fault Tolerant High Performance Computing by a Coding Approach," Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, June 15-17, 2005
@inproceedings{icl:265,
author = {Chen, Z. and Fagg, G. and Gabriel, E. and Langou, J. and Angskun, T. and Bosilca, G. and Dongarra, J.},
title = {Fault Tolerant High Performance Computing by a Coding Approach},
booktitle = {Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Chicago, Illinois},
month = {June},
year = {2005}
}
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., Dongarra, J. "Performance Analysis of MPI Collective Operations," Cluster Computing Journal (to appear), 2006
@article{icl:306,
author = {Pjesivac-Grbovic, J. and Angskun, T. and Bosilca, G. and Fagg, G. and Gabriel, E. and Dongarra, J.},
title = {Performance Analysis of MPI Collective Operations},
booktitle = {Cluster Computing Journal (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2006}
}
Bosilca, G., Chen, Z., Dongarra, J., Langou, J. "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment," University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 2005
@techreport{icl:301,
author = {Bosilca, G. and Chen, Z. and Dongarra, J. and Langou, J. },
title = {Recovery Patterns for Iterative Methods in a Parallel Unstable Environment},
booktitle = {University of Tennessee Computer Science Department Technical Report, UT-CS-04-538},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2005}
}
Fagg, G., Gabriel, E., Bosilca, G., Angskun, T., Chen, Z., Pjesivac-Grbovic, J., London, K., Dongarra, J. "Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems," Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 23, 2004
@inproceedings{icl:230,
author = {Fagg, G. and Gabriel, E. and Bosilca, G. and Angskun, T. and Chen, Z. and Pjesivac-Grbovic, J. and London, K. and Dongarra, J.},
title = {Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems},
booktitle = {Proceedings of ISC2004 (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Heidelberg, Germany},
month = {June},
year = {2004}
}
Fagg, G., Gabriel, E., Chen, Z., Angskun, T., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing," International Journal for High Performance Applications and Supercomputing (to appear), April, 2004
@article{icl:240,
author = {Fagg, G. and Gabriel, E. and Chen, Z. and Angskun, T. and Bosilca, G. and Pjesivac-Grbovic, J. and Dongarra, J.},
title = {Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing},
journal = {International Journal for High Performance Applications and Supercomputing (to appear)},
institution = {Innovative Computing Laboratory, University of Tennessee},
month = {April},
year = {2004}
}
Bosilca, G., Chen, Z., Dongarra, J., Langou, J. "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment," ICL Technical Report, ICL-UT-04-04, 2004
@techreport{icl:251,
author = {Bosilca, G. and Chen, Z. and Dongarra, J. and Langou, J.},
title = {Recovery Patterns for Iterative Methods in a Parallel Unstable Environment},
type = {ICL Technical Report},
number = {ICL-UT-04-04},
institution = {Innovative Computing Laboratory, University of Tennessee},
year = {2004}
}
Fagg, G., Gabriel, E., Chen, Z., Angskun, T., Bosilca, G., Bukovsky, A., Dongarra, J. "Fault Tolerant Communication Library and Applications for High Performance Computing," Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented), Santa Fe, NM, October 27-29, 2003
@inproceedings{icl:153,
author = {Fagg, G. and Gabriel, E. and Chen, Z. and Angskun, T. and Bosilca, G. and Bukovsky, A. and Dongarra, J.},
title = {Fault Tolerant Communication Library and Applications for High Performance Computing},
booktitle = {Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented)},
institution = {Innovative Computing Laboratory, University of Tennessee},
address = {Santa Fe, NM},
month = {October},
year = {2003}
}