Publications

Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, “Resilient scheduling heuristics for rigid parallel jobs,” Int. J. of Networking and Computing, vol. 11, no. 1, pp. 2-26, 2021.

Benoit, A., T. Herault, L. Perotin, Y. Robert, and F. Vivien, “Revisiting I/O bandwidth-sharing strategies for HPC applications,” INRIA Research Report, no. RR-9502: INRIA, March 2023.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.

Benoit, A., Y. Robert, and S. K. Raina, “Efficient checkpoint/verification patterns for silent error detection,” Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.

Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, “Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs,” 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.

Benoit, A., A. Cavelan, F. M. Ciorba, V. Le Fèvre, and Y. Robert, “Combining Checkpointing and Replication for Reliable Execution of Linear Workflows with Fail-Stop and Silent Errors,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27.

Benoit, A., S. Perarnau, L. Pottier, and Y. Robert, “A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures,” The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.

Benoit, A., T. Herault, V. Le Fèvre, and Y. Robert, “Replication is More Efficient Than You Think,” The IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis (SC19), Denver, CO, ACM Press, November 2019.

Benoit, A., R. Elghazi, and Y. Robert, “Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016.

Benoit, A., L. Pottier, and Y. Robert, “Resilient Co-Scheduling of Malleable Applications,” International Journal of High Performance Computing Applications (IJHPCA), May 2017.

Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, D. Reed, et al., “The GrADS Project: Software Support for High-Level Grid Application Development,” Technical Report, February 2000.

Berman, F., H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, et al., “New Grid Scheduling and Rescheduling Methods in the GrADS Project,” International Journal of Parallel Programming, vol. 33, no. 2: Springer, pp. 209-229, June 2005.

Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, et al., “The GrADS Project: Software Support for High-Level Grid Application Development,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 327-344, January 2001.

Bernholc, J., M. Hodak, W. Lu, S. Moore, and S. Tomov, “Scalability Study of a Quantum Simulation Code,” PARA 2010, Reykjavik, Iceland, June 2010.

Bernholdt, D. E., S. Boehm, G. Bosilca, M. G. Venkata, R. E. Grant, T. Naughton, H. P. Pritchard, M. Schulz, and G. R. Vallee, “A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency and Computation: Practice and Experience, September 2018.

Berry, M., and J. Dongarra, “Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,” SIAM News, vol. 32, no. 6, January 1999.

Betancourt, F., K. Wong, E. Asemota, Q. Marshall, D. Nichols, and S. Tomov, “OpenDIEL: A Parallel Workflow Engine and Data Analytics Framework,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.

Bhatia, N., S. Moore, F. Wolf, J. Dongarra, and B. Mohr, “A Pattern-Based Approach to Automated Application Performance Analysis,” Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.

Bhatia, N., F. Song, F. Wolf, J. Dongarra, B. Mohr, and S. Moore, “Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,” In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.

Bhowmick, S., V. Eijkhout, Y. Freund, E. Fuentes, and D. Keyes, “Application of Machine Learning to the Selection of Sparse Linear Solvers,” International Journal of High Performance Computing Applications (submitted), 2006.

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “Basic Linear Algebra Subprograms (BLAS),” (an update), submitted to ACM TOMS, February 2001.

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “An Updated Set of Basic Linear Algebra Subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-702, 2012.

Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.

Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013.

Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.

Bland, W., “User Level Failure Mitigation in MPI,” Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,” 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.

Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244-254, January 2013.

Bland, W., “Enabling Application Resilience With and Without the MPI Standard,” 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, Canada, May 2012.

Boehmann, T. B., “Distributed Storage in RIB,” ICL Tech Report, no. ICL-UT-03-01, March 2003.

Boillot, L., G. Bosilca, E. Agullo, and H. Calandra, “Task-Based Programming for Seismic Imaging: Preliminary Results,” 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.

Bosilca, G., J. Dongarra, and H. Ltaief, “Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,” Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, and J. Dongarra, “Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,” Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.

Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Distributed Termination Detection for HPC Task-Based Environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, and J. Dongarra, “From Serial Loops to Parallel Execution on Distributed Systems,” International European Conference on Parallel and Distributed Computing (Euro-Par '12), Rhodes, Greece, August 2012.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. Dongarra, “PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.

Bosilca, G., C. Coti, T. Herault, P. Lemarinier, and J. Dongarra, “Constructing resiliant communication infrastructure for runtime environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.

Bosilca, G., C. Coti, T. Herault, P. Lemarinier, and J. Dongarra, “Constructing Resiliant Communication Infrastructure for Runtime Environments in Advances in Parallel Computing,” Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, vol. 19, pp. 441-451, 2010.

Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, “On Scalability for MPI Runtime Systems,” International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEE, pp. 187-195, September 2011.
