Publications

Export 1274 results:
Filters: 10.1016 is j.parco.2021.102856  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
B
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016.  (573.71 KB)
Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, Resilient scheduling heuristics for rigid parallel jobs,” Int. J. of Networking and Computing, vol. 11, no. 1, pp. 2-26, 2021.  (8.67 MB)
Benoit, A., L. Pottier, and Y. Robert, Resilient Co-Scheduling of Malleable Applications,” International Journal of High Performance Computing Applications (IJHPCA), May 2017.  (1.62 MB)
Benoit, A., T. Herault, L. Perotin, Y. Robert, and F. Vivien, Revisiting I/O bandwidth-sharing strategies for HPC applications,” INRIA Research Report, no. RR-9502: INRIA, March 2023.
Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, June 2017.  (865.68 KB)
Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs,” 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.  (696.21 KB)
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,” Journal of Computational Science, vol. 28, pp. 398–415, September 2018.
Benoit, A., A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, Towards Optimal Multi-Level Checkpointing,” IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017.  (1.39 MB)
Benoit, A., Y. Robert, and S. K. Raina, Efficient checkpoint/verification patterns for silent error detection,” Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.  (397.75 KB)
Benoit, A., R. Elghazi, and Y. Robert, Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.  (4.94 MB)
Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, Optimal Checkpointing Period with replicated execution on heterogeneous platforms,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, June 2017.  (1.02 MB)
Berman, F., H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, et al., New Grid Scheduling and Rescheduling Methods in the GrADS Project,” International Journal of Parallel Programming, vol. 33, no. 2: Springer, pp. 209-229, June 2005.  (306.41 KB)
Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, et al., The GrADS Project: Software Support for High-Level Grid Application Development,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 327-344, January 2001.  (271.52 KB)
Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, D. Reed, et al., The GrADS Project: Software Support for High-Level Grid Application Development,” Technical Report, February 2000.  (347.41 KB)
Bernholc, J., M. Hodak, W. Lu, S. Moore, and S. Tomov, Scalability Study of a Quantum Simulation Code,” PARA 2010, Reykjavik, Iceland, June 2010.
Bernholdt, D. E., S. Boehm, G. Bosilca, M G. Venkata, R. E. Grant, T. Naughton, H. P. Pritchard, M. Schulz, and G. R. Vallee, A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency Computation: Practice and Experience, September 2018.  (359.54 KB)
Berry, M., and J. Dongarra, Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,” SIAM News, vol. 32, no. 6, January 1999.  (45.98 KB)
Betancourt, F., K. Wong, E. Asemota, Q. Marshall, D. Nichols, and S. Tomov, OpenDIEL: A Parallel Workflow Engine and DataAnalytics Framework,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.  (1.48 MB)
Bhatia, N., S. Moore, F. Wolf, J. Dongarra, and B. Mohr, A Pattern-Based Approach to Automated Application Performance Analysis,” Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.  (3.47 MB)
Bhatia, N., F. Song, F. Wolf, J. Dongarra, B. Mohr, and S. Moore, Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,” In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.  (227.13 KB)
Bhowmick, S., V. Eijkhout, Y. Freund, E. Fuentes, and D. Keyes, Application of Machine Learning to the Selection of Sparse Linear Solvers,” International Journal of High Performance Computing Applications (submitted), 00 2006.  (392.96 KB)
Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., Basic Linear Algebra Subprograms (BLAS),” (an update), submitted to ACM TOMS, February 2001.  (228.33 KB)
Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., An Updated Set of Basic Linear Algebra Subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.  (228.33 KB)
Bland, W., User Level Failure Mitigation in MPI,” Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.  (136.15 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,” 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.  (289.32 KB)
Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244 - 254, January 2013.  (285.77 KB)
Bland, W., Enabling Application Resilience With and Without the MPI Standard,” 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, Canada, May 2012.  (262.93 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-702, 00 2012.  (422.76 KB)
Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.  (159.46 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013.  (3.89 MB)
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013.  (311.23 KB)
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.
Boehmann, T. B., Distributed Storage in RIB,” ICL Tech Report, no. ICL-UT-03-01, March 2003.  (213.02 KB)
Boillot, L., G. Bosilca, E. Agullo, and H. Calandra, Task-Based Programming for Seismic Imaging: Preliminary Results,” 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.  (625.86 KB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.  (2.76 MB)
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.  (968.47 KB)
Bosilca, G., R. Harrison, T. Herault, M. Mahdi Javanmard, P. Nookala, and E. Valeev, The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,” 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020.  (139.6 KB)
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018.  (1.04 MB)
Bosilca, G., Z. Chen, J. Dongarra, and J. Langou, Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” ICL Technical Report, no. ICL-UT-04-04, January 2004.  (241.36 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.  (830.85 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A generic distributed DAG Engine for High Performance Computing.,” Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 00 2012.  (830.85 KB)
Bosilca, G., Z. Chen, J. Dongarra, and J. Langou, Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 00 2005.  (241.36 KB)
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, Failure Detection and Propagation in HPC Systems,” Proceedings of the The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.
Bosilca, G., D. Genet, R. Harrison, T. Herault, M. Mahdi Javanmard, C. Peng, and E. Valeev, Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-13: University of Tennessee, December 2018.  (326.11 KB)
Bosilca, G., T. Herault, P. Lemariner, J. Dongarra, and A.. Rezmerita, Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.  (115.75 KB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.  (290.98 KB)
Bosilca, G., J. Dongarra, and H. Ltaeif, Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,” Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.  (290.27 KB)
Bosilca, G., T. Herault, and J. Dongarra, Context Identifier Allocation in Open MPI,” University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.  (490.89 KB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
Bosilca, G., C. Coti, T. Herault, P. Lemariner, and J. Dongarra, Constructing Resiliant Communication Infrastructure for Runtime Environments in Advances in Parallel Computing,” Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, vol. 19, pp. 441-451, 2010.

Pages