Publications

Export 1275 results:
Filters: 10.1109 is TPDS.2021.3131657  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
L
Lemariner, P., G. Bosilca, C. Coti, T. Herault, and J. Dongarra, Constructing Resilient Communication Infrastructure for Runtime Environments,” ParCo 2009, Lyon France, September 2009.
Li, J., B. Nicolae, J. M. Wozniak, and G. Bosilca, Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training,” 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, CO, IEEE, November 2019.  (696.89 KB)
Li, Y., and J. Dongarra, Request Sequencing: Enabling Workflow for Efficient Parallel Problem Solving in GridSolve,” ICL Technical Report, no. ICL-UT-08-01, April 2008.  (1.64 MB)
Li, Y., J. Dongarra, and S. Tomov, A Note on Auto-tuning GEMM for GPUs,” 9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.  (236.02 KB)
Li, Y., A. YarKhan, J. Dongarra, K. Seymour, and A. Hurault, Enabling Workflows in GridSolve: Request Sequencing and Service Trading,” Journal of Supercomputing, vol. 64, issue 3, pp. 1133-1152, June 2013.  (821.29 KB)
Li, Y., J. Dongarra, K. Seymour, and A. YarKhan, Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve,” International Conference on Grid and Cooperative Computing (GCC 2008) (submitted), Shenzhen, China, October 2008.  (1.64 MB)
Funk, Y., M. Götz, and H. Anzt, Prediction of Optimal Solvers for Sparse Linear Systems Using Deep Learning,” 2022 SIAM Conference on Parallel Processing for Scientific Computing (PP), Philadelphia, PA, Society for Industrial and Applied Mathematics, pp. 14 - 24.
Li, J., G. Bosilca, A. Bouteiller, and B. Nicolae, Elastic deep learning through resilient collective operations,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
Lindquist, N., P. Luszczek, and J. Dongarra, Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), Atlanta, GA, IEEE, November 2020.  (184.6 KB)
Lindquist, N., M. Gates, P. Luszczek, and J. Dongarra, Threshold Pivoting for Dense LU Factorization,” ScalAH22: 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems , Dallas, Texas, IEEE, November 2022.  (721.77 KB)
Lindquist, N., P. Luszczek, and J. Dongarra, Improving the Performance of the GMRES Method using Mixed-Precision Techniques,” Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.  (600.33 KB)
Lindquist, N., P. Luszczek, and J. Dongarra, Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes : arXiv, December 2023.
Lindquist, N., P. Luszczek, and J. Dongarra, Using Additive Modifications in LU Factorization Instead of Pivoting,” 37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.  (624.18 KB)
Lindquist, N., P. Luszczek, and J. Dongarra, Accelerating Restarted GMRES with Mixed Precision Arithmetic,” IEEE Transactions on Parallel and Distributed Systems, June 2021.  (572.4 KB)
Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, C-Y. Su, and K. Cameron, Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.  (479.49 KB)
Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, and K. Cameron, Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,” International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 00 2011.  (467.18 KB)
London, K., S. Moore, P. Mucci, K. Seymour, and R. Luczak, The PAPI Cross-Platform Interface to Hardware Performance Counters,” Department of Defense Users' Group Conference Proceedings, Biloxi, Mississippi, June 2001.  (328.56 KB)
London, K., J. Dongarra, S. Moore, P. Mucci, K. Seymour, and T.. Spencer, End-user Tools for Application Performance Analysis, Using Hardware Counters,” International Conference on Parallel and Distributed Computing Systems, Dallas, TX, August 2001.  (306.54 KB)
Lopez, F., E. Chow, S. Tomov, and J. Dongarra, Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.  (188.51 KB)
Lopez, F., E. Chow, S. Tomov, and J. Dongarra, Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.  (188.51 KB)
Lopez, M. G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, Evaluation of Directive-Based Performance Portable Programming Models,” International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165-182.  (1.12 MB)
Lopez, F., and T. Mary, Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.  (409 KB)
Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.  (567.02 KB)
Losada, N., A. Bouteiller, and G. Bosilca, Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,” Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), November 2019.  (440.7 KB)
Losada, N., P. González, M. J. Martín, G. Bosilca, A. Bouteiller, and K. Teranishi, Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,” Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020.  (2.06 MB)
Losada, N., G. Bosilca, A. Bouteiller, P. González, and M. J. Martín, Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019.  (1.16 MB)
Ltaeif, H., S. Tomov, R. Nath, and J. Dongarra, Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators,” IEEE Transaction on Parallel and Distributed Systems (submitted), March 2010.  (3.75 MB)
Ltaeif, H., P. Luszczek, and J. Dongarra, Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.  (1.27 MB)
Ltaeif, H., S. Tomov, R. Nath, P. Du, and J. Dongarra, A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators,” Proc. of VECPAR'10 (to appear), Berkeley, CA, June 2010.  (870.46 KB)
Ltaeif, H., J. Kurzak, and J. Dongarra, Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,” University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208), August 2008.  (420.31 KB)
Ltaeif, H., P. Luszczek, and J. Dongarra, High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 3, no. 16, 2013.  (665.7 KB)
Ltaeif, H., J. Kurzak, J. Dongarra, and R. M. Badia, Scheduling Two-sided Transformations using Tile Algorithms on Multicore Architectures,” Journal of Scientific Computing, vol. 18, no. 1, pp. 33-50, 00 2010.  (334.5 KB)
Ltaeif, H., P. Luszczek, and J. Dongarra, High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-673, (also Lawn 247), May 2011.  (424.93 KB)
Ltaeif, H., J. Kurzak, and J. Dongarra, Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems (to appear), May 2009.  (208.16 KB)
Ltaeif, H., P. Luszczek, and J. Dongarra, Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction,” Lecture Notes in Computer Science, vol. 7203, pp. 661-670, September 2012.  (185.77 KB)
Ltaeif, H., J. Kurzak, and J. Dongarra, Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, pp. 417-423, April 2010.  (208.16 KB)
Lu, Y., I. Yamazaki, F. Ino, Y. Matsushita, S. Tomov, and J. Dongarra, Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD,” Concurrency and Computation: Practice and Experience, April 2020.  (1.43 MB)
Lukarski, D., H. Anzt, S. Tomov, and J. Dongarra, Hybrid Multi-Elimination ILU Preconditioners on GPUs,” International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (1.67 MB)
Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.  (764.05 KB)
Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, ADAPT: An Event-Based Adaptive Collective Communication Framework,” The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.  (493.65 KB)
Luszczek, P., E. Meek, S. Moore, D. Terpstra, V. M. Weaver, and J. Dongarra, Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,” 6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.  (114.73 KB)
Luszczek, P., High Performance Development for High End Computing with Python Language Wrapper (PLW),” International Journal of High Performance Computing Applications (to appear), 00 2006.  (179.32 KB)
Luszczek, P., W. M. Sid-Lakhdar, and J. Dongarra, Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,” The International Journal of High Performance Computing Applications, March 2023.
Luszczek, P., and J. Dongarra, Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,” Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.  (226.9 KB)
Luszczek, P., J. Kurzak, and J. Dongarra, Looking Back at Dense Linear Algebra Software,” Journal of Parallel and Distributed Computing, vol. 74, issue 7, pp. 2548–2560, July 2014.  (1.79 MB)
Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.  (720.15 KB)
Luszczek, P., Y. Tsai, N. Lindquist, H. Anzt, and J. Dongarra, Scalable Data Generation for Evaluating Mixed-Precision Solvers,” 2020 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, IEEE, September 2020.  (1.3 MB)
Luszczek, P., J. Kurzak, and J. Dongarra, Looking Back at Dense Linear Algebra Software,” Perspectives on Parallel and Distributed Processing: Looking Back and What's Ahead (to appear), 00 2012.  (235.91 KB)
Luszczek, P., J. Kurzak, I. Yamazaki, and J. Dongarra, Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,” 2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017.  (1.67 MB)
Luszczek, P., H. Ltaeif, and J. Dongarra, Two-stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Pages