Publications

Shamis, P., M. G. Venkata, M. G. Lopez, M. B. Baker, O. Hernandez, Y. Itigin, M. Dubman, G. Shainer, R. L. Graham, L. Liss, et al., “UCX: An Open Source Framework for HPC Network APIs and Beyond,” 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, Santa Clara, CA, USA, IEEE, pp. 40-43, 2015.

Danalis, A., H. Jagode, D. Barry, and J. Dongarra, Understanding Native Event Semantics, Knoxville, TN, 9th JLESC Workshop, April 2019.

Li, J., B. Nicolae, J. M. Wozniak, and G. Bosilca, “Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training,” 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, CO, IEEE, November 2019.

Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, “Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, “Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.

Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, “Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013.

Aliaga, J. I., H. Anzt, M. Castillo, J. C. Fernández, G. León, J. Pérez, and E. S. Quintana-Ortí, “Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,” Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014.

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “An Updated Set of Basic Linear Algebra Subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.

Anzt, H., E. Chow, J. Saak, and J. Dongarra, “Updating Incomplete Factorization Preconditioners for Model Order Reduction,” Numerical Algorithms, vol. 73, no. 3, pp. 611–630, February 2016.

Wolf, F., B. Wylie, E. Abraham, W. Frings, K. Fürlinger, M. Geimer, M-A. Hermanns, B. Mohr, S. Moore, and M. Pfeifer, “Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,” Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.

Voemel, C., S. Tomov, L-W. Wang, O. Marques, and J. Dongarra, “The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,” Journal of Computational Physics (submitted), January 2006.

Voemel, C., S. Tomov, L-W. Wang, O. Marques, and J. Dongarra, “The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,” Journal of Computational Physics, vol. 223, pp. 774-782, 2007.

Bland, W., “User Level Failure Mitigation in MPI,” Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.

Moore, S., and J. Ralph, “User-Defined Events for Hardware Performance Monitoring,” Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.

Agrawal, S., D. Arnold, S. Blackford, J. Dongarra, M. Miller, K. Sagi, Z. Shi, K. Seymour, and S. Vadhiyar, “Users' Guide to NetSolve v1.4.1,” ICL Technical Report, no. ICL-UT-02-05, June 2002.

Lindquist, N., P. Luszczek, and J. Dongarra, “Using Additive Modifications in LU Factorization Instead of Pivoting,” 37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.

Zhong, D., Q. Cao, G. Bosilca, and J. Dongarra, “Using Advanced Vector Extensions AVX-512 for MPI Reduction,” EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.

Zhong, D., G. Bosilca, Q. Cao, and J. Dongarra, Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster), Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.

Zhong, D., P. Shamis, Q. Cao, G. Bosilca, and J. Dongarra, “Using Arm Scalable Vector Extension to Optimize Open MPI,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.

Baboulin, M., and S. Gratton, “Using dual techniques to derive componentwise and mixed condition numbers for a linear functional of a linear least squares solution,” University of Tennessee Computer Science Technical Report, UT-CS-08-622 (also LAPACK Working Note 207), January 2008.

Grützmacher, T., H. Anzt, and E. S. Quintana-Ortí, “Using Ginkgo's memory accessor for improving the accuracy of memory-bound low precision BLAS,” Software: Practice and Experience, vol. 532, issue 1, pp. 81-98, January.

Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, “Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.

Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption, Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.

Fürlinger, K., J. Dongarra, and M. Gerndt, “On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications,” Proceedings of the 13th International Euro-Par Conference on Parallel Processing (Euro-Par '07), Rennes, France, Springer LNCS, January 2007.

Chow, E., H. Anzt, J. Scott, and J. Dongarra, “Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,” Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.

Zhong, D., Q. Cao, G. Bosilca, and J. Dongarra, “Using long vector extensions for MPI reductions,” Parallel Computing, vol. 109, pp. 102871, March 2022.

Tomov, S., M. Faverge, P. Luszczek, and J. Dongarra, “Using MAGMA with PGI Fortran,” PGI Insider, November 2010.

Buttari, A., J. Dongarra, J. Kurzak, P. Luszczek, and S. Tomov, “Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,” ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 2008.

Giraud, L., A. Haidar, and S. Pralet, “Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 2010.

Dongarra, J., K. London, S. Moore, P. Mucci, and D. Terpstra, “Using PAPI for Hardware Performance Monitoring on Linux Systems,” Conference on Linux Clusters: The HPC Revolution, Urbana, Illinois, Linux Clusters Institute, June 2001.

Tsai, Y., P. Luszczek, and J. Dongarra, Using Quantized Integer in LU Factorization with Partial Pivoting (Poster), Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.

Eberius, D., T. Patinyasakdikul, and G. Bosilca, “Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information,” EuroMPI, Chicago, IL, ACM, September 2017.

McCraw, H., A. Danalis, G. Bosilca, J. Dongarra, K. Kowalski, and T. Windus, “Utilizing Dataflow-based Execution for Coupled Cluster Methods,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
