Publications

2022

Abdelfattah, A., S. Tomov, and J. Dongarra, “Batch QR Factorization on GPUs: Design, Optimization, and Tuning,” Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.

Kashi, A., P. Nayak, D. Kulkarni, A. Scheinberg, P. Lin, and H. Anzt, “Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations,” 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.

2021

Caron, E., Y. Caniou, A K W. Chang, and Y. Robert, “Budget-aware scheduling algorithms for scientific workflows with stochastic task weights on IaaS Cloud platforms,” Concurrency and Computation: Practice and Experience, vol. 33, no. 17, pp. e6065, 2021.

2019

Gamblin, T., P. Beckman, K. Keahey, K. Sato, M. Kondo, and G. Balazs, “BDEC2 Platform White Paper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-11: University of Tennessee, September 2019.

2018

Dongarra, J., I. Duff, M. Gates, A. Haidar, S. Hammarling, N. J. Higham, J. Hogg, P. Valero Lara, P. Luszczek, M. Zounon, et al., Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification, July 2018.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,” Journal of Computational Science, vol. 26, pp. 226–236, May 2018.

Marques, O., J. Demmel, and P. B. Vasconcelos, “Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,” LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.

Asch, M., T. Moore, R. M. Badia, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. R. de Supinski, et al., “Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.

Caniou, Y., E. Caron, A K W. Chang, and Y. Robert, “Budget-Aware Scheduling Algorithms for Scientific Workflows with Stochastic Task Weights on Heterogeneous IaaS Cloud Platforms,” 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, BC, Canada, IEEE, May 2018.

2017

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,” Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017.

“BDEC Pathways to Convergence: Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-08: University of Tennessee, November 2017.

Faverge, M., J. Langou, Y. Robert, and J. Dongarra, “Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, May 2017.

Anzt, H., J. Dongarra, M. Gates, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Bringing High Performance Computing to Big Data Algorithms,” Handbook of Big Data Technologies: Springer, 2017.

2016

Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,” Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016.

Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.

2015

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators,” EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators Based on GPUs,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Batched matrix computations on hardware accelerators based on GPUs,” International Journal of High Performance Computing Applications, February 2015.

2013

McCraw, H., D. Terpstra, J. Dongarra, K. Davis, and R. Musselman, “Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q,” International Supercomputing Conference 2013 (ISC'13), Leipzig, Germany, Springer, June 2013.

Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, “BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.

2012

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, “Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” ICCS 2012, Omaha, NE, June 2012.

2011

Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, “BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems, no. UT-CS-11-689, December 2011.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.

2010

Nath, R., S. Tomov, and J. Dongarra, “BLAS for GPUs,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.

2007

Angskun, T., G. Bosilca, and J. Dongarra, “Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology,” Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007.

Dongarra, J., E. Jeannot, E. Saule, and Z. Shi, “Bi-objective Scheduling Algorithms for Optimizing Makespan and Reliability on Heterogeneous Systems,” 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) (submitted), San Diego, CA, June 2007.

2005

YarKhan, A., and J. Dongarra, “Biological Sequence Alignment on the Computational Grid Using the GrADS Framework,” Future Generation Computing Systems, vol. 21, no. 6: Elsevier, pp. 980–986, June 2005.

2004

Fagg, G., and J. Dongarra, “Building and using a Fault Tolerant MPI implementation,” International Journal of High Performance Applications and Supercomputing (to appear), 2004.

2002

“Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard,” International Journal of High Performance Computing Applications: Special Issue - Part I & II, vol. 16, no. 1–2, pp. 1–199, January 2002.

Dongarra, J., H. Meuer, H. D. Simon, and E. Strohmaier, “Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing,” SIAM News, vol. 34, no. 9, October 2002.

2001

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “Basic Linear Algebra Subprograms (BLAS),” (an update), submitted to ACM TOMS, February 2001.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, January 2001.