Publications

Cheng, X., A. Soma, E. D'Azevedo, K. Wong, and S. Tomov, Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), ACM Student Research Poster, November 2018.

Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, “Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.

Gates, M., A. Haidar, and J. Dongarra, “Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.

Ayala, A., S. Tomov, A. Haidar, M. Stoyanov, S. Cayrols, J. Li, G. Bosilca, and J. Dongarra, Accelerating FFT towards Exascale Computing, NVIDIA GPU Technology Conference (GTC2021), 2021.

Abdulah, S., Q. Cao, Y. Pei, G. Bosilca, J. Dongarra, M. G. Genton, D. E. Keyes, H. Ltaief, and Y. Sun, “Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964-976, April 2022.

Nath, R., S. Tomov, and J. Dongarra, “Accelerating GPU Kernels for Dense Linear Algebra,” Proc. of VECPAR'10, Berkeley, CA, June 2010.

Tomov, S., G. Bosilca, and C. Augonnet, Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.

Tomov, S., M. Gates, and A. Haidar, Accelerating Linear Algebra with MAGMA, Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.

Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” INRIA RR-7616 / LAWN #246 (presented at International AMMCS’11), Waterloo, Ontario, Canada, July 2011.

Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” ACM Transactions on Mathematical Software (also LAWN 246), vol. 39, issue 2, February 2013.

Ayala, A., S. Tomov, M. Stoyanov, A. Haidar, and J. Dongarra, “Accelerating Multi-Process Communication for Parallel 3-D FFT,” 2021 Workshop on Exascale MPI (ExaMPI), St. Louis, MO, USA, IEEE, December 2021.

Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.

Jagode, H., A. Danalis, G. Bosilca, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.

Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, pp. 1–13, January 2017.

Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540-551, July 2018.

Lindquist, N., P. Luszczek, and J. Dongarra, “Accelerating Restarted GMRES with Mixed Precision Arithmetic,” IEEE Transactions on Parallel and Distributed Systems, June 2021.

Baboulin, M., A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov, “Accelerating Scientific Computations with Mixed Precision Algorithms,” Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.

Haidar, A., A. Abdelfattah, V. Dobrev, I. Karlin, T. Kolev, S. Tomov, and J. Dongarra, Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs, Gatlinburg, TN, Smoky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.

Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched, Atlanta, GA, SIAM Conference on Computer Science and Engineering (SIAM CSE17), Presentation, March 2017.

Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations,” VECPAR, 2016.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.

Tomov, S., and J. Dongarra, “Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,” University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.

Tomov, S., R. Nath, and J. Dongarra, “Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,” Parallel Computing, vol. 36, no. 12, pp. 645-654, 2010.

Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “Accelerating the SVD Bi-Diagonalization of a Batch of Small Matrices using GPUs,” Journal of Computational Science, vol. 26, pp. 237–245, May 2018.

Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018.

Demmel, J., J. Dongarra, A. Fox, S. Williams, V. Volkov, and K. Yelick, “Accelerating Time-To-Solution for Computational Science and Engineering,” SciDAC Review, 2009.

Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.

Dong, T., T. Kolev, R. Rieben, V. Dobrev, S. Tomov, and J. Dongarra, “Acceleration of the BLAST Hydro Code on GPU,” Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.

Yamazaki, I., T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, “Access-averse Framework for Computing Low-rank Matrix Approximations,” First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.

Dongarra, J., S. Moore, P. Mucci, K. Seymour, and H. You, “Accurate Cache and TLB Characterization Using Hardware Counters,” International Conference on Computational Science (ICCS 2004), Krakow, Poland, Springer, June 2004.

Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,” University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.

Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,” Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.

Beck, M., J. Dongarra, J. Huang, T. Moore, and J. Plank, “Active Logistical State Management in the GridSolve/L,” 4th International Symposium on Cluster Computing and the Grid (CCGrid 2004) (submitted), Chicago, Illinois, January 2004.

Moore, S., A. J. Baker, J. Dongarra, C. Halloy, and C. Ng, “Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education,” Journal of Digital Information special issue on Interactivity in Digital Libraries, vol. 2, no. 4, 2002.

Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.

Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Orti, “Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,” Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Adaptive Precision Solvers for Sparse Linear Systems,” 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.

Casanova, H., M. H. Kim, J. Plank, and J. Dongarra, “Adaptive Scheduling for Task Farming with Grid Middleware,” International Journal of Supercomputer Applications and High-Performance Computing, vol. 13, no. 3, pp. 231-240, October 2002.

Abdelfattah, A., P. Ghysels, W. Boukaram, S. Tomov, X. Sherry Li, and J. Dongarra, “Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,” 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.

Thiyagalingam, J., G. von Laszewski, J. Yin, M. Emani, J. Papay, G. Barrett, P. Luszczek, A. Tsaris, C. Kirkpatrick, F. Wang, et al., “AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,” Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47-64, January 2023.

Song, F., F. Wolf, N. Bhatia, J. Dongarra, and S. Moore, “An Algebra for Cross-Experiment Performance Analysis,” 2004 International Conference on Parallel Processing (ICPP-04), Montreal, Quebec, Canada, August 2004.

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, S. Lanteri, and J. Roman, “Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method,” The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.

Chen, Z., and J. Dongarra, “Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” University of Tennessee Computer Science Department Technical Report, vol. –05-561, November 2005.

Chen, Z., and J. Dongarra, “Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” IPDPS 2006, 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, January 2006.

Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-Based Fault Tolerance for Dense Matrix Factorization,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.

Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.

Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.

Chen, Z., and J. Dongarra, “Algorithm-Based Fault Tolerance for Fail-Stop Failures,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.

Bosilca, G., R. Delmas, J. Dongarra, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
