Publications

Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211, 2010.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM TOMS (to appear), 2009.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, April 2010.

Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199), April 2008.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, Atlanta, GA, April 2010.

Gustavson, F. G., J. Wasniewski, J. Dongarra, J. Herrero, and J. Langou, “Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 2, February 2013.

Guidry, M., and A. Haidar, On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.

Grützmacher, T., H. Anzt, and E. S. Quintana‐Ortí, “Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,” Software: Practice and Experience, vol. 532, issue 1, pp. 81-98, January.

Gruetzmacher, T., T. Cojean, G. Flegar, F. Göbel, and H. Anzt, “A Customized Precision Format Based on Mantissa Segmentation for Accelerating Sparse Linear Algebra,” Concurrency and Computation: Practice and Experience, vol. 40319, issue 262, January 2019.

Abdelfattah, A., S. Tomov, and J. Dongarra, “Batch QR Factorization on GPUs: Design, Optimization, and Tuning,” Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.

Graham, R. L., G. Bosilca, and J. Pjesivac–Grbovic, “A Comparison of Application Performance Using Open MPI and Cray MPI,” Cray User Group, CUG 2007, May 2007.

Graham, R. L., G. M. Shipman, B. Barrett, R. Castain, G. Bosilca, and A. Lumsdaine, “A High-Performance, Heterogeneous MPI,” HeteroPar 2006, Barcelona, Spain, September 2006.

Graham, R. L., R. Brightwell, B. Barrett, G. Bosilca, and J. Pjesivac–Grbovic, “An Evaluation of Open MPI's Matching Transport Layer on the Cray XT,” EuroPVM/MPI 2007, September 2007.

Goebel, F., H. Anzt, T. Cojean, G. Flegar, and E. S. Quintana-Orti, “Multiprecision Block-Jacobi for Iterative Triangular Solves,” European Conference on Parallel Processing (Euro-Par 2020): Springer, August 2020.

Giraud, L., J. Langou, and G. Sylvand, “On the Parallel Solution of Large Industrial Wave Propagation Problems,” Journal of Computational Acoustics (to appear), January 2005.

Giraud, L., J. Langou, M. Rozložník, and J. van den Eshof, “Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process,” Numerische Mathematik, vol. 101, no. 1, pp. 87-100, January 2005.

Giraud, L., A. Haidar, and Y. Saad, “Sparse approximations of the Schur complement for parallel algebraic hybrid solvers in 3D,” Numerical Mathematics: Theory, Methods and Applications, vol. 3, no. 3, Beijing, Global Science Press, pp. 64-82, 2010.

Giraud, L., A. Haidar, and S. Pralet, “Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 2010.

Ghysels, P., S. Li, A. YarKhan, and J. Dongarra, “Initial Integration and Evaluation of SLATE and STRUMPACK,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-11: University of Tennessee, December 2018.

Gerndt, M., and K. Fürlinger, “Specification and detection of performance problems with ASL,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11: John Wiley and Sons Ltd., pp. 1451-1464, January 2007.

Genet, D., A. Guermouche, and G. Bosilca, “Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.

Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, “Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017.

Gates, M., J. Kurzak, A. YarKhan, A. Charara, J. Finney, D. Sukkari, M. Al Farhan, I. Yamazaki, P. Wu, and J. Dongarra, SLATE Tutorial, Houston, TX, 2020 ECP Annual Meeting, February 2020.

Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018.

Gates, M., M. Al Farhan, A. Charara, J. Kurzak, D. Sukkari, A. YarKhan, and J. Dongarra, “SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers,” SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019.

Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al Farhan, D. Sukkari, and J. Dongarra, SLATE: Software for Linear Algebra Targeting Exascale (POSTER), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Gates, M., A. Haidar, and J. Dongarra, “Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.

Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al Farhan, D. Sukkari, and J. Dongarra, “SLATE Users' Guide,” SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, July 2020.

Gates, M., J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library, Denver, CO, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), November 2019.

Gates, M., P. Luszczek, A. Abdelfattah, J. Kurzak, J. Dongarra, K. Arturov, C. Cecka, and C. Freitag, “C++ API for BLAS and LAPACK,” SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.

Gates, M., S. Tomov, H. Anzt, P. Luszczek, and J. Dongarra, Clover: Computational Libraries Optimized via Exascale Research, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Gates, M., A. Charara, A. YarKhan, D. Sukkari, M. Al Farhan, and J. Dongarra, “Performance Tuning SLATE,” SLATE Working Notes, no. 14, ICL-UT-20-01: Innovative Computing Laboratory, University of Tennessee, January 2020.

Gates, M., S. Tomov, and A. Haidar, “Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

Gates, M., MAGMA Tutorial, Atlanta, GA, Keeneland Workshop, February 2012.

Gates, M., A. Charara, J. Kurzak, A. YarKhan, I. Yamazaki, and J. Dongarra, “Least Squares Performance Report,” SLATE Working Notes, no. 09, ICL-UT-18-10: Innovative Computing Laboratory, University of Tennessee, December 2018.

Gates, M., J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, “SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,” International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019.

Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, “Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.

Gao, Y., L-C. Canon, Y. Robert, and F. Vivien, “Scheduling Independent Stochastic Tasks on Heterogeneous Cloud Platforms,” IEEE Cluster 2019, Albuquerque, New Mexico, IEEE Computer Society Press, September 2019.

Gao, Y., G. Pallez, Y. Robert, and F. Vivien, “Evaluating Task Dropping Strategies for Overloaded Real-Time Systems (Work-In-Progress),” 42nd Real Time Systems Symposium (RTSS): IEEE Computer Society Press, 2021.

Gamblin, T., P. Beckman, K. Keahey, K. Sato, M. Kondo, and G. Balazs, “BDEC2 Platform White Paper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-11: University of Tennessee, September 2019.

Gainaru, A., B. Goglin, V. Honoré, G. Pallez, P. Raghavan, Y. Robert, and H. Sun, “Reservation and Checkpointing Strategies for Stochastic Jobs,” 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.

Gabriel, E., G. Fagg, and J. Dongarra, “Evaluating The Performance Of MPI-2 Dynamic Communicators And One-Sided Communication,” Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI User's Group Meeting, vol. 2840, Venice, Italy, Springer-Verlag, Berlin, pp. 88-97, September 2003.

Gabriel, E., G. Fagg, A. Bukovsky, T. Angskun, and J. Dongarra, “A Fault-Tolerant Communication Library for Grid Environments,” 17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science, San Francisco, June 2003.
