Publications

Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, “Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.

Haidar, A., A. Abdelfattah, V. Dobrev, I. Karlin, T. Kolev, S. Tomov, and J. Dongarra, Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs, Gatlinburg, TN, Smoky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.

Haidar, A., H. Ltaief, P. Luszczek, and J. Dongarra, “A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction,” IPDPS 2012, Shanghai, China, May 2012.

Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, “Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

Haidar, A., R. Solcà, M. Gates, S. Tomov, T. C. Schulthess, and J. Dongarra, “A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014.

Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption, Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators Based on GPUs,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” University of Tennessee Computer Science Technical Report (also LAWN 283), no. UT-EECS-13-720: University of Tennessee, October 2013.

Haidar, A., H. Ltaief, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-666 (also LAWN 243), March 2011.

Haidar, A., H. Jagode, A. YarKhan, P. Vaccaro, S. Tomov, and J. Dongarra, “Power-aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Best Paper Finalist, Waltham, MA, IEEE, September 2017.

Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.

Haidar, A., S. Tomov, J. Dongarra, R. Solcà, and T. C. Schulthess, “Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013.

Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, “Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018.

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “High-performance Cholesky Factorization for GPU-only Execution,” Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017.

Haidar, A., H. Ltaief, and J. Dongarra, “Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,” SIAM Journal on Scientific Computing (Accepted), July 2012.

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, “Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.

Hadri, B., H. Ltaief, E. Agullo, and J. Dongarra, “Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,” Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.

Hadri, B., H. Ltaief, E. Agullo, and J. Dongarra, “Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.

Hadri, B., E. Agullo, and J. Dongarra, “Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” 24th IEEE International Parallel and Distributed Processing Symposium (submitted), 2010.

Hadri, B., H. Ltaief, E. Agullo, and J. Dongarra, “Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,” Submitted to Transactions on Parallel and Distributed Systems, December 2009.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM TOMS (to appear), 2009.

Gustavson, F. G., J. Wasniewski, J. Dongarra, J. Herrero, and J. Langou, “Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 2, February 2013.

Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199), April 2008.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, April 2010.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, Atlanta, GA, April 2010.

Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211, 2010.

Guidry, M., and A. Haidar, On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.

Grützmacher, T., H. Anzt, and E. S. Quintana‐Ortí, “Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,” Software: Practice and Experience, vol. 532, issue 1, pp. 81-98, January.

Gruetzmacher, T., T. Cojean, G. Flegar, F. Göbel, and H. Anzt, “A Customized Precision Format Based on Mantissa Segmentation for Accelerating Sparse Linear Algebra,” Concurrency and Computation: Practice and Experience, vol. 40319, issue 262, January 2019.

Abdelfattah, A., S. Tomov, and J. Dongarra, “Batch QR Factorization on GPUs: Design, Optimization, and Tuning,” Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.

Graham, R. L., G. Bosilca, and J. Pjesivac-Grbovic, “A Comparison of Application Performance Using Open MPI and Cray MPI,” Cray User Group, CUG 2007, May 2007.

Graham, R. L., R. Brightwell, B. Barrett, G. Bosilca, and J. Pjesivac-Grbovic, “An Evaluation of Open MPI's Matching Transport Layer on the Cray XT,” EuroPVM/MPI 2007, September 2007.

Graham, R. L., G. M. Shipman, B. Barrett, R. Castain, G. Bosilca, and A. Lumsdaine, “A High-Performance, Heterogeneous MPI,” HeteroPar 2006, Barcelona, Spain, September 2006.

Goebel, F., H. Anzt, T. Cojean, G. Flegar, and E. S. Quintana-Orti, “Multiprecision Block-Jacobi for Iterative Triangular Solves,” European Conference on Parallel Processing (Euro-Par 2020): Springer, August 2020.

Giraud, L., J. Langou, and G. Sylvand, “On the Parallel Solution of Large Industrial Wave Propagation Problems,” Journal of Computational Acoustics (to appear), January 2005.

Giraud, L., J. Langou, M. Rozložník, and J. van den Eshof, “Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process,” Numerische Mathematik, vol. 101, no. 1, pp. 87-100, January 2005.

Giraud, L., A. Haidar, and Y. Saad, “Sparse approximations of the Schur complement for parallel algebraic hybrid solvers in 3D,” Numerical Mathematics: Theory, Methods and Applications, vol. 3, no. 3, Beijing, Global Science Press, pp. 64-82, 2010.

Giraud, L., A. Haidar, and S. Pralet, “Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 2010.

Ghysels, P., S. Li, A. YarKhan, and J. Dongarra, “Initial Integration and Evaluation of SLATE and STRUMPACK,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-11: University of Tennessee, December 2018.

Gerndt, M., and K. Fürlinger, “Specification and detection of performance problems with ASL,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11: John Wiley and Sons Ltd., pp. 1451-1464, January 2007.

Genet, D., A. Guermouche, and G. Bosilca, “Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.

Gates, M., J. Kurzak, A. YarKhan, A. Charara, J. Finney, D. Sukkari, M. Al Farhan, I. Yamazaki, P. Wu, and J. Dongarra, SLATE Tutorial, Houston, TX, 2020 ECP Annual Meeting, February 2020.

Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018.

Gates, M., MAGMA Tutorial, Atlanta, GA, Keeneland Workshop, February 2012.

Gates, M., J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library, Denver, CO, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), November 2019.

Gates, M., S. Tomov, H. Anzt, P. Luszczek, and J. Dongarra, Clover: Computational Libraries Optimized via Exascale Research, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al Farhan, D. Sukkari, and J. Dongarra, SLATE: Software for Linear Algebra Targeting Exascale (POSTER), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Gates, M., M. Al Farhan, A. Charara, J. Kurzak, D. Sukkari, A. YarKhan, and J. Dongarra, “SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers,” SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019.
