Export 287 results:
Filters: Author is Stanimire Tomov [Clear All Filters]
Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016. DOI: 10.1002/cpe.4012“
A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms,” ICL Technical Report, no. ICL-UT-21-04: University of Tennessee, August 2021.“
Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014. DOI: http://dx.doi.org/10.14529/jsfi1401“
Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,” First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.“
Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.“
Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,” VECPAR 2014 (Best Paper), Eugene, OR, June 2014.“
Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.“
Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,” Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020. DOI: 10.1098/rspa.2020.0110“
Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015. DOI: DOI:10.1137/14M0973773“
Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.“
Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.“
MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) , Washington, DC, NSF PI Meeting, Poster, April 2018. DOI: 10.6084/m9.figshare.6174143.v3
Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020. DOI: 10.1016/j.jpdc.2020.07.001“
Matrix Algebra on GPU and Multicore Architectures , Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.“
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines , Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines , Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.
MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.“
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. DOI: 10.1007/978-3-030-34356-9_37“
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs , Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.“
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs : University of Tennessee, January 2019. DOI: 10.13140/RG.2.2.14906.64961
MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs , San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,” The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020. DOI: 10.1177/1094342020938421“
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi , Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
MAGMA - LAPACK for HPC on Heterogeneous Architectures , Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
MAGMA - LAPACK for GPUs , Atlanta, GA, Keeneland GPU Tutorial, April 2011.
MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.“
MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,” Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.“
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems , San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.“
LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.“
LU Factorization for Accelerator-Based Systems,” IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.“
Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,” ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020. DOI: 10.1145/3380930“
Linear Algebra Software for Large-Scale Accelerated Multicore Computing,” Acta Numerica, vol. 25, pp. 1-160, May 2016. DOI: 10.1017/S0962492916000015“
Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors) , Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
Linear Algebra Prepara.on for Emergent Neural Network Architectures: MAGMA, BLAS, and Batched GPU Computing , Virtual, LAPENNA Workshop, November 2021.
libCEED: Fast algebra for high-order element-based discretizations,” Journal of Open Source Software, vol. 6, no. 63, pp. 2945, 2021. DOI: 10.21105/joss.02945“
Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013. DOI: 10.1007/978-3-642-38750-0_6“
Keeneland: Computational Science Using Heterogeneous GPU Computing,” Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.“
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices using GPUs,” International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, Springer, Cham, June 2020. DOI: 10.1007/978-3-030-50417-5_18“
Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018. DOI: 10.1002/cpe.4485“
Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers,” ScalA17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, ACM.“
An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra : NVIDIA Webinar, June 2010.
Interior State Computation of Nano Structures,” PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.“
Interim Report on Benchmarking FFT Libraries on High Performance Systems,” Innovative Computing Laboratory Technical Report, no. ICL-UT-21-03: University of Tennessee, July 2021.“
Integrating Deep Learning in Domain Sciences at Exascale,” 2020 Smoky Mountains Computational Sciences and Engineering Conference (SMC 2020), August 2020.“
Integrating Deep Learning in Domain Sciences at Exascale,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-10: University of Tennessee, August 2020.“