Publications
“MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs,”
University of Tennessee, January 2019.
DOI: 10.13140/RG.2.2.14906.64961 (7.84 MB)
“MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs,”
Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
(5.06 MB)
“Heterogeneous Streaming,”
The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
(2.73 MB)
“Analyzing PAPI Performance on Virtual Machines,”
VMware Technical Journal, vol. Winter 2013, January 2014.
“Analyzing PAPI Performance on Virtual Machines,”
ICL Technical Report, no. ICL-UT-13-02, August 2013.
(437.37 KB)
“Evaluating Asynchronous Schwarz Solvers on GPUs,”
International Journal of High Performance Computing Applications, August 2020.
DOI: 10.1177/1094342020946814
“Autotuning Dense Linear Algebra Libraries on GPUs,”
Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
(579.44 KB)
“An Improved MAGMA GEMM for Fermi GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-CS-10-655 (also LAPACK working note 227), July 2010.
(486.71 KB)
“Blas for GPUs,”
Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.
(1.05 MB)
“Accelerating GPU Kernels for Dense Linear Algebra,”
Proc. of VECPAR'10, Berkeley, CA, June 2010.
(615.07 KB)
“An Improved MAGMA GEMM for Fermi GPUs,”
International Journal of High Performance Computing, vol. 24, no. 4, pp. 511-515, 2010.
“Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,”
ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.
(630.63 KB)
“Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project,”
Portland, Oregon, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
(1.41 MB)
“A Python Library for Matrix Algebra on GPU and Multicore Architectures,”
2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Denver, CO, IEEE, December 2022.
DOI: 10.1109/MASS56207.2022.00121 (414.36 KB)
“Randomized Numerical Linear Algebra: A Perspective on the Field with an Eye to Software,”
University of California, Berkeley EECS Technical Report, no. UCB/EECS-2022-258: University of California, Berkeley, November 2022.
DOI: 10.48550/arXiv.2302.11474 (1.05 MB) (1.54 MB)
“Memory Bandwidth and the Performance of Scientific Applications: A Study of the AMD Opteron Processor,”
2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (submitted), January 2004.
(210.29 KB)
“Automating the Large-Scale Collection and Analysis of Performance,”
5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, Texas, May 2004.
(511.6 KB)
“PerfMiner: Cluster-Wide Collection, Storage and Presentation of Application Level Hardware Performance Data,”
European Conference on Parallel Processing (Euro-Par 2005), Monte de Caparica, Portugal, Springer, September 2005.
DOI: 10.1007/11549468_1 (205.45 KB)
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.
DOI: 10.1145/3605573.3605642
“A Scalable Approach to MPI Application Performance Analysis,”
In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.
(988.58 KB)
“Review of Performance Analysis Tools for MPI Parallel Programs,”
European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting, Lecture Notes in Computer Science 2131, Greece, Springer Verlag, Berlin, pp. 241-248, September 2001.
DOI: 10.1007/3-540-45417-9_34 (39.61 KB)
“User-Defined Events for Hardware Performance Monitoring,”
Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
DOI: 10.1016/j.procs.2011.04.229 (361.76 KB)
“NetBuild: Transparent Cross-Platform Access to Computational Software Libraries,”
Concurrency and Computation: Practice and Experience, Special Issue: Grid Computing Environments, vol. 14, no. 13-15, pp. 1445-1456, November 2002.
(74.84 KB)
“Metacomputing Support for the SARA3D Structural Acoustics Application,”
Department of Defense Users' Group Conference (to appear), Biloxi, Mississippi, June 2001.
(64.58 KB)
“Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education,”
Journal of Digital Information special issue on Interactivity in Digital Libraries, vol. 2, no. 4, 2002.
(182.59 KB)
“Improving Time to Solution with Automated Performance Analysis,”
Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005), San Francisco, February 2005.
(112.63 KB)
“NetBuild,”
University of Tennessee Computer Science Technical Report, no. UT-CS-01-461, January 2001.
(17.71 KB)
“Performance Profiling and Analysis of DoD Applications using PAPI and TAU,”
Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.
(322.56 KB)
“Recommendations for Automatic Responses to Electronic Mail,”
RFC 3834: Internet Engineering Task Force (IETF), January 2004.
(174.76 KB)
“A Comparison of Counting and Sampling Modes of Using Performance Monitoring Hardware,”
International Conference on Computational Science (ICCS 2002), Amsterdam, Netherlands, Springer, April 2002.
DOI: 10.1007/3-540-46080-2_95 (122 KB)
“BDEC Pathways to Convergence: Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-08: University of Tennessee, November 2017.
“NetBuild: Automated Installation and Use of Network-Accessible Software Libraries,”
ICL Technical Report, no. ICL-UT-04-02, January 2004.
(80.52 KB)
“Performance Analysis of One-sided Communication Mechanisms,”
Mini-Symposium "Tools Support for Parallel Programming", Proceedings of Parallel Computing (ParCo), no. ICL-UT-06-07, Malaga, Spain, September 2005.
(121.49 KB)
“KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications,”
Proc. of the European Conference on Parallel Computing (EuroPar), vol. 2790, Klagenfurt, Austria, Springer-Verlag, pp. 1301-1304, August 2003.
(196.05 KB)
“Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,”
2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTERWorkshops61457.2023.00028
“Grid-Enabling Problem Solving Environments: A Case Study of SCIRUN and NetSolve,”
Proceedings of the High Performance Computing Symposium (HPC 2001) in 2001 Advanced Simulation Technologies Conference, Seattle, Washington, Society for Modeling and Simulation International, April 2001.
(144.19 KB)
“RIBAPI - Repository in a Box Application Programmer's Interface,”
University of Tennessee Computer Science Technical Report, no. UT-CS-00-438, 2001.
(57.5 KB)
“CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT),”
arXiv, February 2024.
“Creating Software Technology to Harness the Power of Leadership-class Computing Systems,”
DOE SciDAC Review (to appear), June 2007.
(617.02 KB)
“Remote Software Toolkit Installer,”
ICL Technical Report, no. ICL-UT-05-04, June 2005.
(490.6 KB)
“Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q,”
International Supercomputing Conference 2013 (ISC'13), Leipzig, Germany, Springer, June 2013.
(624.58 KB)
“Performance Counter Monitoring for the Blue Gene/Q Architecture,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-12-01, 2012.
(92.5 KB)
“Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.
DOI: 10.1109/CLUSTER.2014.6968672 (3.45 MB)
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
(260.23 KB)
“Standards for Graph Algorithm Primitives,”
17th IEEE High Performance Extreme Computing Conference (HPEC '13), Waltham, MA, IEEE, September 2013.
DOI: 10.1109/HPEC.2013.6670338 (108.86 KB)
“High-performance Matrix-matrix Multiplications of Very Small Matrices,”
22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Parallel Computing, vol. 81, pp. 1–21, January 2019.
DOI: 10.1016/j.parco.2018.10.003 (3.27 MB)
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
(3.74 MB)
“Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
“Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,”
LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.
(1.53 MB)