Publications

L
Luszczek, P., M. Gates, J. Kurzak, A. Danalis, and J. Dongarra, “Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.
Luszczek, P., W. M. Sid-Lakhdar, and J. Dongarra, “Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,” The International Journal of High Performance Computing Applications, March 2023.
Luszczek, P., D. Bailey, J. Dongarra, J. Kepner, R. Lucas, R. Rabenseifner, and D. Takahashi, “The HPC Challenge (HPCC) Benchmark Suite,” SC06 Conference Tutorial, Tampa, Florida, IEEE, November 2006.
Luszczek, P., E. Meek, S. Moore, D. Terpstra, V. M. Weaver, and J. Dongarra, “Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,” 6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.
Luszczek, P., and J. Dongarra, “Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,” International Conference on Computational Science, Poland, Springer Verlag, June 2004.
Luszczek, P., Y. Tsai, N. Lindquist, H. Anzt, and J. Dongarra, “Scalable Data Generation for Evaluating Mixed-Precision Solvers,” 2020 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, IEEE, September 2020.
Luszczek, P., J. Kurzak, I. Yamazaki, and J. Dongarra, “Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,” 2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017.
Luszczek, P., and D. Koester, “HPC Challenge v1.x Benchmark Suite,” SC|05 Tutorial - S13, Seattle, Washington, January 2005.
Luszczek, P., and J. Dongarra, “Anatomy of a Globally Recursive Embedded LINPACK Benchmark,” 2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.
Luszczek, P., “High Performance Development for High End Computing with Python Language Wrapper (PLW),” International Journal of High Performance Computing Applications (to appear), 2006.
Luszczek, P., “Parallel Programming in MATLAB,” The International Journal of High Performance Computing Applications, vol. 23, no. 3, pp. 277-283, July 2009.
Luszczek, P., H. Ltaief, and J. Dongarra, “Two-stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
M
Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, and J. Dongarra, “Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs,” Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September 2011.
Ma, T., G. Bosilca, A. Bouteiller, and J. Dongarra, “Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms,” Journal of Parallel and Distributed Computing, vol. 73, issue 7, pp. 1000-1010, July 2013.
Ma, T., A. Bouteiller, G. Bosilca, and J. Dongarra, “Locality and Topology aware Intra-node Communication Among Multicore CPUs,” Proceedings of the 17th EuroMPI conference, Stuttgart, Germany, LNCS, September 2010.
Ma, T., G. Bosilca, A. Bouteiller, and J. Dongarra, “HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters,” IPDPS 2012 (Best Paper), Shanghai, China, May 2012.
Ma, T., T. Herault, G. Bosilca, and J. Dongarra, “Process Distance-aware Adaptive MPI Collective Communications,” IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, 2011.
Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, and J. Dongarra, “Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs,” University of Tennessee Computer Science Technical Report, UT-CS-10-663, November 2010.
Ma, T., A. Bouteiller, G. Bosilca, and J. Dongarra, “Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,” 18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.
Malony, A. D., S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, and C. Lamb, “Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011.
Marin, G., C. McCurdy, and J. Vetter, “Diagnosis and Optimization of Application Prefetching Performance,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.
Haidar, A., M. Gates, S. Tomov, and J. Dongarra, “Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.
Marin, G., “Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI,” ICL Technical Report, no. ICL-UT-14-01: University of Tennessee, February 2014.
Marin, G., J. Dongarra, and D. Terpstra, “MIAMI: A Framework for Application Performance Diagnosis,” IPASS-2014, Monterey, CA, IEEE, March 2014.
Marques, O., J. Demmel, and P. B. Vasconcelos, “Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,” LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.
Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra, “High-performance Matrix-matrix Multiplications of Very Small Matrices,” 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Parallel Computing, vol. 81, pp. 1–21, January 2019.
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
Mattson, T., D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, et al., “Standards for Graph Algorithm Primitives,” 17th IEEE High Performance Extreme Computing Conference (HPEC '13), Waltham, MA, IEEE, September 2013.
McCraw, H., “Performance Counter Monitoring for the Blue Gene/Q Architecture,” University of Tennessee Computer Science Technical Report, no. ICL-UT-12-01, 2012.
McCraw, H., J. Ralph, A. Danalis, and J. Dongarra, “Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.
McCraw, H., A. Danalis, G. Bosilca, J. Dongarra, K. Kowalski, and T. Windus, “Utilizing Dataflow-based Execution for Coupled Cluster Methods,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
McCraw, H., D. Terpstra, J. Dongarra, K. Davis, and R. Musselman, “Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q,” International Supercomputing Conference 2013 (ISC'13), Leipzig, Germany, Springer, June 2013.
Meek, E., J. Larkin, and J. Dongarra, “Remote Software Toolkit Installer,” ICL Technical Report, no. ICL-UT-05-04, June 2005.
Mellor-Crummey, J., P. Beckman, J. Dongarra, B. Miller, and K. Yelick, “Creating Software Technology to Harness the Power of Leadership-class Computing Systems,” DOE SciDAC Review (to appear), June 2007.
Melnichenko, M., O. Balabanov, R. Murray, J. Demmel, M. W. Mahoney, and P. Luszczek, “CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT),” arXiv, February 2024.
Millar, J., P. McMahan, and J. Dongarra, “RIBAPI - Repository in a Box Application Programmer's Interface,” University of Tennessee Computer Science Technical Report, no. UT-CS-00-438, 2001.
Miller, M., C. Moulding, J. Dongarra, and C. Johnson, “Grid-Enabling Problem Solving Environments: A Case Study of SCIRUN and NetSolve,” Proceedings of the High Performance Computing Symposium (HPC 2001) in 2001 Advanced Simulation Technologies Conference, Seattle, Washington, Society for Modeling and Simulation International, April 2001.
Mishler, D., J. Ciesko, S. Olivier, and G. Bosilca, “Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,” 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.
Mohr, B., A. Kühnal, M-A. Hermanns, and F. Wolf, “Performance Analysis of One-sided Communication Mechanisms,” Mini-Symposium "Tools Support for Parallel Programming", Proceedings of Parallel Computing (ParCo), no. ICL-UT-06-07, Malaga, Spain, September 2005.
Mohr, B., and F. Wolf, “KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications,” Proc. of the European Conference on Parallel Computing (EuroPar), vol. 2790, Klagenfurt, Austria, Springer-Verlag, pp. 1301-1304, August 2003.
Moore, S., D. Cronk, F. Wolf, A. Purkayastha, P. J. Teller, R. Araiza, G. Aguilera, and J. Nava, “Performance Profiling and Analysis of DoD Applications using PAPI and TAU,” Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.
Moore, K., “Recommendations for Automatic Responses to Electronic Mail,” RFC 3834: Internet Engineering Task Force (IETF), January 2004.
Moore, K., and J. Dongarra, “NetBuild,” University of Tennessee Computer Science Technical Report, no. UT-CS-01-461, January 2001.
Moore, K., J. Dongarra, S. Moore, and E. Grosse, “NetBuild: Automated Installation and Use of Network-Accessible Software Libraries,” ICL Technical Report, no. ICL-UT-04-02, January 2004.
Moore, S., “A Comparison of Counting and Sampling Modes of Using Performance Monitoring Hardware,” International Conference on Computational Science (ICCS 2002), Amsterdam, Netherlands, Springer, April 2002.
“BDEC Pathways to Convergence: Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-08: University of Tennessee, November 2017.
Moore, S., and J. Ralph, “User-Defined Events for Hardware Performance Monitoring,” Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
Moore, S., F. Wolf, J. Dongarra, S. Shende, A. D. Malony, and B. Mohr, “A Scalable Approach to MPI Application Performance Analysis,” In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.
