Publications

Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.

Yamazaki, I., T. Dong, S. Tomov, and J. Dongarra, “Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,” The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.

Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016.

Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.

Kurzak, J., M. Gates, A. Charara, A. YarKhan, I. Yamazaki, and J. Dongarra, “Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators,” Euro-Par 2019: Parallel Processing, vol. 11725: Springer, pp. 495–506, August 2019.

“8th International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (LNCS),” PPAM 2009 Proceedings, vol. 6067, Wroclaw, Poland, Springer, September 2010.

7th International Parallel Processing and Applied Mathematics Conference, Lecture Notes in Computer Science, vol. 4967, Gdansk, Poland, Springer Berlin, January 2008.

Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,” Lecture Notes in Computer Science, 1, no. 12043: Springer International Publishing, pp. 581, March 2020.

Tsai, Y-H. Mike, N. Beams, and H. Anzt, “Mixed Precision Algebraic Multigrid on GPUs,” Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.

“Parallel Processing and Applied Mathematics, 9th International Conference, PPAM 2011,” Lecture Notes in Computer Science, vol. 7203, Torun, Poland, 2012.

Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,” Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020.

Wu, W., A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, “Hierarchical DAG scheduling for Hybrid Distributed Systems,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, “GPU-Aware Non-contiguous Data Movement In Open MPI,” 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016.

Worley, P. H., J. Candy, L. Carrington, K. Huck, T. Kaiser, K. Mahinthakumar, A. D. Malony, S. Moore, D. Reed, P. C. Roth, et al., “Performance Analysis of GYRO: A Tool Evaluation,” In Proceedings of the 2005 SciDAC Conference, San Francisco, CA, June 2005.

Wong, K., S. Tomov, and J. Dongarra, “Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.

Wong, K., S. Tomov, D. Nichols, R. Febbo, F. Lopez, J. Halloy, and X. Ma, How to Build Your Own Deep Neural Network: PEARC20, July 2020.

Wong, K., S. Tomov, and J. Dongarra, “Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning,” The Journal of Computational Science Education, vol. 11, issue 1, pp. 36-44, January 2020.

Wolf, F., “EARL - API Documentation,” ICL Technical Report, no. ICL-UT-04-03, October 2004.

Wolf, F., A. D. Malony, S. Shende, and A. Morris, “Trace-Based Parallel Performance Overhead Compensation,” In Proc. of the International Conference on High Performance Computing and Communications (HPCC), Sorrento (Naples), Italy, September 2005.

Wolf, F., F. Freitag, B. Mohr, S. Moore, and B. Wylie, “Large Event Traces in Parallel Performance Analysis,” 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics, no. ICL-UT-06-08, Frankfurt/Main, Germany, Gesellschaft für Informatik, March 2006.

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Efficient Pattern Search in Large Traces through Successive Refinement,” Proceedings of Euro-Par 2004, Pisa, Italy, Springer-Verlag, August 2004.

Wolf, F., and B. Mohr, “Automatic performance analysis of hybrid MPI/OpenMP applications,” Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing', vol. 49(10-11): Elsevier, pp. 421-439, November 2003.

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic Analysis of Inefficiency Patterns in Parallel Applications,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.

Wolf, F., B. Wylie, E. Abraham, W. Frings, K. Fürlinger, M. Geimer, M-A. Hermanns, B. Mohr, S. Moore, and M. Pfeifer, “Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,” Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.

Wolf, F., and B. Mohr, “Hardware-Counter Based Automatic Performance Analysis of Parallel Programs,” Advances in Parallel Computing, vol. 13, Dresden, Germany, Elsevier, pp. 753-760, January 2004, 2003.

Winkler, F., “Redesigning PAPI's High-Level API,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-03: University of Tennessee, February 2020.

Whitlock, M., N. Morales, G. Bosilca, A. Bouteiller, B. Nicolae, K. Teranishi, E. Giem, and V. Sarkar, “Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach,” 2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), Heidelberg, Germany, September 2022.

White, J. B., and J. Dongarra, “Overlapping Computation and Communication for Advection on a Hybrid Parallel Computer,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

White, J. B., and J. Dongarra, “High-Performance High-Resolution Semi-Lagrangian Tracer Transport on a Sphere,” Journal of Computational Physics, vol. 230, issue 17, pp. 6778-6799, July 2011.

Whaley, C., and J. Dongarra, “Automatically Tuned Linear Algebra Software,” 1998 ACM/IEEE conference on Supercomputing (SC '98), Orlando, FL, IEEE Computer Society, November 1998.

Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” Parallel Computing, vol. 27, no. 1-2, pp. 3-25, January 2001.

Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-448, September 2000.

Weaver, V., D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula, J. Ralph, J. Nelson, P. Mucci, T. Mohan, and S. Moore, PAPI 5: Measuring Power, Energy, and the Cloud, Austin, TX, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013.

Weaver, V., D. Terpstra, and S. Moore, “Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations,” 2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, IEEE, April 2013.

Weaver, V. M., and J. Dongarra, “Can Hardware Performance Counters Produce Expected, Deterministic Results?,” 3rd Workshop on Functionality of Hardware Performance Monitoring, Atlanta, GA, December 2010.

Weaver, V. M., M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra, and S. Moore, “Measuring Energy and Power with PAPI,” International Workshop on Power-Aware Systems and Architectures, Pittsburgh, PA, September 2012.

Wang, L., W. Wu, J. Zhang, H. Liu, G. Bosilca, M. Herlihy, and R. Fonseca, “FFT-Based Gradient Sparsification for the Distributed Training of Deep Neural Networks,” 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, ACM, June 2020.

Wang, Y., M. Baboulin, J. Falcou, Y. Fraigneau, and O. Le Maître, “A Parallel Solver for Incompressible Fluid Flows,” International Conference on Computational Science (ICCS 2013), Barcelona, Spain, Elsevier B.V., June 2013.

Voemel, C., S. Tomov, and J. Dongarra, “Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems,” SIAM Journal on Scientific Computing (submitted), August 2010.

Voemel, C., S. Tomov, L-W. Wang, O. Marques, and J. Dongarra, “The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,” Journal of Computational Physics (submitted), January 2006.

Voemel, C., S. Tomov, O. Marques, A. Canning, L-W. Wang, and J. Dongarra, “State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems,” Journal of Computational Physics, vol. 227, no. 15, pp. 7113-7124, January 2008.

Voemel, C., S. Tomov, L-W. Wang, O. Marques, and J. Dongarra, “The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,” Journal of Computational Physics, vol. 223, pp. 774-782, 2007.

Voemel, C., S. Tomov, and J. Dongarra, “Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems,” SIAM Journal on Scientific Computing, vol. 34(2), pp. C70-C82, April 2012.

Vetter, J., R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, et al., “Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community,” IEEE Computing in Science & Engineering, vol. 13, issue 5, pp. 90-95, August 2011.

Vetter, J., R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Dongarra, J. Meredith, P. Roth, et al., “Keeneland: Computational Science Using Heterogeneous GPU Computing,” Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.

Valero-Lara, P., J. Dongarra, A. Haidar, S. D. Relton, S. Tomov, and M. Zounon, A Standard for Batched BLAS Routines, Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.

Valeev, E. F., R. J. Harrison, A. A. Holmes, C. C. Peterson, and D. A. Penchoff, “Direct Determination of Optimal Real-Space Orbitals for Correlated Electronic Structure of Molecules,” Journal of Chemical Theory and Computation, vol. 19, issue 20, pp. 7230-7241, October 2023.

Vadhiyar, S., G. Fagg, and J. Dongarra, “Automatically Tuned Collective Communications,” Proceedings of SuperComputing 2000 (SC'2000), Dallas, TX, November 2000.
