Publications

2016

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.

Abdelfattah, A., H. Ltaief, D. Keyes, and J. Dongarra, “Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs,” Concurrency and Computation: Practice and Experience, vol. 28, issue 12, pp. 3447-3465, May 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.

YarKhan, A., J. Kurzak, P. Luszczek, and J. Dongarra, “Porting the PLASMA Numerical Library to the OpenMP Standard,” International Journal of Parallel Programming, June 2016.

Jagode, H., A. YarKhan, A. Danalis, and J. Dongarra, “Power Management and Event Verification in PAPI,” Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 41-51, 2016.

2015

Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015.

Danalis, A., H. Jagode, G. Bosilca, and J. Dongarra, “PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15) (Best Paper Award), Alexandria, VA, April 2015.

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.

Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Bouteiller, A., G. Bosilca, and J. Dongarra, “Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery,” 22nd European MPI Users' Group Meeting, Bordeaux, France, ACM, September 2015.

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.

2014

Marin, G., “Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI,” ICL Technical Report, no. ICL-UT-14-01: University of Tennessee, February 2014.

Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014.

Dongarra, J., T. Herault, and Y. Robert, “Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41, 2014.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85: University of Tennessee, June 2014.

McCraw, H., J. Ralph, A. Danalis, and J. Dongarra, “Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.

Danalis, A., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.

Dongarra, J., J. Kurzak, P. Luszczek, and I. Yamazaki, “PULSAR Users’ Guide, Parallel Ultra-Light Systolic Array Runtime,” University of Tennessee EECS Technical Report, no. UT-EECS-14-733: University of Tennessee, November 2014.

2013

Weaver, V., D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula, J. Ralph, J. Nelson, P. Mucci, T. Mohan, and S. Moore, “PAPI 5: Measuring Power, Energy, and the Cloud,” 2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, April 2013.

Jia, Y., G. Bosilca, P. Luszczek, and J. Dongarra, “Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance,” International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013, Denver, CO, November 2013.

Wang, Y., M. Baboulin, J. Falcou, Y. Fraigneau, and O. Le Maître, “A Parallel Solver for Incompressible Fluid Flows,” International Conference on Computational Science (ICCS 2013), Barcelona, Spain, Elsevier B.V., June 2013.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. Dongarra, “PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software,” University of Tennessee Computer Science Technical Report, no. CS-89-85, February 2013.

Dongarra, J., M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, “Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.

Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244 - 254, January 2013.

2012

Johnson, M., H. McCraw, S. Moore, P. Mucci, J. Nelson, D. Terpstra, V. M. Weaver, and T. Mohan, “PAPI-V: Performance Monitoring for Virtual Machines,” CloudTech-HPC 2012, Pittsburgh, PA, September 2012.

“Parallel Processing and Applied Mathematics, 9th International Conference, PPAM 2011,” Lecture Notes in Computer Science, vol. 7203, Torun, Poland, 2012.

Baboulin, M., D. Becker, and J. Dongarra, “A Parallel Tiled Solver for Symmetric Indefinite Systems On Multicore Architectures,” IPDPS 2012, Shanghai, China, May 2012.

McCraw, H., “Performance Counter Monitoring for the Blue Gene/Q Architecture,” University of Tennessee Computer Science Technical Report, no. ICL-UT-12-01, 2012.

Donfack, S., S. Tomov, and J. Dongarra, “Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. UT-CS-12-700, October 2012.

Kasichayanula, K., D. Terpstra, P. Luszczek, S. Tomov, S. Moore, and G. D. Peterson, “Power Aware Computing on GPUs,” SAAHPC '12 (Best Paper Award), Argonne, IL, July 2012.

Bosilca, G., J. Dongarra, and H. Ltaief, “Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,” Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.

Kurzak, J., P. Luszczek, S. Tomov, and J. Dongarra, “Preliminary Results of Autotuning GEMM Kernels for the NVIDIA Kepler Architecture,” LAWN 267, 2012.

Kurzak, J., P. Luszczek, M. Faverge, and J. Dongarra, “Programming the LU Factorization for a Multicore System with Accelerators,” Proceedings of VECPAR’12, Kobe, Japan, April 2012.

Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. UT-CS-12-693: University of Tennessee, February 2012.

Du, P., S. Tomov, and J. Dongarra, “Providing GPU Capability to LU and QR within the ScaLAPACK Framework,” University of Tennessee Computer Science Technical Report (also LAWN 272), no. UT-CS-12-699, September 2012.

2011

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, “Parallel algebraic domain decomposition solver for the solution of augmented systems,” Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April 2011.

Malony, A. D., S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, and C. Lamb, “Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011.

Haidar, A., H. Ltaief, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.

Haidar, A., H. Ltaief, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-677 (also LAWN 254), August 2011.

Baboulin, M., D. Becker, and J. Dongarra, “A parallel tiled solver for dense symmetric indefinite systems on multicore architectures,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-07, October 2011.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85, 2011.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: Workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.

Kasichayanula, K., H. You, S. Moore, S. Tomov, H. Jagode, and M. Johnson, “Power-aware Computing on GPGPUs,” Fall Creek Falls Conference (poster), Gatlinburg, TN, September 2011.

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, C-Y. Su, and K. Cameron, “Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.

Ma, T., T. Herault, G. Bosilca, and J. Dongarra, “Process Distance-aware Adaptive MPI Collective Communications,” IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, TX, 2011.

Ltaief, H., P. Luszczek, and J. Dongarra, “Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.