Publications

Cheng, X., A. Soma, E. D'Azevedo, K. Wong, and S. Tomov, Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision , Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), ACM Student Research Poster, November 2018.

(740.37 KB)

Tomov, S., M. Gates, and A. Haidar, Accelerating Linear Algebra with MAGMA , Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.

(35.27 MB)

Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540--551, July 2018.

(1.68 MB)

Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “Accelerating the SVD Bi-Diagonalization of a Batch of Small Matrices using GPUs,” Journal of Computational Science, vol. 26, pp. 237–245, May 2018.

(2.18 MB)

Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018.

(1.34 MB)

Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.

(493.65 KB)

Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.

(3.74 MB)

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018.

(2.53 MB)

Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.

(1.37 MB)

Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.

(2.5 MB)

Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, “Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.

(2.53 MB)

Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, “Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.

(720.15 KB)

Dongarra, J., I. Duff, M. Gates, A. Haidar, S. Hammarling, N. J. Higham, J. Hogg, P. Valero Lara, P. Luszczek, M. Zounon, et al., Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification , July 2018.

(483.05 KB)

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,” Journal of Computational Science, vol. 26, pp. 226–236, May 2018.

(3.73 MB)

Marques, O., J. Demmel, and P. B. Vasconcelos, “Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,” LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.

(1.53 MB)

Asch, M., T. Moore, R. M. Badia, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. R. de Supinski, et al., “Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.

(1.29 MB)

Caniou, Y., E. Caron, A K W. Chang, and Y. Robert, “Budget-Aware Scheduling Algorithms for Scientific Workflows with Stochastic Task Weights on Heterogeneous IaaS Cloud Platforms,” 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, BC, Canada, IEEE, May 2018.

(1.31 MB)

Han, L., L-C. Canon, H. Casanova, Y. Robert, and F. Vivien, “Checkpointing Workflows for Fail-Stop Errors,” IEEE Transactions on Computers, vol. 67, issue 8, pp. 1105–1120, August 2018.

Ahrens, J., C. M. Biwer, A. Costan, G. Antoniu, M. S. Pérez, N. Stojanovic, R. Badia, O. Beckstein, G. Fox, S. Jha, et al., “A Collection of White Papers from the BDEC2 Workshop in Bloomington, IN,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-15: University of Tennessee, Knoxville, November 2018.

(9.26 MB)

Sun, J., J. Fu, J. Drake, Q. Zhu, A. Haidar, M. Gates, S. Tomov, and J. Dongarra, “Computational Benefit of GPU Optimization for Atmospheric Chemistry Modeling,” Journal of Advances in Modeling Earth Systems, vol. 10, issue 8, pp. 1952–1969, August 2018.

(3.4 MB)

Casanova, H., J. Herrmann, and Y. Robert, “Computing the Expected Makespan of Task Graphs in the Presence of Silent Errors,” Parallel Computing, vol. 75, pp. 41–60, July 2018.

(2.56 MB)

Benoit, A., A. Cavelan, F. Cappello, P. Raghavan, Y. Robert, and H. Sun, “Coping with Silent and Fail-Stop Errors at Scale by Combining Replication and Checkpointing,” Journal of Parallel and Distributed Computing, vol. 122, pp. 209–225, December 2018.

(837 KB)

Aupy, G., A. Benoit, S. Dai, L. Pottier, P. Raghavan, Y. Robert, and M. Shantharam, “Co-Scheduling Amdhal Applications on Cache-Partitioned Systems,” International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 123–138, January 2018.

(672.52 KB)

Aupy, G., A. Benoit, B. Goglin, L. Pottier, and Y. Robert, “Co-Scheduling HPC Workloads on Cache-Partitioned CMP Platforms,” Cluster 2018, Belfast, UK, IEEE Computer Society Press, September 2018.

(423.75 KB)

Bouteiller, A., G. Bosilca, T. Herault, and J. Dongarra, “Data Movement Interfaces to Support Dataflow Runtimes,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.

(210.94 KB)

Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, “The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,” International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018.

(487.88 KB)

Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Distributed Termination Detection for HPC Task-Based Environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.

Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, “Do moldable applications perform better on failure-prone HPC platforms?,” 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018.

(360.72 KB)

Bouteiller, A., S. Pophale, S. Boehm, M. B. Baker, and M G. Venkata, “Evaluating Contexts in OpenSHMEM-X Reference Implementation,” OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, Cham, Springer International Publishing, pp. 50–62, 2018.

Tomov, S., A. Haidar, D. Schultz, and J. Dongarra, “Evaluation and Design of FFT for Distributed Accelerated Systems,” ECP WBS 2.3.3.09 Milestone Report, no. FFT-ECP ST-MS-10-1216: Innovative Computing Laboratory, University of Tennessee, October 2018.

(7.53 MB)

Jagode, H., A. Danalis, R. Hoque, M. Faverge, and J. Dongarra, “Evaluation of Dataflow Programming Models for Electronic Structure Theory,” Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018.

(1.69 MB)

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018.

(1.04 MB)

Han, L., V. Le Fèvre, L-C. Canon, Y. Robert, and F. Vivien, “A Generic Approach to Scheduling and Checkpointing Workflows,” The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.

(737.11 KB)

Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, “A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018.

(832.92 KB)

Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, “Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018.

(642.51 KB)

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100 , San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.

(2.96 MB)

Anzt, H., T. Gruetzmacher, E. S. Quintana-Orti, and F. Scheidegger, “High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,” 8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.

Abdelfattah, A., M. Gates, J. Kurzak, P. Luszczek, and J. Dongarra, “Implementation of the C++ API for Batch BLAS,” SLATE Working Notes, no. 07, ICL-UT-18-04: Innovative Computing Laboratory, University of Tennessee, June 2018.

(1.07 MB)

Anzt, H., T. Huckle, J. Bräckle, and J. Dongarra, “Incomplete Sparse Approximate Inverses for Parallel Preconditioning,” Parallel Computing, vol. 71, pp. 1–22, January 2018.

(1.24 MB)

Ghysels, P., S. Li, A. YarKhan, and J. Dongarra, “Initial Integration and Evaluation of SLATE and STRUMPACK,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-11: University of Tennessee, December 2018.

(249.78 KB)

YarKhan, A., G. Ragghianti, J. Dongarra, M. Cawkwell, D. Perez, and A. Voter, “Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-07: Innovative Computing Laboratory, University of Tennessee, June 2018.

(366.6 KB)

Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, “Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018.

(1.2 MB)

Anzt, H., and J. Dongarra, “A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs,” SBAC-PAD, Lyon, France, IEEE, 2018.

(237.68 KB)

Gates, M., A. Charara, J. Kurzak, A. YarKhan, I. Yamazaki, and J. Dongarra, “Least Squares Performance Report,” SLATE Working Notes, no. 09, ICL-UT-18-10: Innovative Computing Laboratory, University of Tennessee, December 2018.

(1.76 MB)

Kurzak, J., M. Gates, I. Yamazaki, A. Charara, A. YarKhan, J. Finney, G. Ragghianti, P. Luszczek, and J. Dongarra, “Linear Systems Performance Report,” SLATE Working Notes, no. 08, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.

(1.64 MB)

Abdelfattah, A., J. Dongarra, A. Haidar, S. Tomov, and I. Yamazaki, MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines , Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.

(2.55 MB)

Haidar, A., S. Tomov, A. Abdelfattah, I. Yamazaki, and J. Dongarra, MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) , Washington, DC, NSF PI Meeting, Poster, April 2018.

(2.4 MB)

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,” Journal of Computational Science, vol. 28, pp. 398–415, September 2018.

Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, “Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,” 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018.

(899.3 KB)

Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, “Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,” The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.

(2.08 MB)

Main menu

Pages