Submitted by scrawford on
Title | Evaluation of Directive-Based Performance Portable Programming Models |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Lopez, M. G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra |
Journal | International Journal of High Performance Computing and Networking |
Volume | 14 |
Issue | 2 |
Pagination | 165-182 |
Date Published | 2019–07 |
Keywords | OpenACC, OpenMP 4, performance portability, Programming models |
Abstract | We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architecture with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, and we document how much tuning might be required and what lessons we can learn from these experiences. To do this, we use examples of algorithms with varying computational intensities for our evaluation, as both compute and data access efficiency are important considerations for overall application performance. To better understand fundamental compute vs. bandwidth bound characteristics, we add the compute-bound Level 3 BLAS GEMM kernel to our linear algebra evaluation. We implement the kernels of interest using various methods provided by newer OpenACC and OpenMP implementations, and we evaluate their performance on various platforms including both x86_64 and Power8 with attached NVIDIA GPUs, x86_64 multicores, self-hosted Intel Xeon Phi KNL, as well as an x86_64 host system with Intel Xeon Phi coprocessors. We update these evaluations with the newest version of the NVIDIA Pascal architecture (P100), Intel KNL 7230, Power8+, and the newest supporting compiler implementations. Furthermore, we present in detail what factors affected the performance portability, including how to pick the right programming model, its programming style, its availability on different platforms, and how well compilers can optimise and target multiple platforms. |
DOI | 10.1504/IJHPCN.2017.10009064 |