High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs

Natalie Beams; Ahmad Abdelfattah; Stanimire Tomov; Jack Dongarra; Tzanio Kolev; Yohann Dudouit

Title	High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs
Publication Type	Conference Paper
Year of Publication	2020
Authors	Beams, N., A. Abdelfattah, S. Tomov, J. Dongarra, T. Kolev, and Y. Dudouit
Conference Name	2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)
Date Published	2020-11
Publisher	IEEE
Keywords	Batched linear algebra, finite elements, gpu, high-order methods, matrix-free FEM, Tensor contractions
Abstract	We present new GPU implementations of the tensor contractions arising from basis-related computations for high-order finite element methods. We consider both tensor and non-tensor bases. In the case of tensor bases, we introduce new kernels based on a series of fused device-level matrix multiplications (GEMMs), specifically designed to utilize the fast memory of the GPU. For non-tensor bases, we develop a tuned framework for choosing standard batch-BLAS GEMMs that will maximize performance across groups of elements. The implementations are included in a backend of the libCEED library. We present benchmark results for the diffusion and mass operators using libCEED integration through the MFEM finite element library and compare to those of the previously best-performing GPU backends for stand-alone basis computations. In tensor cases, we see improvements of approximately 10-30% for some cases, particularly for higher basis orders. For the non-tensor tests, the new batch-GEMMs implementation is twice as fast as what was previously available for basis function order greater than five and greater than approximately 10^5 degrees of freedom in the mesh; up to ten times speedup is seen for eighth-order basis functions.

Project Tags:

magma

File:

icl-utk-1442-2020.pdf
