Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)

Title: Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)
Publication Type: Presentation
Year of Publication: 2020
Authors: Tomov, S., K. Wong, J. Dongarra, R. Archibald, E. Chow, E. D'Azevedo, M. Eisenbach, R. Febbo, F. Lopez, D. Nichols, and J. Yin
Date Published: 2020-12
Event: DOD HPCMP seminar

We will present some of the current challenges in the design and integration of deep learning AI with traditional HPC simulations. We evaluate existing packages for their readiness to run deep learning models and applications efficiently on large-scale HPC systems, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open source HPC deep learning framework.
Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses the necessities of an HPC deep learning framework and how they can be provided, e.g., as in MagmaDNN, through deep integration with existing HPC libraries such as MAGMA and its modular memory management, MPI, cuBLAS, cuDNN, MKL, and HIP. Advancements are also illustrated through algorithmic enhancements in reduced- and mixed-precision computation and in asynchronous optimization methods. Finally, we present illustrations of, and potential solutions for, enhancing traditional compute- and data-intensive applications at ORNL and UTK with AI. The approaches and future challenges are illustrated on materials science, imaging, and climate applications.
