News and Announcements
30 Years of Innovative Computing

Don’t forget to save the date for ICL’s 30th anniversary celebration: August 8–9, 2019, in Knoxville, TN!
Michela Taufer Interviews Jack and Sunita

Michela: Jack, you’ve been to every SC conference. How has the event evolved over the years?
Jack: Over that 31-year period, incredible changes have taken place—especially in the areas of computer speed and power. In the early days, we measured performance in terms of megaFLOP/s. And now, we’re a billion times faster in terms of computing power. That has tremendous ramifications for all of humanity. Another remarkable thing is the number and diversity of attendees. In the early days, it was just researchers. Now, people participating in HPC—academics, vendors, scientists—come from every industry sector you can imagine.
Michela: Where do you see HPC heading in the future? What excites you about its evolution?
Jack: One of the changes I’m seeing is the development and use of machine learning in scientific computing, which is changing how we can quickly resolve approximate solutions and then use conventional methods to compute a more accurate solution to many problems. It affects biology, drug design, materials, high-energy physics, etc. By augmenting our models and our ability to do simulation, HPC enables us to understand and do things so much faster than we could in the past—and it will only get better in the future.
Michela: Sunita, you have been attending SC for almost a decade now. What are the biggest changes you’ve noticed?
Sunita: SC has helped change my focus on what I do. For a long time, I was only developing software. But meeting and talking to scientists, nuclear physicists, and biologists at SC is very important for a computer scientist like me. Over the years, those discussions have really opened my eyes to this fact: before you spend time creating computer science tools, you need to spend time considering who is going to use your tools.
Conference Reports
ECP Houston
The Exascale Computing Project (ECP) is a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration—and is responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, and hardware, to support the nation’s exascale computing initiative. As many of you know, ICL competed for and won five ECP awards in 2016 and picked up another ECP award for fast Fourier transform (FFT) development.
On January 14–18, 2019, ECP held its 3rd annual meeting in Houston, Texas, with over 500 attendees. Key topics included plans for the second phase of ECP and a new management structure. ICL team members presented posters and hosted tutorials for their respective projects—detailed below.
Distributed Tasking for Exascale (DTE)
The DTE project, as part of the ECP effort, extends the capabilities of ICL’s Parallel Runtime and Execution Controller (PaRSEC) project, which is a generic framework for architecture-aware scheduling and management of microtasks on distributed, many-core, heterogeneous architectures. The PaRSEC environment also provides a runtime component for dynamically executing tasks on heterogeneous distributed systems, along with a productivity toolbox and development framework that supports multiple domain-specific languages and extensions, as well as tools for debugging, trace collection, and analysis.
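To give a concrete flavor of the dataflow model PaRSEC implements, the toy C sketch below executes a small diamond-shaped task graph by releasing each task once all of its dependencies have completed. This is a conceptual illustration written for this article only; it is not PaRSEC syntax, and PaRSEC itself represents such DAGs symbolically and schedules the tasks dynamically across distributed, heterogeneous resources rather than scanning them serially as done here.

```c
/* Conceptual sketch only (not PaRSEC syntax): run a diamond-shaped task DAG
 * by releasing each task once all of its input dependencies have completed. */
#include <stdio.h>

#define NTASKS 4

typedef struct {
    const char *name;
    int deps[NTASKS];   /* indices of tasks this task depends on */
    int ndeps;
    int done;
} task_t;

static void run_task(task_t *t) {
    printf("executing task %s\n", t->name);
    t->done = 1;
}

int main(void) {
    /* Diamond DAG: A feeds B and C, which both feed D. */
    task_t tasks[NTASKS] = {
        { "A", {0},    0, 0 },
        { "B", {0},    1, 0 },
        { "C", {0},    1, 0 },
        { "D", {1, 2}, 2, 0 },
    };

    int remaining = NTASKS;
    while (remaining > 0) {
        for (int i = 0; i < NTASKS; i++) {
            if (tasks[i].done)
                continue;
            int ready = 1;
            for (int d = 0; d < tasks[i].ndeps; d++)
                if (!tasks[tasks[i].deps[d]].done)
                    ready = 0;
            if (ready) {
                run_task(&tasks[i]);
                remaining--;
            }
        }
    }
    return 0;
}
```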
The DisCo team brought two posters for show and tell this year, including one with a bit of a deep dive into the programming interfaces. George Bosilca was on hand for the MPI breakout session and also co-hosted a tutorial on application-driven fault tolerance. Thomas Herault and Aurelien Bouteiller hosted a tutorial on how to program with PaRSEC using a symbolic directed acyclic graph (DAG) representation.
Exascale Performance Application Programming Interface (Exa-PAPI)
The Exa-PAPI project extends the classic Performance Application Programming Interface (PAPI) library by adding performance counter monitoring capabilities for new and advanced ECP hardware and software technologies, fine-grained power management support, and functionalities for performance counter analysis at “task granularity” for task-based runtime systems. Exa-PAPI also adds events that originate from the ECP software stack (i.e., communication libraries, math libraries, task runtime systems) and, as a result, extends the notion of performance events beyond just hardware to also include software-based information.
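For readers who have not used PAPI before, the short C example below uses the classic low-level API to count instructions and cycles around a region of interest; the specific preset events available depend on the hardware, and error handling is kept minimal. Exa-PAPI extends this measurement model with software-defined events, power monitoring, and task-granularity analysis.

```c
/* Minimal PAPI low-level API usage: count instructions and cycles around a
 * region of interest. Build with -lpapi; preset events such as PAPI_TOT_INS
 * may be unavailable on some machines. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    int eventset = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return EXIT_FAILURE;
    }
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_INS);  /* total instructions completed */
    PAPI_add_event(eventset, PAPI_TOT_CYC);  /* total cycles */

    PAPI_start(eventset);
    volatile double s = 0.0;                 /* region of interest */
    for (int i = 0; i < 1000000; i++)
        s += i * 0.5;
    PAPI_stop(eventset, counts);

    printf("instructions: %lld, cycles: %lld (s = %f)\n",
           counts[0], counts[1], (double)s);
    return 0;
}
```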
During the third annual ECP meeting, ICL’s Heike Jagode and Anthony Danalis participated in various breakout sessions with different application teams (GAMESS and NWChemEx), where they explained Exa-PAPI’s new support for software-defined events (SDEs) and the current status of SDE integration with other ECP applications. They also met with hardware vendors (AMD and Cray) to present new developments and support in PAPI.
The Exa-PAPI poster garnered significant attention from performance tools developers and hardware vendors during the well-attended poster session.
Fast Fourier Transform for ECP (FFT-ECP)
The fast Fourier transform (FFT) is used in many domain applications—including molecular dynamics, spectrum estimation, fast convolution and correlation, signal modulation, and wireless multimedia applications—but current state-of-the-art FFT libraries are not scalable on large heterogeneous machines with many nodes. The main objective of the FFT-ECP project is to design and implement a fast and robust 2-D and 3-D FFT library that targets large-scale heterogeneous systems with multi-core processors and hardware accelerators and to do so as a co-design activity with other ECP application developers.
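As a point of reference for the operation itself, the C snippet below computes a single-node, 3-D complex-to-complex FFT with the widely used FFTW library. This is not FFT-ECP code; it is a sequential baseline included only to fix ideas, whereas the project targets transforms of this kind distributed across many GPU-accelerated nodes.

```c
/* Single-node 3-D complex-to-complex FFT using FFTW (build with -lfftw3 -lm),
 * shown only as a baseline for the operation FFT-ECP aims to scale out. */
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    const int n0 = 32, n1 = 32, n2 = 32;
    const int n = n0 * n1 * n2;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    for (int i = 0; i < n; i++) {      /* arbitrary real-valued input */
        in[i][0] = (double)(i % 7);
        in[i][1] = 0.0;
    }

    fftw_plan plan = fftw_plan_dft_3d(n0, n1, n2, in, out,
                                      FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);
    printf("DC component: %g + %gi\n", out[0][0], out[0][1]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```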
Stan Tomov presented ICL’s work with the team’s FFT-ECP poster and later hosted a tutorial for the MAGMA library.
Production-ready, Exascale-Enabled Krylov Solvers for Exascale Computing (PEEKS)

The PEEKS project explores the redesign of solvers and extends the DOE’s Extreme-scale Algorithms and Solver Resilience (EASIR) project. Many large-scale scientific applications rely heavily on preconditioned iterative solvers for large linear systems. For these solvers to efficiently exploit extreme-scale hardware, both the solver algorithms and the implementations must be redesigned to address challenges like extreme concurrency, complex memory hierarchies, costly data movement, and heterogeneous node architectures.
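As background on the solver class in question, the C sketch below runs a textbook, unpreconditioned conjugate gradient iteration on a tiny dense symmetric positive definite system. The matrix and right-hand side are made up for illustration; PEEKS is concerned with making preconditioned Krylov methods of this kind scale on exascale hardware, which this serial toy example does not attempt.

```c
/* Textbook conjugate gradient on a small dense SPD system (illustration only). */
#include <stdio.h>
#include <math.h>

#define N 4

static void matvec(double A[N][N], const double x[N], double y[N]) {
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

static double dot(const double a[N], const double b[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i] * b[i];
    return s;
}

int main(void) {
    double A[N][N] = { {4, 1, 0, 0}, {1, 4, 1, 0}, {0, 1, 4, 1}, {0, 0, 1, 4} };
    double b[N] = { 1, 2, 3, 4 };
    double x[N] = { 0 }, r[N], p[N], Ap[N];

    matvec(A, x, r);                               /* r = b - A*x */
    for (int i = 0; i < N; i++) { r[i] = b[i] - r[i]; p[i] = r[i]; }
    double rr = dot(r, r);

    for (int k = 0; k < N && sqrt(rr) > 1e-12; k++) {
        matvec(A, p, Ap);
        double alpha = rr / dot(p, Ap);            /* step length */
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;                 /* next search direction */
        for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        printf("iteration %d: ||r||_2 = %.3e\n", k + 1, sqrt(rr));
    }
    printf("x = %.6f %.6f %.6f %.6f\n", x[0], x[1], x[2], x[3]);
    return 0;
}
```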
As with the other five ICL-led ECP projects, PEEKS had its own poster describing the new features and directions of the project, and Ichitaro Yamazaki was on hand for the presentation.
Software for Linear Algebra Targeting Exascale (SLATE)
For decades, ICL has applied algorithmic and technological innovations to the process of pioneering, implementing, and disseminating dense linear algebra software—including the Linear Algebra PACKage (LAPACK) and Scalable Linear Algebra PACKage (ScaLAPACK) libraries. The Software for Linear Algebra Targeting Exascale (SLATE) project, as part of the ECP effort, is working to converge and consolidate that software into a dense linear algebra library that will integrate seamlessly into the ECP ecosystem.
At this year’s all-hands meeting, Mark Gates gave a tutorial on SLATE, going over the basic functions and providing downloadable examples so users could follow along. In addition, the SLATE team provided an overview of this year’s progress through the SLATE poster, which was on display at the poster session.
The Extreme-Scale Scientific Software Development Kit (xSDK)
The Extreme-Scale Scientific Software Development Kit (xSDK) is a collaboration between Argonne National Laboratory, ICL, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Sandia National Laboratories, and the University of California at Berkeley. The project aims to enable seamless integration and combined use of diverse, independently developed software packages for ECP applications. Currently, this includes a wide range of high-quality software libraries and solver packages that address strategic needs in fulfilling the mission of DOE’s Office of Science.
Piotr Luszczek and a few other xSDK collaborators hosted an introductory tutorial on how to use the xSDK and what libraries, software, and packages are included in the kit.
The editor would like to thank Thomas Herault, Heike Jagode, Piotr Luszczek, and Stan Tomov for their contributions to this article.
Recent Releases
2019 ICL Annual Report
For eighteen years, ICL has produced an annual report to provide a concise profile of our research, including information about the people and external organizations who make it all happen. Please download a copy and check it out.
MAGMA 2.5.0
MAGMA 2.5.0 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.
MAGMA 2.5.0 features LAPACK-compliant routines for multi-core CPUs enhanced with NVIDIA GPUs (including the Volta V100). MAGMA now includes more than 400 routines, covering one-sided dense matrix factorizations and solvers, and two-sided factorizations and eigen/singular-value problem solvers, as well as a subset of highly optimized BLAS for GPUs.
Updates and features in MAGMA 2.5.0 include:
- New routines: A new NVIDIA Tensor Cores version of the mixed-precision linear solver can provide an FP64 solution with up to a 4× speedup by using fast FP16 Tensor Cores arithmetic (a conceptual sketch of the underlying refinement scheme follows this list). This version includes:
  - magma_dhgesv_iteref_gpu (FP64-FP16 solver with FP64 input and solution);
  - magma_dsgesv_iteref_gpu (FP64-FP32 solver with FP64 input and solution);
  - magma_hgetrf_gpu (mixed-precision FP32-FP16 LU factorization); and
  - magma_htgetrf_gpu (mixed-precision FP32-FP16 LU factorization using Tensor Cores).
  Additional details on the function names and the testing routines are provided in README_FP16_Iterative_Refinement.txt.
- New routine: magmablas_Xgemm_batched_strided (X = {s, d, c, z}) is the stride-based variant of magmablas_Xgemm_batched.
- New routine: magma_Xgetrf_native (X = {s, d, c, z}) performs the LU factorization with partial pivoting using the GPU only. It has the same interface as the hybrid (CPU+GPU) implementation provided by magma_Xgetrf_gpu. The performance of this routine can be tested by running testing_Xgetrf_gpu with the option --version 3.
- New routine: magma_Xpotrf_native (X = {s, d, c, z}) performs the Cholesky factorization using the GPU only. It has the same interface as the hybrid (CPU+GPU) implementation provided by magma_Xpotrf_gpu. The performance of this routine can be tested by running testing_Xpotrf_gpu with the option --version 2.
- New benchmark: GEMM in FP16 arithmetic (HGEMM), as well as auxiliary functions to cast matrices from FP32 to FP16 storage (magmablas_slag2h) and from FP16 to FP32 (magmablas_hlag2s).
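The conceptual C sketch below illustrates the mixed-precision iterative refinement idea behind these new solvers: factor the matrix once in lower precision (plain float here, standing in for the FP16 Tensor Core factorization), then recover a double-precision solution by repeatedly correcting against residuals computed in double. It is not MAGMA code, and the 4×4 diagonally dominant system is made up for illustration.

```c
/* Mixed-precision iterative refinement, conceptually: LU in float, residuals
 * and solution updates in double (illustration only; not the MAGMA API). */
#include <stdio.h>
#include <math.h>

#define N 4

/* In-place LU factorization in float, no pivoting (safe here because the
 * example matrix is diagonally dominant). */
static void lu_factor(float LU[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = k + 1; i < N; i++) {
            LU[i][k] /= LU[k][k];
            for (int j = k + 1; j < N; j++)
                LU[i][j] -= LU[i][k] * LU[k][j];
        }
}

/* Solve LU * x = b with the float factors (forward + backward substitution). */
static void lu_solve(float LU[N][N], const double b[N], double x[N]) {
    double y[N];
    for (int i = 0; i < N; i++) {                  /* L y = b (unit diagonal) */
        y[i] = b[i];
        for (int j = 0; j < i; j++) y[i] -= LU[i][j] * y[j];
    }
    for (int i = N - 1; i >= 0; i--) {             /* U x = y */
        x[i] = y[i];
        for (int j = i + 1; j < N; j++) x[i] -= LU[i][j] * x[j];
        x[i] /= LU[i][i];
    }
}

int main(void) {
    double A[N][N] = { {10, 1, 2, 0}, {1, 12, 0, 3}, {2, 0, 9, 1}, {0, 3, 1, 11} };
    double b[N] = { 13, 16, 12, 15 };              /* exact solution: all ones */
    double x[N] = { 0 }, r[N], dx[N];
    float LU[N][N];

    /* The expensive O(n^3) step runs in low precision. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            LU[i][j] = (float)A[i][j];
    lu_factor(LU);

    /* Refinement: double-precision residual, correction via the float factors. */
    for (int it = 0; it < 5; it++) {
        double rnorm = 0.0;
        for (int i = 0; i < N; i++) {
            r[i] = b[i];
            for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
            rnorm += r[i] * r[i];
        }
        printf("refinement step %d: ||r||_2 = %.3e\n", it, sqrt(rnorm));
        lu_solve(LU, r, dx);
        for (int i = 0; i < N; i++) x[i] += dx[i];
    }
    printf("x = %.12f %.12f %.12f %.12f\n", x[0], x[1], x[2], x[3]);
    return 0;
}
```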
The MAGMA 2.5.0 tarball is available for download from the MAGMA website.
Taufer to Receive IBM Faculty Award
EECS Prof. Michela Taufer is set to receive an IBM Faculty Award for $20,000. The award recognizes her leadership in HPC and reflects IBM’s investment in UTK as a center for high-performance and scientific computing.
IBM is thrilled to recognize Taufer’s leadership on high-performance computing and partnership with Oak Ridge National Lab.
Jamie M. Thomas, General Manager of Systems Strategy and Development, IBM
Taufer also hopes to bring an IBM Onsite Deep Learning Workshop to campus to train EECS students in deep learning and AI.
Beyond this award, Taufer, along with Greg Peterson (EECS) and Jack Dongarra, approached IBM with the idea of building an on-campus “mini-Summit,” showcasing the many HPC applications and libraries developed at the University and explaining how their development would benefit from access to such a machine.