News and Announcements
30 Years of Innovative Computing

Don’t forget to save the date for ICL’s 30th anniversary celebration: August 8–9, 2019, in Knoxville, TN!
Michela Taufer Interviews Jack and Sunita

Michela: Jack, you’ve been to every SC conference. How has the event evolved over the years?
Jack: Over that 31-year period, incredible changes have taken place—especially in the areas of computer speed and power. In the early days, we measured performance in terms of megaFLOP/s. And now, we’re a billion times faster in terms of computing power. That has tremendous ramifications for all of humanity. Another remarkable thing is the number and diversity of attendees. In the early days, it was just researchers. Now, people participating in HPC—academics, vendors, scientists—come from every industry sector you can imagine.
Michela: Where do you see HPC heading in the future? What excites you about its evolution?
Jack: One of the changes I’m seeing is the development and use of machine learning in scientific computing, which is changing how we can quickly resolve approximate solutions and then use conventional methods to compute a more accurate solution to many problems. It affects biology, drug design, materials, high-energy physics, etc. By augmenting our models and our ability to do simulation, HPC enables us to understand and do things so much faster than we could in the past—and it will only get better in the future.
Michela: Sunita, you have been attending SC for almost a decade now. What are the biggest changes you’ve noticed?
Sunita: SC has helped change my focus on what I do. For a long time, I was only developing software. But meeting and talking to scientists, nuclear physicists, and biologists at SC is very important for a computer scientist like me. Over the years, those discussions have really opened my eyes to this fact: before you spend time creating computer science tools, you need to spend time considering who is going to use your tools.
Conference Reports
ECP Houston
The Exascale Computing Project (ECP) is a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration—and is responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, and hardware, to support the nation’s exascale computing initiative. As many of you know, ICL competed for and won five ECP awards in 2016 and picked up another ECP award for fast Fourier transform (FFT) development.
On January 14–18, 2019, ECP held its 3rd annual meeting in Houston, Texas, with over 500 attendees. Key topics included plans for the second phase of ECP and a new management structure. ICL team members presented posters and hosted tutorials for their respective projects—detailed below.
Distributed Tasking for Exascale (DTE)
The DTE project, as part of the ECP effort, extends the capabilities of ICL’s Parallel Runtime and Execution Controller (PaRSEC) project, which is a generic framework for architecture-aware scheduling and management of microtasks on distributed, many-core, heterogeneous architectures. The PaRSEC environment also provides a runtime component for dynamically executing tasks on heterogeneous distributed systems, along with a productivity toolbox and development framework that supports multiple domain-specific languages and extensions, as well as tools for debugging, trace collection, and analysis.
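To give a concrete flavor of the dataflow model PaRSEC implements, the toy C sketch below executes a small diamond-shaped task graph by releasing each task once all of its dependencies have completed. This is a conceptual illustration written for this article only; it is not PaRSEC syntax, and PaRSEC itself represents such DAGs symbolically and schedules the tasks dynamically across distributed, heterogeneous resources rather than scanning them serially as done here.

```c
/* Conceptual sketch only (not PaRSEC syntax): run a diamond-shaped task DAG
 * by releasing each task once all of its input dependencies have completed. */
#include <stdio.h>

#define NTASKS 4

typedef struct {
    const char *name;
    int deps[NTASKS];   /* indices of tasks this task depends on */
    int ndeps;
    int done;
} task_t;

static void run_task(task_t *t) {
    printf("executing task %s\n", t->name);
    t->done = 1;
}

int main(void) {
    /* Diamond DAG: A feeds B and C, which both feed D. */
    task_t tasks[NTASKS] = {
        { "A", {0},    0, 0 },
        { "B", {0},    1, 0 },
        { "C", {0},    1, 0 },
        { "D", {1, 2}, 2, 0 },
    };

    int remaining = NTASKS;
    while (remaining > 0) {
        for (int i = 0; i < NTASKS; i++) {
            if (tasks[i].done)
                continue;
            int ready = 1;
            for (int d = 0; d < tasks[i].ndeps; d++)
                if (!tasks[tasks[i].deps[d]].done)
                    ready = 0;
            if (ready) {
                run_task(&tasks[i]);
                remaining--;
            }
        }
    }
    return 0;
}
```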
The DisCo team brought two posters for show and tell this year, including one with a bit of a deep dive into the programming interfaces. George Bosilca was on hand for the MPI breakout session and also co-hosted a tutorial on application-driven fault tolerance. Thomas Herault and Aurelien Bouteiller hosted a tutorial on how to program with PaRSEC using a symbolic directed acyclic graph (DAG) representation.
Exascale Performance Application Programming Interface (Exa-PAPI)
The Exa-PAPI project extends the classic Performance Application Programming Interface (PAPI) library by adding performance counter monitoring capabilities for new and advanced ECP hardware and software technologies, fine-grained power management support, and functionalities for performance counter analysis at “task granularity” for task-based runtime systems. Exa-PAPI also adds events that originate from the ECP software stack (i.e., communication libraries, math libraries, task runtime systems) and, as a result, extends the notion of performance events beyond just hardware to also include software-based information.
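For readers who have not used PAPI before, the short C example below uses the classic low-level API to count instructions and cycles around a region of interest; the specific preset events available depend on the hardware, and error handling is kept minimal. Exa-PAPI extends this measurement model with software-defined events, power monitoring, and task-granularity analysis.

```c
/* Minimal PAPI low-level API usage: count instructions and cycles around a
 * region of interest. Build with -lpapi; preset events such as PAPI_TOT_INS
 * may be unavailable on some machines. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    int eventset = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return EXIT_FAILURE;
    }
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_INS);  /* total instructions completed */
    PAPI_add_event(eventset, PAPI_TOT_CYC);  /* total cycles */

    PAPI_start(eventset);
    volatile double s = 0.0;                 /* region of interest */
    for (int i = 0; i < 1000000; i++)
        s += i * 0.5;
    PAPI_stop(eventset, counts);

    printf("instructions: %lld, cycles: %lld (s = %f)\n",
           counts[0], counts[1], (double)s);
    return 0;
}
```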
During the third annual ECP meeting, ICL’s Heike Jagode and Anthony Danalis participated in various breakout sessions with different application teams (GAMESS and NWChemEx), where they explained Exa-PAPI’s new support for software-defined events (SDEs) and the current status of SDE integration with other ECP applications. They also met with hardware vendors (AMD and Cray) to present new developments and support in PAPI.
The Exa-PAPI poster garnered significant attention from performance tools developers and hardware vendors during the well-attended poster session.
Fast Fourier Transform for ECP (FFT-ECP)
The fast Fourier transform (FFT) is used in many domain applications—including molecular dynamics, spectrum estimation, fast convolution and correlation, signal modulation, and wireless multimedia applications—but current state-of-the-art FFT libraries are not scalable on large heterogeneous machines with many nodes. The main objective of the FFT-ECP project is to design and implement a fast and robust 2-D and 3-D FFT library that targets large-scale heterogeneous systems with multi-core processors and hardware accelerators and to do so as a co-design activity with other ECP application developers.
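As a point of reference for the operation itself, the C snippet below computes a single-node, 3-D complex-to-complex FFT with the widely used FFTW library. This is not FFT-ECP code; it is a sequential baseline included only to fix ideas, whereas the project targets transforms of this kind distributed across many GPU-accelerated nodes.

```c
/* Single-node 3-D complex-to-complex FFT using FFTW (build with -lfftw3 -lm),
 * shown only as a baseline for the operation FFT-ECP aims to scale out. */
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    const int n0 = 32, n1 = 32, n2 = 32;
    const int n = n0 * n1 * n2;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    for (int i = 0; i < n; i++) {      /* arbitrary real-valued input */
        in[i][0] = (double)(i % 7);
        in[i][1] = 0.0;
    }

    fftw_plan plan = fftw_plan_dft_3d(n0, n1, n2, in, out,
                                      FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);
    printf("DC component: %g + %gi\n", out[0][0], out[0][1]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```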
Stan Tomov presented ICL’s work with the team’s FFT-ECP poster and later hosted a tutorial for the MAGMA library.
Production-ready, Exascale-Enabled Krylov Solvers for Exascale Computing (PEEKS)

The PEEKS project explores the redesign of solvers and extends the DOE’s Extreme-scale Algorithms and Solver Resilience (EASIR) project. Many large-scale scientific applications rely heavily on preconditioned iterative solvers for large linear systems. For these solvers to efficiently exploit extreme-scale hardware, both the solver algorithms and the implementations must be redesigned to address challenges like extreme concurrency, complex memory hierarchies, costly data movement, and heterogeneous node architectures.
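As background on the solver class in question, the C sketch below runs a textbook, unpreconditioned conjugate gradient iteration on a tiny dense symmetric positive definite system. The matrix and right-hand side are made up for illustration; PEEKS is concerned with making preconditioned Krylov methods of this kind scale on exascale hardware, which this serial toy example does not attempt.

```c
/* Textbook conjugate gradient on a small dense SPD system (illustration only). */
#include <stdio.h>
#include <math.h>

#define N 4

static void matvec(double A[N][N], const double x[N], double y[N]) {
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

static double dot(const double a[N], const double b[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i] * b[i];
    return s;
}

int main(void) {
    double A[N][N] = { {4, 1, 0, 0}, {1, 4, 1, 0}, {0, 1, 4, 1}, {0, 0, 1, 4} };
    double b[N] = { 1, 2, 3, 4 };
    double x[N] = { 0 }, r[N], p[N], Ap[N];

    matvec(A, x, r);                               /* r = b - A*x */
    for (int i = 0; i < N; i++) { r[i] = b[i] - r[i]; p[i] = r[i]; }
    double rr = dot(r, r);

    for (int k = 0; k < N && sqrt(rr) > 1e-12; k++) {
        matvec(A, p, Ap);
        double alpha = rr / dot(p, Ap);            /* step length */
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;                 /* next search direction */
        for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        printf("iteration %d: ||r||_2 = %.3e\n", k + 1, sqrt(rr));
    }
    printf("x = %.6f %.6f %.6f %.6f\n", x[0], x[1], x[2], x[3]);
    return 0;
}
```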
As with the other five ICL-led ECP projects, PEEKS had its own poster describing the new features and directions of the project, and Ichitaro Yamazaki was on hand for the presentation.
Software for Linear Algebra Targeting Exascale (SLATE)
For decades, ICL has applied algorithmic and technological innovations to the process of pioneering, implementing, and disseminating dense linear algebra software—including the Linear Algebra PACKage (LAPACK) and Scalable Linear Algebra PACKage (ScaLAPACK) libraries. The Software for Linear Algebra Targeting Exascale (SLATE) project, as part of the ECP effort, is working to converge and consolidate that software into a dense linear algebra library that will integrate seamlessly into the ECP ecosystem.
At this year’s all-hands meeting, Mark Gates gave a tutorial on SLATE, going over the basic functions and providing downloadable examples so users could follow along. In addition, the SLATE team provided an overview of this year’s progress through the SLATE poster, which was on display at the poster session.
The Extreme-Scale Scientific Software Development Kit (xSDK)
The Extreme-Scale Scientific Software Development Kit (xSDK) is a collaboration between Argonne National Laboratory, ICL, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Sandia National Laboratories, and the University of California at Berkeley. The project aims to enable seamless integration and combined use of diverse, independently developed software packages for ECP applications. Currently, this includes a wide range of high-quality software libraries and solver packages that address strategic needs in fulfilling the mission of DOE’s Office of Science.
Piotr Luszczek and a few other xSDK collaborators hosted an introductory tutorial on how to use the xSDK and what libraries, software, and packages are included in the kit.
The editor would like to thank Thomas Herault, Heike Jagode, Piotr Luszczek, and Stan Tomov for their contributions to this article.
Recent Releases
2019 ICL Annual Report
For eighteen years, ICL has produced an annual report to provide a concise profile of our research, including information about the people and external organizations who make it all happen. Please download a copy and check it out.
MAGMA 2.5.0
MAGMA 2.5.0 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.
MAGMA 2.5.0 features LAPACK-compliant routines for multi-core CPUs enhanced with NVIDIA GPUs (including the Volta V100). MAGMA now includes more than 400 routines, covering one-sided dense matrix factorizations and solvers, and two-sided factorizations and eigen/singular-value problem solvers, as well as a subset of highly optimized BLAS for GPUs.
Updates and features in MAGMA 2.5.0 include:
- New routines: A new NVIDIA Tensor Cores version of the mixed-precision linear solver can provide an FP64 solution with up to a 4× speedup by using fast FP16 Tensor Cores arithmetic (a conceptual sketch of the underlying refinement scheme follows this list). This version includes:
  - magma_dhgesv_iteref_gpu (FP64-FP16 solver with FP64 input and solution);
  - magma_dsgesv_iteref_gpu (FP64-FP32 solver with FP64 input and solution);
  - magma_hgetrf_gpu (mixed-precision FP32-FP16 LU factorization); and
  - magma_htgetrf_gpu (mixed-precision FP32-FP16 LU factorization using Tensor Cores).
  Additional details on the function names and the testing routines are provided in README_FP16_Iterative_Refinement.txt.
- New routine: magmablas_Xgemm_batched_strided (X = {s, d, c, z}) is the stride-based variant of magmablas_Xgemm_batched.
- New routine: magma_Xgetrf_native (X = {s, d, c, z}) performs the LU factorization with partial pivoting using the GPU only. It has the same interface as the hybrid (CPU+GPU) implementation provided by magma_Xgetrf_gpu. The performance of this routine can be tested by running testing_Xgetrf_gpu with the option --version 3.
- New routine: magma_Xpotrf_native (X = {s, d, c, z}) performs the Cholesky factorization using the GPU only. It has the same interface as the hybrid (CPU+GPU) implementation provided by magma_Xpotrf_gpu. The performance of this routine can be tested by running testing_Xpotrf_gpu with the option --version 2.
- New benchmark: GEMM in FP16 arithmetic (HGEMM), as well as auxiliary functions to cast matrices from FP32 to FP16 storage (magmablas_slag2h) and from FP16 to FP32 (magmablas_hlag2s).
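The conceptual C sketch below illustrates the mixed-precision iterative refinement idea behind these new solvers: factor the matrix once in lower precision (plain float here, standing in for the FP16 Tensor Core factorization), then recover a double-precision solution by repeatedly correcting against residuals computed in double. It is not MAGMA code, and the 4×4 diagonally dominant system is made up for illustration.

```c
/* Mixed-precision iterative refinement, conceptually: LU in float, residuals
 * and solution updates in double (illustration only; not the MAGMA API). */
#include <stdio.h>
#include <math.h>

#define N 4

/* In-place LU factorization in float, no pivoting (safe here because the
 * example matrix is diagonally dominant). */
static void lu_factor(float LU[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = k + 1; i < N; i++) {
            LU[i][k] /= LU[k][k];
            for (int j = k + 1; j < N; j++)
                LU[i][j] -= LU[i][k] * LU[k][j];
        }
}

/* Solve LU * x = b with the float factors (forward + backward substitution). */
static void lu_solve(float LU[N][N], const double b[N], double x[N]) {
    double y[N];
    for (int i = 0; i < N; i++) {                  /* L y = b (unit diagonal) */
        y[i] = b[i];
        for (int j = 0; j < i; j++) y[i] -= LU[i][j] * y[j];
    }
    for (int i = N - 1; i >= 0; i--) {             /* U x = y */
        x[i] = y[i];
        for (int j = i + 1; j < N; j++) x[i] -= LU[i][j] * x[j];
        x[i] /= LU[i][i];
    }
}

int main(void) {
    double A[N][N] = { {10, 1, 2, 0}, {1, 12, 0, 3}, {2, 0, 9, 1}, {0, 3, 1, 11} };
    double b[N] = { 13, 16, 12, 15 };              /* exact solution: all ones */
    double x[N] = { 0 }, r[N], dx[N];
    float LU[N][N];

    /* The expensive O(n^3) step runs in low precision. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            LU[i][j] = (float)A[i][j];
    lu_factor(LU);

    /* Refinement: double-precision residual, correction via the float factors. */
    for (int it = 0; it < 5; it++) {
        double rnorm = 0.0;
        for (int i = 0; i < N; i++) {
            r[i] = b[i];
            for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
            rnorm += r[i] * r[i];
        }
        printf("refinement step %d: ||r||_2 = %.3e\n", it, sqrt(rnorm));
        lu_solve(LU, r, dx);
        for (int i = 0; i < N; i++) x[i] += dx[i];
    }
    printf("x = %.12f %.12f %.12f %.12f\n", x[0], x[1], x[2], x[3]);
    return 0;
}
```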
The MAGMA 2.5.0 tarball is available for download from the MAGMA website.
Taufer to Receive IBM Faculty Award
EECS Prof. Michela Taufer is set to receive an IBM Faculty Award for $20,000. The award recognizes her leadership in HPC and reflects IBM’s investment in UTK as a center for high-performance and scientific computing.
IBM is thrilled to recognize Taufer’s leadership on high-performance computing and partnership with Oak Ridge National Lab.
Jamie M. Thomas, General Manager of Systems Strategy and Development, IBM
Taufer also hopes to bring an IBM Onsite Deep Learning Workshop to campus to train EECS students in deep learning and AI.
Beyond this award, Taufer, along with Greg Peterson (EECS) and Jack Dongarra, approached IBM with the idea of building an on-campus “mini-Summit,” showcasing the many HPC applications and libraries developed at the University and explaining how their development would benefit from access to such a machine.