News and Announcements

HPCG Benchmark

TOP500 (LINPACK) and HPCG rankings of the fastest supercomputers of 2017. Image courtesy of Sandia National Laboratories.

The final 2017 results for the High Performance Conjugate Gradients (HPCG) benchmark were released on November 14, 2017 at SC17 in Denver, Colorado. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance that is representative of modern HPC application workloads by simulating the compute and communication patterns of the sparse iterative solvers commonly found in science and engineering applications.

HPCG results are now released alongside the TOP500 rankings to show how real-world applications might fare on a given machine. In the image above (courtesy of Sandia National Laboratories), you can see how the HPCG benchmark would have ranked its top 10 machines and where those machines ranked on the LINPACK-based TOP500 list. The full list of HPCG rankings is available here.

To read Mike Heroux’s thoughts on HPCG and its increased relevance as an HPC benchmark, click on over to this article on HPCWire.

David Rogers Receives Award from Tickle College of Engineering

From left to right: Dean Wayne Davis, Kathy Williams, David Rogers, Ashly Perason, Yvette Gooden, and Associate Dean Masood Parang. Photo by Erik Campos.

On April 5th, ICL’s David Rogers received a Tickle College of Engineering Outstanding Support Staff Award. ICL management nominated David for the award with what was described as an “excellent [submission] packet.”

As our graphic designer and web developer, David has been with ICL since 2000 and has been an undeniable influence on the research output of ICL and on the ICL brand itself, contributing to nearly 20 years of proposals, posters, and papers. Congratulations, David!

Conference Reports

ECP 2nd Annual Meeting

The Exascale Computing Project (ECP) is a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration—and is responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, and hardware, to support the nation’s exascale computing initiative. As many of you know, ICL competed for and won seven ECP awards in 2016, is the PI institution on four of these projects (detailed below), and is actively collaborating with the ECP community at large.

ECP held its 2nd annual meeting to highlight the technical accomplishments and developments that resulted from the ECP community’s interactions and collaborations over the last two years. Project participants were asked to explore ways to leverage and share requirements with other ECP efforts. Key topics included plans for future systems, the future of the software stack, and interactions with DOE computing facilities.

Distributed Tasking for Exascale (DTE)

The DTE project, as part of the ECP effort, extends the capabilities of ICL’s Parallel Runtime and Execution Controller (PaRSEC), a generic framework for architecture-aware scheduling and management of microtasks on distributed, many-core, heterogeneous architectures. The PaRSEC environment also provides a runtime component for dynamically executing tasks on heterogeneous distributed systems, along with a productivity toolbox and development framework that supports multiple domain-specific languages and extensions as well as tools for debugging, trace collection, and analysis.
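To give a flavor of the task-based model, the sketch below expresses a tiny tile-update pattern as dependent tasks that a runtime can order and schedule from their data dependencies. It uses OpenMP task dependencies purely as a generic stand-in, not PaRSEC’s actual interface, and the tile count and kernel are invented for the example.

```c
/* Illustrative dataflow tasking in the spirit of a task-based runtime,
 * expressed with OpenMP task dependencies as a stand-in for PaRSEC's
 * own interface. The "kernel" is a placeholder for a real tile kernel. */
#include <stdio.h>

#define NT 4                                   /* number of tiles (hypothetical) */

static void kernel(double *tile) { *tile += 1.0; }   /* stand-in compute */

int main(void) {
    double tiles[NT] = {0};

    #pragma omp parallel
    #pragma omp single
    {
        for (int k = 0; k < NT; k++) {
            /* "factor" task: produces tile k */
            #pragma omp task depend(inout: tiles[k])
            kernel(&tiles[k]);

            /* "update" tasks: each consumes tile k and modifies tile j;
             * the runtime orders them from these data dependencies. */
            for (int j = k + 1; j < NT; j++) {
                #pragma omp task depend(in: tiles[k]) depend(inout: tiles[j])
                kernel(&tiles[j]);
            }
        }
    }   /* all tasks complete at the barrier closing the parallel region */

    printf("tiles[%d] = %.1f\n", NT - 1, tiles[NT - 1]);
    return 0;
}
```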

All of the above-mentioned features are essential to the ECP mission, and the DTE principals were on hand for a PaRSEC tutorial and a breakout session focused on mixed-model programming. DTE’s latest innovations were also on display in the aptly named “poster session.”

Exascale Performance Application Programming Interface (Exa-PAPI)

Exa-PAPI builds on ICL’s Performance Application Programming Interface (PAPI) project and extends it with performance counter monitoring capabilities for new and advanced ECP hardware and software technologies. PAPI provides a consistent interface and methodology for collecting performance counter information from various hardware and software components, including most major CPUs, GPUs and accelerators, interconnects, I/O systems, and power interfaces, as well as virtual cloud environments.
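In practice, collecting a counter with PAPI follows a simple pattern: initialize the library, create an event set, add the events of interest, and start and stop the counters around a region of code. A minimal sketch is below; the preset event name and the throwaway loop are placeholders for this example, and availability of any given event depends on the machine.

```c
/* Minimal PAPI counter read around a region of interest.
 * Error handling is abbreviated; PAPI_TOT_INS is a common preset event
 * but may not exist on every platform. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    int eventset = PAPI_NULL;
    long long counts[1];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);
    PAPI_create_eventset(&eventset);
    PAPI_add_named_event(eventset, "PAPI_TOT_INS");   /* total instructions */

    PAPI_start(eventset);
    volatile double x = 0.0;                          /* placeholder workload */
    for (int i = 0; i < 1000000; i++) x += i * 0.5;
    PAPI_stop(eventset, counts);

    printf("PAPI_TOT_INS = %lld\n", counts[0]);
    PAPI_shutdown();
    return 0;
}
```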

In addition to these counter monitoring capabilities, Exa-PAPI adds fine-grained power management support and integration capabilities for exascale paradigms like task-based runtime systems.

ICL’s Heike Jagode participated in the breakout session, or “lightning round” as she called it—a missed opportunity to use the word “blitz” in the opinion of this editor—where she explained Exa-PAPI’s new support for software-defined events (SDEs). These SDEs extend PAPI’s role as a standardizing layer for monitoring performance counters.

Heike also said that the Exa-PAPI poster garnered significant attention during the well-attended, two-hour poster session.

Production-ready, Exascale-Enabled Krylov Solvers for Exascale Computing (PEEKS)


The PEEKS project explores the redesign of solvers and extends the DOE’s Extreme-scale Algorithms and Solver Resilience (EASIR) project. Many large-scale scientific applications rely heavily on preconditioned iterative solvers for large linear systems. For these solvers to efficiently exploit extreme-scale hardware, both the solver algorithms and the implementations must be redesigned to address challenges like extreme concurrency, complex memory hierarchies, costly data movement, and heterogeneous node architectures.
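To make the class of methods concrete, the sketch below shows a Jacobi-preconditioned conjugate gradient iteration on a tiny CSR matrix. It is only a serial illustration of the algorithmic pattern these solvers build on; the matrix, right-hand side, tolerance, and preconditioner are invented for the example and are not the PEEKS solvers themselves, which target GPUs and extreme concurrency.

```c
/* Jacobi-preconditioned conjugate gradient on a tiny CSR matrix.
 * A serial sketch of the algorithmic pattern only; all data here is
 * made up for illustration. */
#include <math.h>
#include <stdio.h>

#define N   4
#define NNZ 10

/* 4x4 symmetric positive definite tridiagonal matrix in CSR format. */
static const int    row_ptr[N + 1] = {0, 2, 5, 8, 10};
static const int    col_idx[NNZ]   = {0, 1,  0, 1, 2,  1, 2, 3,  2, 3};
static const double val[NNZ]       = {2, -1, -1, 2, -1, -1, 2, -1, -1, 2};

static void spmv(const double *x, double *y)          /* y = A * x */
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            y[i] += val[k] * x[col_idx[k]];
    }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    double b[N] = {1, 1, 1, 1}, x[N] = {0}, r[N], z[N], p[N], q[N], diag[N];

    /* Extract the diagonal for the Jacobi preconditioner M = diag(A). */
    for (int i = 0; i < N; i++)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            if (col_idx[k] == i) diag[i] = val[k];

    for (int i = 0; i < N; i++) {                      /* x = 0, so r = b */
        r[i] = b[i];
        z[i] = r[i] / diag[i];                         /* z = M^{-1} r    */
        p[i] = z[i];
    }
    double rz = dot(r, z);

    for (int it = 0; it < 100; it++) {
        spmv(p, q);
        double alpha = rz / dot(p, q);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        if (sqrt(dot(r, r)) < 1e-10) { printf("converged after %d iterations\n", it + 1); break; }
        for (int i = 0; i < N; i++) z[i] = r[i] / diag[i];
        double rz_new = dot(r, z);
        for (int i = 0; i < N; i++) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %g\n", i, x[i]);
    return 0;
}
```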

Members of the PEEKS team, including Hartwig Anzt, who was in Knoxville for the meeting, presented the latest developments in the PEEKS project at the breakout session. As with the other three ICL-led ECP projects, the PEEKS team had its own poster describing the new features and directions of the project effort.

Software for Linear Algebra Targeting Exascale (SLATE)

For decades, ICL has applied algorithmic and technological innovations to the process of pioneering, implementing, and disseminating dense linear algebra software—including the Linear Algebra PACKage (LAPACK) and Scalable Linear Algebra PACKage (ScaLAPACK) libraries. The Software for Linear Algebra Targeting Exascale (SLATE) project, as part of the ECP effort, is working to converge and consolidate that software into a dense linear algebra library that will integrate seamlessly into the ECP ecosystem.
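As a point of reference for the kind of functionality being consolidated, the sketch below solves a small dense system through LAPACK’s C interface (LAPACKE). The 3x3 matrix and right-hand side are arbitrary values chosen for the example, and SLATE’s own interface, a modern C++ API, differs from this legacy one.

```c
/* Solving a small dense system A*x = b with LAPACK's dgesv via LAPACKE.
 * Purely illustrative of the legacy interface; the matrix and right-hand
 * side are arbitrary. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double A[3 * 3] = { 4, 1, 2,
                        1, 3, 0,
                        2, 0, 5 };   /* row-major 3x3 */
    double b[3]     = { 1, 2, 3 };   /* overwritten with the solution x */
    lapack_int ipiv[3];

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1,
                                    A, 3, ipiv, b, 1);
    if (info != 0) {
        fprintf(stderr, "dgesv failed: info = %d\n", (int)info);
        return 1;
    }
    printf("x = [%g, %g, %g]\n", b[0], b[1], b[2]);
    return 0;
}
```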

At the all-hands meeting, the SLATE team shared further details of their progress via the SLATE poster, which was on display at the poster session.

SIAM Conference on Parallel Processing for Scientific Computing (PP18)

The Society for Industrial and Applied Mathematics (SIAM) recently hosted its 2018 Conference on Parallel Processing for Scientific Computing (PP18) at Waseda University in Tokyo, Japan. The conference provides a forum for communication among the applied mathematics, computer science, and computational science and engineering communities. From March 7th to March 10th, around 650 attendees gathered at the Nishi Waseda Campus for invited talks, presentations, and a poster session.

This conference series has played a key role in promoting parallel scientific computing, algorithms for parallel systems, and parallel numerical algorithms and is unique in its emphasis on the intersection between high-performance scientific computing and scalable algorithms, architectures, and software. Conference organizers noted that the number of attendees in Tokyo was up significantly from the last conference in the series.

ICL had a significant presence at PP18, with several presentations and a poster. Jack Dongarra presented the latest developments on “Dense Linear Systems for Extreme Scale” and discussed the joint Parallel Numerical Linear Algebra for Future Extreme Scale Systems (NLAFET) effort and the collaboration between ICL and the University of Manchester.

George Bosilca gave a presentation on “The Case for Resilience Support in MPI,” in which he proposed a local rollback mechanism for generic SPMD programs that combines ULFM, the CPPC application-level checkpointing tool, and Open MPI’s Vprotocol pessimist message logging. With this approach, only the failed processes are recovered from the last checkpoint, while message logging keeps the computation consistent and allows it to continue making progress.

Ichitaro Yamazaki gave a presentation on “Hierarchical-Matrix BiCGStab on GPU Clusters with MAGMA Variable-Size Batched Kernel,” where he discussed ICL’s recent efforts to port a low-rank compression solver onto GPU clusters, namely Reedbush-H and Tsubame3.

Mike Tsai represented ICL at the poster session with his poster on “Pseudo-Assembly Programming for Batched Matrix Factorization,” where he demonstrated the advantages of programming below the CUDA C level by showing preliminary results for batched matrix factorizations written in PTX for NVIDIA GPUs.

Hartwig Anzt gave a talk, “ParILUT – A New Parallel Threshold ILU,” where he presented a parallel algorithm for computing a threshold incomplete LU factorization, with the main idea being to interleave a parallel fixed-point iteration that approximates an incomplete factorization for a given sparsity pattern with a procedure that adaptively changes the pattern.

Not to be outdone, and thanks to Ichitaro Yamazaki and Azzam Haidar, Piotr Luszczek gave two presentations at SIAM. First, Piotr presented Ichitaro’s work on “Performance of S-Step and Pipelined Krylov Methods,” where he compared the performance of pipelined and s-step variants of a Krylov solver; these implementations of both s-step and pipelined methods focus on reducing the cost of global all-reduce operations needed for the orthogonalization.
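The reduction-hiding idea these variants exploit can be shown in isolation: start the global reduction with a non-blocking all-reduce and overlap it with local work. The sketch below is a generic MPI illustration of that overlap, not the implementation presented in the talk; the partial dot product and the busy loop are placeholders.

```c
/* Overlapping a global reduction with local work, the basic trick behind
 * pipelined Krylov methods. Generic MPI sketch with placeholder data. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_dot = (double)(rank + 1);   /* placeholder partial dot product */
    double global_dot = 0.0;
    MPI_Request req;

    /* Start the reduction without waiting for it... */
    MPI_Iallreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ...and overlap it with local computation (stand-in for an SpMV). */
    double work = 0.0;
    for (int i = 0; i < 1000000; i++) work += 1e-6 * i;

    MPI_Wait(&req, MPI_STATUS_IGNORE);       /* reduction result now usable */
    if (rank == 0)
        printf("global dot = %g (overlapped work = %g)\n", global_dot, work);

    MPI_Finalize();
    return 0;
}
```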

Second, Piotr presented Azzam’s work on “MAGMA Batched Computations: Current Development and Trend,” where he described MAGMA Batched—a library that achieves dramatically better performance by executing small operations in “batches.”

Several ICL alumni and collaborators were also in attendance, including Keita Teranishi, Hatem Ltaief, and Sven J. Hammarling. All in all, a busy—but rewarding—endeavor for the ICL crew.

Interview

Jamie Finney

Where are you from, originally?
I was born and raised in middle Tennessee, near Winchester.

Can you summarize your educational background?
I began at the University of Tennessee right out of high school but decided to pursue a career as an automotive technician. I attended Universal Technical Institute in Mooresville, North Carolina, and then completed a BMW technician training program in Orlando, Florida. I later returned to UT to complete my Bachelor’s in Computer Science, finishing my degree in 2016.

Where did you work before joining ICL?
I worked first as an intern and then as an Embedded Software Engineer at Emerson Automation Solutions, which produces vibration detection and analysis software and hardware. Most recently, I worked at Perceptics, a Farragut-based company that makes license plate reader systems used by US agents at the Canadian and Mexican border crossings and by electronic toll collectors on toll roads.

How did you first hear about the lab, and what made you want to work here?
As a UT student, I had heard of the various research groups in the EECS department, but even before that I had seen articles referencing the TOP500 list of the fastest supercomputers and the LINPACK benchmark used for the list. As to what made me want to work here, I can’t imagine why anyone with even a passing interest in computers wouldn’t want to be a part of a group that routinely works with some of the most powerful computing resources in the world.

What is your focus here at ICL? What are you working on?
Currently, I am part of the SLATE project, which focuses on creating a new linear algebra library for the Exascale Computing Project. I have been working on testing routines for LAPACK++ and BLAS++, the C++ versions of the canonical linear algebra libraries, and—most recently—on implementing a CMake build system for SLATE.

What are your interests/hobbies outside of work?
I enjoy spending most of my free time with my wife of fifteen years, Alicia, and our two children. I also play trumpet with the Knoxville Community Band, which performs several free concerts throughout the year.

Tell us something about yourself that might surprise people.
After my partner and I won second place in a diagnostic competition at our automotive school, we used rock-paper-scissors to see who would be the first to select a prize. The most valuable remaining prize was a $3,500 scholarship. I lost and chose an air-powered ratchet.

If you weren’t working at ICL, where would you like to be working and why?
I would likely still be working at my previous employer, Perceptics.

Recent Papers

  1. Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100,” San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
  2. Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, “Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,” The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018. DOI: 10.1177/1094342016646844
  3. Hoemmen, M., and I. Yamazaki, “Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers,” Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
  4. Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Tensor Contractions using Optimized Batch GEMM Routines,” San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
  5. Marques, O., J. Demmel, and P. B. Vasconcelos, “Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,” LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.
  6. Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, “Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1–14, April 2018. DOI: 10.1002/cpe.4485
  7. Haidar, A., S. Tomov, A. Abdelfattah, I. Yamazaki, and J. Dongarra, “MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR),” Washington, DC, NSF PI Meeting, Poster, April 2018. DOI: 10.6084/m9.figshare.6174143.v3
  8. Danalis, A., H. Jagode, and J. Dongarra, “PAPI: Counting outside the Box,” Barcelona, Spain, 8th JLESC Meeting, April 2018.
  9. Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Wu, P. Luszczek, J. Finney, and J. Dongarra, “Parallel BLAS Performance Report,” SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.

Recent Conferences

  1. MAR: George Bosilca, Ichitaro Yamazaki, Jack Dongarra, Piotr Luszczek
  2. MAR: Damien Genet, George Bosilca, Thomas Herault
  3. MAR: Dong Zhong, George Bosilca, Thananon Patinyasakdikul
  4. MAR: BDEC Workshop, Chicago, Illinois (Jack Dongarra, Terry Moore, Tracy Rafferty)
  5. MAR: GTC 2018, San Jose, California (Ahmad Abdelfattah, Azzam Haidar, Ichitaro Yamazaki, Stanimire Tomov)
  6. APR: Piotr Luszczek
  7. APR: Piotr Luszczek
  8. APR: JLESC Workshop, Barcelona, Spain (Anthony Danalis, George Bosilca, Jack Dongarra, Thomas Herault)
  9. APR: 2018 NSF SI2 PI Meeting, Washington, DC (Azzam Haidar, George Bosilca, Heike Jagode, Piotr Luszczek)

Upcoming Conferences

  1. MAY: IPDPS 2018, Vancouver, Canada (Ichitaro Yamazaki, Jakub Kurzak)
  2. MAY: Stanimire Tomov

Recent Lunch Talks

  1. MAR 2: Anthony Danalis, “Low-Level Benchmarking: A Dive down the Rabbit Hole”
  2. MAR 9: Thananon Patinyasakdikul, “Injection Rate in Multithreaded MPI”
  3. APR 6: Scott Emrich (EECS), “High Throughput Genome Analysis using Makeflow and Friends”
  4. APR 13: Jiali Li, “PCP Component in PAPI”
  5. APR 20: Hartwig Anzt (Karlsruhe Institute of Technology), “Variable-Size Batched Condition Number Calculation on GPUs”
  6. APR 27: Yves Robert (ENS-Lyon), “A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures”

Upcoming Lunch Talks

  1. MAY 4: Ahmad Abdelfattah, “MAGMA Update: High Performance and Energy Efficient LU Factorization”
  2. MAY 11: Ana Gainaru (Vanderbilt University), “Scheduling Solutions for Data-Driven Large-Scale Applications”
  3. MAY 18: Pratik Nayak (Karlsruhe Institute of Technology), “Using Iterative Methods for Local Solves in Asynchronous Schwarz Methods”
  4. MAY 25: Yaohung Tsai, “Pseudo-Assembly Programming on NVIDIA GPU”

Visitors

  1. Pratik Nayak
    Pratik Nayak from Karlsruhe Institute of Technology (KIT) will be visiting from April 2 through April 28.

People

  1. Hartwig Anzt
    Hartwig Anzt is working on site at ICL from April 2nd to April 29th. Welcome back, Hartwig!
  2. Tony Castaldo
    Tony Castaldo joined ICL at the beginning of April to work with the Performance ICL (PICL) team. Welcome aboard, Tony!
  3. John Batson
    John Batson joined ICL in March as a technical editing intern. John is working with the Technical Services Group.

Congratulations

Vince Weaver Granted Tenure

ICL alumnus Vince Weaver was granted tenure in the University of Maine’s Electrical and Computer Engineering department. Congratulations, Vince!

Dates to Remember

ICL Retreat

The 2018 ICL retreat has been set for August 20–21. Location to be determined. Mark your calendars!