News and Announcements

HPCG Benchmark

TOP500 (LINPACK) and HPCG rankings of the fastest supercomputers of 2017. Image courtesy of Sandia National Laboratories.

The final 2017 results for the High Performance Conjugate Gradients (HPCG) benchmark were released on November 14, 2017 at SC17 in Denver, Colorado. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance that is representative of modern HPC application workloads by simulating the compute and communication patterns of the sparse iterative solvers commonly found in science and engineering applications.

HPCG results are now released alongside the TOP500 rankings to show how real-world applications might fare on a given machine. In the image above (courtesy of Sandia National Laboratories), you can see how the HPCG benchmark would have ranked its top 10 machines and where those machines ranked on the LINPACK-based TOP500 list. The full list of HPCG rankings is available here.

To read Mike Heroux’s thoughts on HPCG and its increased relevance as an HPC benchmark, click on over to this article on HPCWire.

David Rogers Receives Award from Tickle College of Engineering

From left to right: Dean Wayne Davis, Kathy Williams, David Rogers, Ashly Perason, Yvette Gooden, and Associate Dean Masood Parang. Photo by Erik Campos.

On April 5th, ICL’s David Rogers received a Tickle College of Engineering Outstanding Support Staff Award. ICL management nominated David for the award with what was described as an “excellent [submission] packet.”

As our graphic designer and web developer, David has been with ICL since 2000 and has been an undeniable influence on the research output of ICL and on the ICL brand itself, contributing to nearly 20 years of proposals, posters, and papers. Congratulations, David!

Conference Reports

ECP 2nd Annual Meeting

The Exascale Computing Project (ECP) is a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration—and is responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, and hardware, to support the nation’s exascale computing initiative. As many of you know, ICL competed for and won seven ECP awards in 2016, is the PI institution on four of these projects (detailed below), and is actively collaborating with the ECP community at large.

ECP held its 2nd annual meeting to highlight the technical accomplishments and developments that resulted from the ECP community’s interactions and collaborations over the last two years. Project participants were asked to explore ways to leverage and share requirements with other ECP efforts. Key topics included plans for future systems, the future of the software stack, and interactions with DOE computing facilities.

Distributed Tasking for Exascale (DTE)

The DTE project, as part of the ECP effort, extends the capabilities of ICL’s Parallel Runtime and Execution Controller (PaRSEC), a generic framework for architecture-aware scheduling and management of microtasks on distributed, many-core, heterogeneous architectures. The PaRSEC environment also provides a runtime component for dynamically executing tasks on heterogeneous distributed systems, along with a productivity toolbox and development framework that supports multiple domain-specific languages and extensions as well as tools for debugging, trace collection, and analysis.
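To give a flavor of the task-based model, the sketch below expresses a tiny tile-update pattern as dependent tasks that a runtime can order and schedule from their data dependencies. It uses OpenMP task dependencies purely as a generic stand-in, not PaRSEC’s actual interface, and the tile count and kernel are invented for the example.

```c
/* Illustrative dataflow tasking in the spirit of a task-based runtime,
 * expressed with OpenMP task dependencies as a stand-in for PaRSEC's
 * own interface. The "kernel" is a placeholder for a real tile kernel. */
#include <stdio.h>

#define NT 4                                   /* number of tiles (hypothetical) */

static void kernel(double *tile) { *tile += 1.0; }   /* stand-in compute */

int main(void) {
    double tiles[NT] = {0};

    #pragma omp parallel
    #pragma omp single
    {
        for (int k = 0; k < NT; k++) {
            /* "factor" task: produces tile k */
            #pragma omp task depend(inout: tiles[k])
            kernel(&tiles[k]);

            /* "update" tasks: each consumes tile k and modifies tile j;
             * the runtime orders them from these data dependencies. */
            for (int j = k + 1; j < NT; j++) {
                #pragma omp task depend(in: tiles[k]) depend(inout: tiles[j])
                kernel(&tiles[j]);
            }
        }
    }   /* all tasks complete at the barrier closing the parallel region */

    printf("tiles[%d] = %.1f\n", NT - 1, tiles[NT - 1]);
    return 0;
}
```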

All of the above-mentioned features are essential to the ECP mission, and the DTE principals were on hand for a PaRSEC tutorial and a breakout session focused on mixed-model programming. DTE’s latest innovations were also on display in the aptly named “poster session.”

Exascale Performance Application Programming Interface (Exa-PAPI)

Exa-PAPI builds on ICL’s Performance Application Programming Interface (PAPI) project and extends it with performance counter monitoring capabilities for new and advanced ECP hardware and software technologies. PAPI provides a consistent interface and methodology for collecting performance counter information from various hardware and software components, including most major CPUs, GPUs and accelerators, interconnects, I/O systems, and power interfaces, as well as virtual cloud environments.
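In practice, collecting a counter with PAPI follows a simple pattern: initialize the library, create an event set, add the events of interest, and start and stop the counters around a region of code. A minimal sketch is below; the preset event name and the throwaway loop are placeholders for this example, and availability of any given event depends on the machine.

```c
/* Minimal PAPI counter read around a region of interest.
 * Error handling is abbreviated; PAPI_TOT_INS is a common preset event
 * but may not exist on every platform. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    int eventset = PAPI_NULL;
    long long counts[1];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);
    PAPI_create_eventset(&eventset);
    PAPI_add_named_event(eventset, "PAPI_TOT_INS");   /* total instructions */

    PAPI_start(eventset);
    volatile double x = 0.0;                          /* placeholder workload */
    for (int i = 0; i < 1000000; i++) x += i * 0.5;
    PAPI_stop(eventset, counts);

    printf("PAPI_TOT_INS = %lld\n", counts[0]);
    PAPI_shutdown();
    return 0;
}
```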

In addition to these counter monitoring capabilities, Exa-PAPI adds fine-grained power management support and integration capabilities for exascale paradigms like task-based runtime systems.

ICL’s Heike Jagode participated in the breakout session, or “lightning round” as she called it—a missed opportunity to use the word “blitz” in the opinion of this editor—where she explained Exa-PAPI’s new support for software-defined events (SDEs). These SDEs extend PAPI’s role as a standardizing layer for monitoring performance counters.

Heike also said that the Exa-PAPI poster garnered significant attention during the well-attended, two-hour poster session.

Production-ready, Exascale-Enabled Krylov Solvers for Exascale Computing (PEEKS)


The PEEKS project explores the redesign of solvers and extends the DOE’s Extreme-scale Algorithms and Solver Resilience (EASIR) project. Many large-scale scientific applications rely heavily on preconditioned iterative solvers for large linear systems. For these solvers to efficiently exploit extreme-scale hardware, both the solver algorithms and the implementations must be redesigned to address challenges like extreme concurrency, complex memory hierarchies, costly data movement, and heterogeneous node architectures.
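To make the class of methods concrete, the sketch below shows a Jacobi-preconditioned conjugate gradient iteration on a tiny CSR matrix. It is only a serial illustration of the algorithmic pattern these solvers build on; the matrix, right-hand side, tolerance, and preconditioner are invented for the example and are not the PEEKS solvers themselves, which target GPUs and extreme concurrency.

```c
/* Jacobi-preconditioned conjugate gradient on a tiny CSR matrix.
 * A serial sketch of the algorithmic pattern only; all data here is
 * made up for illustration. */
#include <math.h>
#include <stdio.h>

#define N   4
#define NNZ 10

/* 4x4 symmetric positive definite tridiagonal matrix in CSR format. */
static const int    row_ptr[N + 1] = {0, 2, 5, 8, 10};
static const int    col_idx[NNZ]   = {0, 1,  0, 1, 2,  1, 2, 3,  2, 3};
static const double val[NNZ]       = {2, -1, -1, 2, -1, -1, 2, -1, -1, 2};

static void spmv(const double *x, double *y)          /* y = A * x */
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            y[i] += val[k] * x[col_idx[k]];
    }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    double b[N] = {1, 1, 1, 1}, x[N] = {0}, r[N], z[N], p[N], q[N], diag[N];

    /* Extract the diagonal for the Jacobi preconditioner M = diag(A). */
    for (int i = 0; i < N; i++)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            if (col_idx[k] == i) diag[i] = val[k];

    for (int i = 0; i < N; i++) {                      /* x = 0, so r = b */
        r[i] = b[i];
        z[i] = r[i] / diag[i];                         /* z = M^{-1} r    */
        p[i] = z[i];
    }
    double rz = dot(r, z);

    for (int it = 0; it < 100; it++) {
        spmv(p, q);
        double alpha = rz / dot(p, q);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        if (sqrt(dot(r, r)) < 1e-10) { printf("converged after %d iterations\n", it + 1); break; }
        for (int i = 0; i < N; i++) z[i] = r[i] / diag[i];
        double rz_new = dot(r, z);
        for (int i = 0; i < N; i++) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %g\n", i, x[i]);
    return 0;
}
```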

Members of the PEEKS team, including Hartwig Anzt, who was in Knoxville for the meeting, presented the latest developments in the PEEKS project at the breakout session. As with the other three ICL-led ECP projects, the PEEKS team had its own poster describing the new features and directions of the project effort.

Software for Linear Algebra Targeting Exascale (SLATE)

For decades, ICL has applied algorithmic and technological innovations to the process of pioneering, implementing, and disseminating dense linear algebra software—including the Linear Algebra PACKage (LAPACK) and Scalable Linear Algebra PACKage (ScaLAPACK) libraries. The Software for Linear Algebra Targeting Exascale (SLATE) project, as part of the ECP effort, is working to converge and consolidate that software into a dense linear algebra library that will integrate seamlessly into the ECP ecosystem.
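As a point of reference for the kind of functionality being consolidated, the sketch below solves a small dense system through LAPACK’s C interface (LAPACKE). The 3x3 matrix and right-hand side are arbitrary values chosen for the example, and SLATE’s own interface, a modern C++ API, differs from this legacy one.

```c
/* Solving a small dense system A*x = b with LAPACK's dgesv via LAPACKE.
 * Purely illustrative of the legacy interface; the matrix and right-hand
 * side are arbitrary. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double A[3 * 3] = { 4, 1, 2,
                        1, 3, 0,
                        2, 0, 5 };   /* row-major 3x3 */
    double b[3]     = { 1, 2, 3 };   /* overwritten with the solution x */
    lapack_int ipiv[3];

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1,
                                    A, 3, ipiv, b, 1);
    if (info != 0) {
        fprintf(stderr, "dgesv failed: info = %d\n", (int)info);
        return 1;
    }
    printf("x = [%g, %g, %g]\n", b[0], b[1], b[2]);
    return 0;
}
```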

At the all-hands meeting, the SLATE team shared further details of their progress via the SLATE poster, which was on display at the poster session.

SIAM Conference on Parallel Processing for Scientific Computing (PP18)

The Society for Industrial and Applied Mathematics (SIAM) recently hosted its 2018 Conference on Parallel Processing for Scientific Computing (PP18) at Waseda University in Tokyo, Japan. The conference provides a forum for communication among the applied mathematics, computer science, and computational science and engineering communities. From March 7th to March 10th, around 650 attendees gathered at the Nishi Waseda Campus for invited talks, presentations, and a poster session.

This conference series has played a key role in promoting parallel scientific computing, algorithms for parallel systems, and parallel numerical algorithms and is unique in its emphasis on the intersection between high-performance scientific computing and scalable algorithms, architectures, and software. Conference organizers noted that the number of attendees in Tokyo was up significantly from the last conference in the series.

ICL had a significant presence at PP18, with several presentations and a poster. Jack Dongarra presented the latest developments on “Dense Linear Systems for Extreme Scale” and discussed the joint Parallel Numerical Linear Algebra for Future Extreme Scale Systems (NLAFET) effort and the collaboration between ICL and the University of Manchester.

George Bosilca gave a presentation on “The Case for Resilience Support in MPI,” in which he proposed a local rollback mechanism for generic SPMD programs that combines ULFM, the CPPC application-level checkpointing tool, and Open MPI’s Vprotocol pessimist message logging. With this approach, only the failed processes are recovered from the last checkpoint, while message logging keeps the computation consistent and allows it to continue making progress.

Ichitaro Yamazaki gave a presentation on “Hierarchical-Matrix BiCGStab on GPU Clusters with MAGMA Variable-Size Batched Kernel,” where he discussed ICL’s recent efforts to port a low-rank compression solver onto GPU clusters, namely Reedbush-H and Tsubame3.

Mike Tsai represented ICL at the poster session with his poster on “Pseudo-Assembly Programming for Batched Matrix Factorization,” where he demonstrated the advantages of programming below the CUDA C level by showing preliminary results for batched matrix factorizations written in PTX for NVIDIA GPUs.

Hartwig Anzt gave a talk, “ParILUT – A New Parallel Threshold ILU,” where he presented a parallel algorithm for computing a threshold incomplete LU factorization, with the main idea being to interleave a parallel fixed-point iteration that approximates an incomplete factorization for a given sparsity pattern with a procedure that adaptively changes the pattern.

Not to be outdone, and thanks to Ichitaro Yamazaki and Azzam Haidar, Piotr Luszczek gave two presentations at SIAM. First, Piotr presented Ichitaro’s work on “Performance of S-Step and Pipelined Krylov Methods,” where he compared the performance of pipelined and s-step variants of a Krylov solver; these implementations of both s-step and pipelined methods focus on reducing the cost of global all-reduce operations needed for the orthogonalization.
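The reduction-hiding idea these variants exploit can be shown in isolation: start the global reduction with a non-blocking all-reduce and overlap it with local work. The sketch below is a generic MPI illustration of that overlap, not the implementation presented in the talk; the partial dot product and the busy loop are placeholders.

```c
/* Overlapping a global reduction with local work, the basic trick behind
 * pipelined Krylov methods. Generic MPI sketch with placeholder data. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_dot = (double)(rank + 1);   /* placeholder partial dot product */
    double global_dot = 0.0;
    MPI_Request req;

    /* Start the reduction without waiting for it... */
    MPI_Iallreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ...and overlap it with local computation (stand-in for an SpMV). */
    double work = 0.0;
    for (int i = 0; i < 1000000; i++) work += 1e-6 * i;

    MPI_Wait(&req, MPI_STATUS_IGNORE);       /* reduction result now usable */
    if (rank == 0)
        printf("global dot = %g (overlapped work = %g)\n", global_dot, work);

    MPI_Finalize();
    return 0;
}
```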

Second, Piotr presented Azzam’s work on “MAGMA Batched Computations: Current Development and Trend,” where he described MAGMA Batched—a library that achieves dramatically better performance by executing small operations in “batches.”

Several ICL alumni and collaborators were also in attendance, including Keita Teranishi, Hatem Ltaief, and Sven J. Hammarling. All in all, a busy—but rewarding—endeavor for the ICL crew.

Interview

Jamie Finney

Where are you from, originally?
I was born and raised in middle Tennessee, near Winchester.

Can you summarize your educational background?
I began at the University of Tennessee right out of high school but decided to pursue a career as an automotive technician. I attended Universal Technical Institute in Mooresville, North Carolina, and then completed a BMW technician training program in Orlando, Florida. I later returned to UT to complete my Bachelor’s in Computer Science, finishing my degree in 2016.

Where did you work before joining ICL?
I worked first as an intern and then as an Embedded Software Engineer at Emerson Automation Solutions, which produces vibration detection and analysis software and hardware. Most recently, I worked at Perceptics, a Farragut-based company that makes license plate reader systems used by US agents at the Canadian and Mexican border crossings and by electronic toll collectors on toll roads.

How did you first hear about the lab, and what made you want to work here?
As a UT student, I had heard of the various research groups in the EECS department, but even before that I had seen articles referencing the TOP500 list of the fastest supercomputers and the LINPACK benchmark used for the list. As to what made me want to work here, I can’t imagine why anyone with even a passing interest in computers wouldn’t want to be a part of a group that routinely works with some of the most powerful computing resources in the world.

What is your focus here at ICL? What are you working on?
Currently, I am part of the SLATE project, which focuses on creating a new linear algebra library for the Exascale Computing Project. I have been working on testing routines for LAPACK++ and BLAS++, the C++ versions of the canonical linear algebra libraries, and—most recently—on implementing a CMake build system for SLATE.

What are your interests/hobbies outside of work?
I enjoy spending most of my free time with my wife of fifteen years, Alicia, and our two children. I also play trumpet with the Knoxville Community Band, which performs several free concerts throughout the year.

Tell us something about yourself that might surprise people.
After my partner and I won second place in a diagnostic competition at our automotive school, we used rock-paper-scissors to see who would be the first to select a prize. The most valuable remaining prize was a $3,500 scholarship. I lost and chose an air-powered ratchet.

If you weren’t working at ICL, where would you like to be working and why?
I would likely still be working at my previous employer, Perceptics.

Recent Papers

  1. Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100,” San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
  2. Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, “Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,” The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018. DOI: 10.1177/1094342016646844
  3. Hoemmen, M., and I. Yamazaki, “Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers,” Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
  4. Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Tensor Contractions using Optimized Batch GEMM Routines,” San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
  5. Marques, O., J. Demmel, and P. B. Vasconcelos, “Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,” LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.
  6. Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, “Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1–14, April 2018. DOI: 10.1002/cpe.4485
  7. Haidar, A., S. Tomov, A. Abdelfattah, I. Yamazaki, and J. Dongarra, “MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR),” Washington, DC, NSF PI Meeting, Poster, April 2018. DOI: 10.6084/m9.figshare.6174143.v3
  8. Danalis, A., H. Jagode, and J. Dongarra, “PAPI: Counting outside the Box,” Barcelona, Spain, 8th JLESC Meeting, April 2018.
  9. Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Wu, P. Luszczek, J. Finney, and J. Dongarra, “Parallel BLAS Performance Report,” SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.

Recent Conferences

  1. MAR: George Bosilca, Ichitaro Yamazaki, Jack Dongarra, Piotr Luszczek
  2. MAR: Damien Genet, George Bosilca, Thomas Herault
  3. MAR: Dong Zhong, George Bosilca, Thananon Patinyasakdikul
  4. MAR: BDEC Workshop, Chicago, Illinois (Jack Dongarra, Terry Moore, Tracy Rafferty)
  5. MAR: GTC 2018, San Jose, California (Ahmad Abdelfattah, Azzam Haidar, Ichitaro Yamazaki, Stanimire Tomov)
  6. APR: Piotr Luszczek
  7. APR: Piotr Luszczek
  8. APR: JLESC Workshop, Barcelona, Spain (Anthony Danalis, George Bosilca, Jack Dongarra, Thomas Herault)
  9. APR: 2018 NSF SI2 PI Meeting, Washington, DC (Azzam Haidar, George Bosilca, Heike Jagode, Piotr Luszczek)

Upcoming Conferences

  1. MAY: IPDPS 2018, Vancouver, Canada (Ichitaro Yamazaki, Jakub Kurzak)
  2. MAY: Stanimire Tomov

Recent Lunch Talks

  1. MAR 2: Anthony Danalis, “Low-Level Benchmarking: A Dive down the Rabbit Hole”
  2. MAR 9: Thananon Patinyasakdikul, “Injection Rate in Multithreaded MPI”
  3. APR 6: Scott Emrich (EECS), “High Throughput Genome Analysis using Makeflow and Friends”
  4. APR 13: Jiali Li, “PCP Component in PAPI”
  5. APR 20: Hartwig Anzt (Karlsruhe Institute of Technology), “Variable-Size Batched Condition Number Calculation on GPUs”
  6. APR 27: Yves Robert (ENS-Lyon), “A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures”

Upcoming Lunch Talks

  1. MAY 4: Ahmad Abdelfattah, “MAGMA Update: High Performance and Energy Efficient LU Factorization”
  2. MAY 11: Ana Gainaru (Vanderbilt University), “Scheduling Solutions for Data-Driven Large-Scale Applications”
  3. MAY 18: Pratik Nayak (Karlsruhe Institute of Technology), “Using Iterative Methods for Local Solves in Asynchronous Schwarz Methods”
  4. MAY 25: Yaohung Tsai, “Pseudo-Assembly Programming on NVIDIA GPU”

Visitors

  1. Pratik Nayak
    Pratik Nayak from Karlsruhe Institute of Technology (KIT) will be visiting from April 2 through April 28.

People

  1. Hartwig Anzt
    Hartwig Anzt is working on site at ICL from April 2nd to April 29th. Welcome back, Hartwig!
  2. Tony Castaldo
    Tony Castaldo joined ICL at the beginning of April to work with the Performance ICL (PICL) team. Welcome aboard, Tony!
  3. John Batson
    John Batson joined ICL in March as a technical editing intern. John is working with the Technical Services Group.

Congratulations

Vince Weaver Granted Tenure

ICL alumnus Vince Weaver was granted tenure in the University of Maine’s Electrical and Computer Engineering department. Congratulations, Vince!

Dates to Remember

ICL Retreat

The 2018 ICL retreat has been set for August 20–21. Location to be determined. Mark your calendars!