News and Announcements
TOP500 – June 2018
In June 2018, the 51st TOP500 list was announced at the ISC-HPC conference in Frankfurt, Germany. The United States is back on top with Oak Ridge National Laboratory’s Summit machine. Summit achieved 122.3 petaFLOP/s on the HPL benchmark, besting China’s Sunway TaihuLight system.
Summit has 4,356 nodes, each one equipped with two 22-core IBM Power9 CPUs and six NVIDIA Tesla V100 GPUs. The nodes are linked with a Mellanox dual-rail EDR InfiniBand network. For a deeper dive into the Summit machine, please see the June 2018 issue of the ICL newsletter.
Summit wasn’t the only new addition near the top of the list, as the new AI Bridging Cloud Infrastructure (ABCI) machine is now No. 5, with an HPL score of 19.9 petaFLOP/s. The Fujitsu-built supercomputer, powered by 20-core Xeon Gold CPUs along with NVIDIA Tesla V100 GPUs, is up and running at Japan’s National Institute of Advanced Industrial Science and Technology (AIST) and pushed Piz Daint to the No. 6 spot.
| Rank | System | Cores | Rmax (TFLOP/s) | Rpeak (TFLOP/s) | Power (kW) |
|---|---|---|---|---|---|
| 1 | Summit – IBM POWER9, NVIDIA Volta GV100, Dual-rail Mellanox EDR InfiniBand, DOE/SC/Oak Ridge National Laboratory, United States | 2,282,544 | 122,300.0 | 187,659.3 | 8,806 |
| 2 | Sunway TaihuLight – Sunway SW26010 260C 1.45GHz, Sunway, NRCPC, National Supercomputing Center in Wuxi, China | 10,649,600 | 93,014.6 | 125,435.9 | 15,371 |
| 3 | Sierra – IBM POWER9, NVIDIA Volta GV100, Dual-rail Mellanox EDR InfiniBand, DOE/NNSA/LLNL, United States | 1,572,480 | 71,610.0 | 119,193.6 | |
| 4 | Tianhe-2A – TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000, NUDT, National Super Computer Center in Guangzhou, China | 4,981,760 | 61,444.5 | 100,678.7 | 18,482 |
| 5 | AI Bridging Cloud Infrastructure (ABCI) – PRIMERGY CX2550 M4, Xeon Gold 6148 20C 2.4GHz, NVIDIA Tesla V100 SXM2, InfiniBand EDR, Fujitsu, National Institute of Advanced Industrial Science and Technology (AIST), Japan | 391,680 | 19,880.0 | 32,576.6 | 1,649 |
Happy Birthday, Jack!

Notably, this is the first time Jack has been at ICL on his birthday. “Lesson learned.” Happy birthday, Jack!
HPCG Results – June 2018
The latest results for the High Performance Conjugate Gradients (HPCG) benchmark were released on June 25, 2018, at ISC-HPC in Frankfurt, Germany. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance representative of modern HPC workloads by simulating the compute and communication patterns of sparse iterative solvers commonly found in science and engineering applications.
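For readers unfamiliar with the pattern HPCG exercises, the sketch below is an illustrative fragment, not HPCG's actual code (the function name and CSR layout are our own). It shows the sparse matrix-vector product at the heart of each conjugate gradient iteration; its indirect, memory-bound accesses are exactly what distinguish HPCG's workload from HPL's dense, compute-bound kernels.

```c
#include <stddef.h>

/* y = A*x with A stored in compressed sparse row (CSR) form: the
   irregular, memory-bound kernel that dominates each CG iteration. */
void spmv_csr(size_t n, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];  /* indirect gather from x */
        y[i] = sum;
    }
}
```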
HPCG results are released twice per year alongside the TOP500 rankings to show how real-world applications might fare on a given machine. As in the TOP500, Summit claimed the No. 1 spot. The full list of HPCG rankings is available here.
| Rank | Computer | HPL (PFLOP/s) | TOP500 Rank | HPCG (PFLOP/s) | %Peak |
|---|---|---|---|---|---|
| 1 | Summit – IBM, POWER9, NVIDIA Volta V100, DOE/SC/ORNL, USA | 122.3 | 1 | 2.926 | 1.50% |
| 2 | Sierra – IBM, POWER9, NVIDIA Tesla V100, DOE/NNSA/LLNL, USA | 71.61 | 3 | 1.796 | 1.50% |
| 3 | K Computer – Fujitsu, SPARC64, RIKEN/AICS, Japan | 10.51 | 16 | 0.603 | 5.30% |
| 4 | Trinity – Cray XC40, Intel Xeon E5-2698 v3, DOE/NNSA/LANL/SNL, USA | 14.137 | 9 | 0.546 | 1.80% |
| 5 | Piz Daint – Cray XC50, Intel Xeon E5-2690 v3, NVIDIA Tesla P100, CSCS, Switzerland | 19.59 | 6 | 0.486 | 1.90% |
Employment Opportunities at ICL

ICL is seeking full-time scientists (MS or PhD) or postdoctoral researchers to participate in the design, development, and maintenance of numerical software libraries for solving linear algebra problems on large distributed-memory machines with multi-core processors and hardware accelerators, as well as performance monitoring capabilities for new and advanced hardware and software technologies. The prospective researcher will coauthor papers to document research findings, present the team’s work at conferences and workshops, and help lead students and other team members in ongoing and future projects. Given the nature of the work, there will be opportunities for publication, travel, and high-profile professional networking and collaboration across academia, national laboratories, and industry.
An MS or PhD in computer science, computational science, or mathematics is preferred, as is a background in at least one of the following areas: numerical linear algebra, HPC, performance monitoring, machine learning, or data analytics. The position offers full-time employment for up to four years, with the possibility of further extensions based on funding availability and performance.
Joining this team will offer qualified candidates exciting career opportunities as they participate in the US Department of Energy’s Exascale Computing Project (ECP). ICL is involved in several ECP projects, including SLATE (http://icl.utk.edu/slate/), PEEKS (http://icl.utk.edu/peeks/), xSDK (http://www.icl.utk.edu/research/xsdk4ecp), Exa-PAPI (http://icl.utk.edu/exa-papi/), CEED (https://ceed.exascaleproject.org/), Distributed Tasking for Exascale (PaRSEC) (http://icl.utk.edu/dte/), MAGMA (http://icl.cs.utk.edu/magma/), FFT-ECP, and others.
The starting date is July 1, 2018, or later. All qualified candidates, whether fresh MS or PhD graduates or seasoned HPC veterans, are encouraged to apply.
For more information, contact Jack Dongarra (dongarra@icl.utk.edu) or check out ICL’s jobs page: http://www.icl.utk.edu/jobs.
Washington Post Op-Ed
In June, the Washington Post’s Mark Lasswell asked Jack Dongarra to write an op-ed on all things HPC, including the new Summit machine, the impact of HPC on our daily lives, and the future of HPC as a whole.
Click here to read the article on the Washington Post website.
Conference Reports
ISC High Performance
The 2018 ISC High Performance Computing conference (ISC-HPC) kicked off on June 17th in Frankfurt, Germany. ICL’s Jack Dongarra and Piotr Luszczek both attended the conference, accompanied by approximately 3,500 other attendees and contributors, including ICL collaborators Sven Hammarling, Felix Wolf, Pedro Valero-Lara, and Mawussi Zounon.
Jack co-presented the TOP500 awards, unveiling Oak Ridge National Laboratory’s Summit as the No. 1 machine on the list. Jack also led a focus session on the NLAFET library, where he described recent developments in task-based algorithms for the solution of dense linear systems and the solution of both symmetric and unsymmetric dense eigenproblems.
Jack, Sven, Piotr, Mawussi, and Pedro also hosted a Birds of a Feather (BoF) session on Batched BLAS standardization. The BoF continued ongoing efforts to propose and refine a batched BLAS standard that does not incur a severe performance penalty on any given architecture, analyzing the benefits and drawbacks of existing batched BLAS interfaces.
Piotr also presented a poster for ICL’s Benchtesting OpeN Software Autotuning Infrastructure (BONSAI) project, which aims to develop a software infrastructure for using parallel hybrid systems at any scale to carry out large, concurrent autotuning sweeps, dramatically accelerating the optimization of computational kernels for GPU accelerators and many-core coprocessors.
Mawussi and Piotr, on behalf of Azzam Haidar and coauthors, presented a poster on low-precision arithmetic on GPUs, “Using GPU’s FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” which went on to win its category’s Best Poster Award. Congratulations to all involved!
Finally, Piotr gave the keynote talk at the Approximate and Transprecision Computing on Emerging Technologies (ATCET) workshop. ATCET covers applications ranging from big data analytics to deep learning and classical scientific computing/simulation, and investigates, both theoretically and practically, the energy-efficiency gains obtainable when accuracy requirements on data being processed, stored, and communicated can be relaxed for intermediate calculations.
The editor would like to thank Piotr Luszczek for his contribution to this article.
GPU Hackathon
On June 4–8, ICL’s Piotr Luszczek was in Boulder, CO for this year’s aptly named Boulder GPU Hackathon. The Hackathon consisted of around 50 people split into teams of developers who were carefully paired with mentors (i.e., Piotr and ICL alum Kenneth Roche) specializing in high-performance computing, appropriate software languages, and GPU programming APIs.
Over five days, each team worked through a coding sprint with daily stand-ups, which promote cross-team collaboration, enhance knowledge sharing, and ensure that roadblocks are resolved quickly.
More Hackathons are already planned for Santa Fe, NM and Brookhaven/Upton, NY. See the GPU Hackathon website for more details.
The editor would like to thank Piotr Luszczek for his contributions to this article.
Recent Releases
MAGMA 2.4.0
MAGMA 2.4.0 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.
MAGMA 2.4.0 features LAPACK-compliant routines for multi-core CPUs enhanced with NVIDIA GPUs (including the Volta V100). MAGMA now includes more than 400 routines, covering one-sided dense matrix factorizations and solvers, and two-sided factorizations and eigen/singular-value problem solvers, as well as a subset of highly optimized BLAS for GPUs.
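As a brief illustration of that LAPACK-compliant interface, here is a minimal sketch (problem size and matrix contents are arbitrary placeholders) that solves a dense linear system with `magma_dgesv`, MAGMA’s counterpart to LAPACK’s `dgesv`:

```c
#include <stdio.h>
#include <stdlib.h>
#include <magma_v2.h>

int main(void) {
    magma_init();

    magma_int_t n = 1000, nrhs = 1, info = 0;
    double *A, *B;
    magma_int_t *ipiv;

    /* Pinned host memory improves CPU<->GPU transfer performance. */
    magma_dmalloc_pinned(&A, n * n);
    magma_dmalloc_pinned(&B, n * nrhs);
    magma_imalloc_cpu(&ipiv, n);

    /* Random, diagonally dominant matrix (column-major) and RHS of ones. */
    for (magma_int_t i = 0; i < n * n; i++) A[i] = rand() / (double)RAND_MAX;
    for (magma_int_t i = 0; i < n; i++)     { A[i + i * n] += n; B[i] = 1.0; }

    /* Same calling sequence as LAPACK's dgesv: A is overwritten by its
       LU factors, B by the solution X. */
    magma_dgesv(n, nrhs, A, n, ipiv, B, n, &info);
    printf("info = %lld\n", (long long)info);

    magma_free_pinned(A);
    magma_free_pinned(B);
    magma_free_cpu(ipiv);
    magma_finalize();
    return 0;
}
```

Because the calling sequence mirrors `dgesv`, porting an existing LAPACK code is largely a matter of swapping the call and linking against MAGMA.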
Other updates and features in MAGMA 2.4.0:
- Added constrained least squares routines (`magma_[sdcz]gglse`) and their dependencies: `magma_zggrqf` (generalized RQ factorization) and `magma_zunmrq` (multiply by orthogonal Q as returned by `zgerqf`).
- Added performance improvements across many batched routines, including batched TRSM, batched LU, batched LU-nopiv, and batched Cholesky.
- Fixed some compilation issues with `inf`, `nan`, and `nullptr`.
Additional MAGMA-sparse features:
- Changed how data from an external application is handled; there is now a clear distinction between memory allocated/used/freed by MAGMA and by the user application.
- Added the functions `magma_zvcopy` and `magma_zvpass`, which do not allocate memory; instead, they copy values from/to application-allocated memory.
- The examples (in `example/example_sparse.c`) demonstrate how these routines should be used.
Click here to download the tarball.
Interview

Daniel Barry
Where are you from, originally?
I’m from Karns, a town in West Knoxville, Tennessee.
Can you summarize your educational background?
I earned my Bachelor of Science in Computer Engineering and Mathematics at the University of Tennessee, Knoxville, and graduated in the spring of 2018. I started as a Computer Engineering major, but I decided to double major in Mathematics because I feel that math is a useful tool set in engineering.
How did you first hear about ICL, and what made you want to work here?
I first heard about ICL during the 2014 Student Cluster Competition. I was a member of the University of Tennessee, Knoxville team. As part of the competition, teams must compile and run the HPL software on their cluster. Since ICL develops HPL, we naturally heard of ICL.
What is your focus here? What are you working on?
I am helping to develop a toolkit, the central aim of which is to define an accurate mapping between particular high-level concepts of performance metrics, such as L1 data cache misses, and their underlying, corresponding low-level hardware events.
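To make that mapping concrete, here is a minimal sketch (not part of the toolkit itself; the measured loop is an arbitrary placeholder) that reads the high-level PAPI preset PAPI_L1_DCM, which PAPI resolves to the platform’s underlying native hardware events:

```c
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define N 1000000

int main(void) {
    /* Initialize the PAPI library. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return 1;
    }

    int eventset = PAPI_NULL;
    long long count = 0;
    PAPI_create_eventset(&eventset);

    /* PAPI_L1_DCM is the high-level preset for L1 data cache misses;
       PAPI maps it to the platform's native events under the hood. */
    if (PAPI_add_event(eventset, PAPI_L1_DCM) != PAPI_OK) {
        fprintf(stderr, "PAPI_L1_DCM not available on this machine\n");
        return 1;
    }

    double *a = malloc(N * sizeof *a);
    PAPI_start(eventset);
    for (int i = 0; i < N; i++)   /* streaming writes to generate misses */
        a[i] = 2.0 * i;
    PAPI_stop(eventset, &count);

    printf("L1 data cache misses: %lld\n", count);
    free(a);
    PAPI_shutdown();
    return 0;
}
```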
What would you consider the most valuable “lesson” you have learned so far at ICL?
I have learned that consistent communication with the other students and research scientists of ICL is greatly advantageous. My mentors have introduced me to great opportunities in which to get involved, such as serving as a student volunteer at the annual Supercomputing Conference. One of my fellow student research assistants helped me sign up for a machine learning course that I needed for my core curriculum. I also found out about the Interdisciplinary Graduate Minor in Computational Science (IGMCS) from talking to my research student peers. Additionally, a few fellow ICLers regularly invite me to lunch, which was a great introduction for the new guy! Staying involved at ICL is highly beneficial.
What are your interests/hobbies outside of work?
I like to spend time outside by walking, hiking, caving, and swimming. Indoors, I like to program, read about news in technology and computer science, and play video games. I thoroughly enjoy eating; as such, I enjoy cooking, too.
Tell us something about yourself that might surprise people.
I can speak Mandarin Chinese.
If you weren’t working at ICL, where would you like to be working and why?
This is a tough question. Since my freshman year of undergraduate study, I have wanted to work at ICL. If I were not working here, then I certainly would like to be working here. I have a huge appreciation for research in computer science and engineering, so if I were to work elsewhere, I might pursue a research assistantship with some other research group in the Department of Electrical Engineering and Computer Science at UTK.