News and Announcements
TOP500: November 2019
The 54th TOP500 list was just unveiled at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19) in Denver, CO. The United States has kept the top two spots with the Department of Energy’s Summit (at Oak Ridge National Laboratory) and Sierra (at Lawrence Livermore National Laboratory). In fact, the top 10 machines on the list remain unchanged since June 2019.
The most powerful supercomputer that is new to the November 2019 list is Rensselaer Polytechnic Institute’s AiMOS, which achieved 8.045 PFLOP/s on the HPL benchmark and landed in the #24 spot. Installed at the Center for Computational Innovations, AiMOS pairs POWER9 CPUs with NVIDIA V100 GPUs, a combination that is becoming increasingly popular in the TOP500.
| Rank | System | Cores | Rmax (TFLOP/s) | Rpeak (TFLOP/s) | Power (kW) |
|---|---|---|---|---|---|
| 1 | Summit – IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband, IBM; DOE/SC/Oak Ridge National Laboratory, United States | 2,414,592 | 148,600.0 | 200,794.9 | 10,096 |
| 2 | Sierra – IBM Power System S922LC, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband, IBM / NVIDIA / Mellanox; DOE/NNSA/LLNL, United States | 1,572,480 | 94,640.0 | 125,712.0 | 7,438 |
| 3 | Sunway TaihuLight – Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway, NRCPC; National Supercomputing Center in Wuxi, China | 10,649,600 | 93,014.6 | 125,435.9 | 15,371 |
| 4 | Tianhe-2A – TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000, NUDT; National Super Computer Center in Guangzhou, China | 4,981,760 | 61,444.5 | 100,678.7 | 18,482 |
| 5 | Frontera – Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox InfiniBand HDR, Dell EMC; Texas Advanced Computing Center, United States | 448,448 | 23,516.4 | 38,745.9 | |
HPCG: November 2019
The latest results for the High Performance Conjugate Gradients (HPCG) benchmark were also released at SC19. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance that is representative of modern HPC capability by simulating the compute and communication patterns of the sparse iterative solvers commonly found in science and engineering applications.
HPCG results are released twice per year alongside the TOP500 rankings to show how real-world applications might fare on a given machine. One notable change is that the K Computer, a stalwart entry in HPCG’s top five, was decommissioned, and so the lower-ranking systems moved up a place. The full list of HPCG rankings is available here.
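To make the workload concrete: HPCG exercises the kernels at the heart of a preconditioned conjugate gradient iteration, namely sparse matrix–vector products and dot products. The sketch below is not HPCG itself (the real benchmark uses a 27-point 3-D stencil, a multigrid preconditioner, and MPI communication); it is a minimal, pure-Python conjugate gradient loop showing the matvec/dot pattern the benchmark stresses.

```python
def matvec(A, x):
    # dense stand-in for the sparse matrix-vector product HPCG measures
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    # dot products are the benchmark's global-reduction pattern
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned CG for a symmetric positive-definite system A x = b."""
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # initial residual
    p = r[:]                                          # initial search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For a tiny SPD system such as `A = [[4, 1], [1, 3]]`, `b = [1, 2]`, the loop converges in two iterations; at HPCG scale, the same matvecs and reductions become bandwidth- and latency-bound, which is why HPCG scores are a small fraction of peak.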
| Rank | Computer | HPL (PFLOP/s) | TOP500 Rank | HPCG (PFLOP/s) | %Peak |
|---|---|---|---|---|---|
| 1 | Summit – IBM, POWER9, NVIDIA Volta V100; DOE/SC/ORNL, USA | 148.6 | 1 | 2.926 | 1.5% |
| 2 | Sierra – IBM, POWER9, NVIDIA Tesla V100; DOE/NNSA/LLNL, USA | 94.64 | 2 | 1.796 | 1.4% |
| 3 | Trinity – Cray XC40, Intel Xeon E5-2698 v3, Xeon Phi 7250; DOE/NNSA/LANL/SNL, USA | 20.159 | 7 | 0.546 | 1.3% |
| 4 | AI Bridging Cloud Infrastructure – PRIMERGY CX2570 M4, Xeon Gold 6148 20C 2.4GHz, NVIDIA Tesla V100; AIST, Japan | 19.880 | 8 | 0.509 | 1.6% |
| 5 | Piz Daint – Cray XC50, Xeon E5-2690v3 12C 2.6GHz, NVIDIA Tesla P100; Swiss National Supercomputing Centre, Switzerland | 21.230 | 6 | 0.497 | 1.8% |
HPL-AI
SC19 also saw the official launch of HPL-AI, a benchmark that seeks to highlight the emerging convergence of HPC and artificial intelligence (AI) workloads. While traditional HPC focuses on simulation runs for modeling phenomena in physics, chemistry, biology, and so on, the mathematical models that drive these computations require, for the most part, 64-bit (double-precision) accuracy. The machine-learning methods that fuel advances in AI, on the other hand, can achieve the desired results at 32-bit (single precision) or even lower floating-point precision formats.
This reduced demand for accuracy has fueled a resurgence of interest in hardware platforms that deliver a mix of unprecedented performance and energy savings while still achieving the classification and recognition fidelity afforded by higher-precision formats.
HPL-AI strives to unite these two realms by delivering a blend of modern algorithms and contemporary hardware while simultaneously connecting to the solver formulation of the decades-old High-Performance Linpack (HPL) framework of benchmarking the largest supercomputing installations in the world.
So far, Oak Ridge National Laboratory’s Summit is the only machine to be benchmarked with HPL-AI, and it achieved 445 PFLOP/s in mixed precision. This is nearly triple the 148 PFLOP/s that Summit achieved on the standard (double-precision) HPL benchmark used for the TOP500.
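The trick behind a result like this is mixed-precision iterative refinement: factor and solve the system in fast, low precision, then recover double-precision accuracy with a few cheap residual corrections carried out in FP64. The sketch below illustrates the structure only; the helper names are mine, and FP32 rounding (via a `struct` round-trip) stands in for the FP16 Tensor Core arithmetic HPL-AI actually exploits.

```python
import struct

def f32(x):
    # round an FP64 value to FP32 storage; a stand-in for low-precision hardware
    return struct.unpack('f', struct.pack('f', x))[0]

def lowprec_solve(A, b):
    """Gaussian elimination with every intermediate result rounded to FP32."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - factor * M[k][j])
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][-1] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = f32(s / M[i][i])
    return x

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mixed_precision_solve(A, b, refinements=3):
    """Low-precision solve followed by FP64 iterative refinement."""
    x = lowprec_solve(A, b)                  # fast, inaccurate first guess
    for _ in range(refinements):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # residual in FP64
        d = lowprec_solve(A, r)              # cheap low-precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

Each refinement step shrinks the error by roughly the low-precision unit roundoff (for a well-conditioned matrix), so a handful of corrections restores full FP64 accuracy while almost all of the flops run at low precision.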
Read more about HPL-AI here: https://icl.bitbucket.io/hpl-ai/.
Employment Opportunities at ICL
ICL is seeking full-time Research Scientists (MS or PhD) to participate in the design, development, and maintenance of numerical software libraries for solving linear algebra problems on large, distributed-memory machines with multi-core processors, hardware accelerators, and performance monitoring capabilities for new and advanced hardware and software technologies.
The prospective researcher will coauthor papers to document research findings, present the team’s work at conferences and workshops, and help lead students and other team members in their research endeavors in ongoing and future projects. Given the nature of the work, there will be opportunities for publication, travel, and high-profile professional networking and collaboration across academia, labs, and industry.
An MS or PhD in computer science, computational sciences, or math is preferred. Background in at least one of the following areas is also preferred: numerical linear algebra, HPC, performance monitoring, machine learning, or data analytics.
For more information check out ICL’s jobs page: http://www.icl.utk.edu/jobs.
Conference Reports
SC19
This year’s International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19) was held in Denver, CO on November 17–22.
Five computational science research centers from the University of Tennessee—the Bredesen Center, the Global Computing Laboratory, the Innovative Computing Laboratory, the Joint Institute for Computational Sciences, and the SimCenter—represented the university by anchoring the University of Tennessee booth. As usual, ICL had a significant presence at SC, with faculty, research staff, and students giving talks, presenting papers, and leading “Birds of a Feather” sessions.
ICL once again ran a dedicated ICL@SC webpage, where interested parties could keep tabs on ICL-related events—including a list of attendees, a detailed schedule of talks, and the latest project handouts. In addition, ICL’s Daniel Barry did a bang-up job running the ICL Twitter account (@ICL_UTK), where he provided up-to-the-minute information about what was happening on the ground.
The editor would like to thank Jack Dongarra, Daniel Barry, and Fengguang Song for their contributions to this article.
Recent Releases
MAGMA 2.5.2 Released
MAGMA 2.5.2 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.
Changes for MAGMA 2.5.2 include:
- New routine: `magmablas_hgemm_batched` for fixed-size, batched matrix multiplication in FP16 using the Tensor Cores.
  - The routine does not currently support pre-Volta GPUs.
  - The routine outperforms cuBLAS for sizes less than 100 × 100, and for general sizes that are not multiples of 8.
  - The kernel is tuned for the notrans-notrans case only. Comprehensive tuning is planned for future releases.
- Fixed the `magmablas_?gemm_vbatched` routines to correctly handle batch sizes over 65,535. The same fix is applied to the vbatched `syrk`, `herk`, `syr2k`, `her2k`, `symm`, `hemm`, and `trmm` routines.
- Fixed a bug in the FP32 <-> FP16 conversion routines (`magmablas_hlag2s` and `magmablas_slag2h`) that used to cause a launch failure for very large matrices.
- Fixed a bug in the batched LU factorization to avoid NaNs when singularity is encountered.
- Fixed a bug in the batched LU factorization to ensure that the first pivot is always returned, even when multiple pivots with the same absolute value are found.
- Added a Frobenius norm for general matrices (supported as an option to `magmablas_Xlange` for `X = 's', 'd', 'c', or 'z'`).
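For readers unfamiliar with the "batched" idea behind the new routine: rather than one large matrix product, a batched GEMM performs many small, independent products of the same size in a single kernel launch, amortizing launch overhead. The pure-Python sketch below shows only the semantics; it is not the MAGMA API, which takes arrays of device pointers and runs the whole batch on the GPU (in FP16 on Tensor Cores, in the case of `magmablas_hgemm_batched`).

```python
def gemm(A, B):
    # one dense matrix-matrix product: C = A * B
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def gemm_batched(A_batch, B_batch):
    # fixed-size batched GEMM: one independent product per batch entry;
    # a GPU library fuses the whole loop into a single kernel launch
    return [gemm(A, B) for A, B in zip(A_batch, B_batch)]
```

This independence between batch entries is exactly what lets a tuned GPU kernel beat a loop over cuBLAS calls for the small, oddly sized matrices mentioned in the release notes.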
Click here to download the tarball.
MAGMA DNN 1.1 Released
MAGMA DNN 1.1 is now available. MAGMA DNN is a C++ neural network library that aims to provide a simple, modular framework for deep learning—accelerated by heterogeneous architectures—using MAGMA as its computational backend.
Changes for MAGMA DNN 1.1 include:
- bug fixes and performance improvements;
- distributed training;
- hyperparameter optimization framework improvements;
- benchmarks using MAGMA DNN; and
- performance comparisons, accuracy validations, and more with TensorFlow, Theano, and PyTorch.
More information on MAGMA DNN 1.1 is provided in this paper and in this presentation.
Check out the MAGMA DNN repository on Bitbucket: https://bitbucket.org/icl/magmadnn.
LAPACK 3.9.0 Released
LAPACK 3.9.0 is now available. LAPACK (the Linear Algebra PACKage) is a widely used library for efficiently solving dense linear algebra problems, and ICL has been a major contributor to the development and maintenance of LAPACK since its inception. LAPACK itself is sequential; it relies on the BLAS library and benefits from a multithreaded BLAS implementation on multi-core machines.
Released at SC19, LAPACK 3.9.0 adds a QR-preconditioned QR SVD method and an LAPACK Householder Reconstruction routine.
Visit the LAPACK website to download the tarball.
Interview

Dalal Sukkari
Where are you from, originally?
I was born in Kuwait and grew up in Jordan.
Can you summarize your educational background?
I earned a BS in Mathematics from Hashemite University of Jordan, Amman, in 2008, and I earned my MSc and PhD in Applied Mathematics and Computational Science at the King Abdullah University of Science and Technology in Saudi Arabia, where I studied from 2013 to 2019.
My research focuses on a new high-performance implementation of the QR-based Dynamically Weighted Halley Singular Value Decomposition (QDWH-SVD) solver on (1) multi-core architectures enhanced with GPUs and (2) distributed-memory platforms, building on the state-of-the-art, vendor-optimized ScaLAPACK library.
Where did you work before joining ICL?
I was a PhD student in Applied Mathematics and Computational Science at King Abdullah University of Science and Technology, Saudi Arabia.
How did you first hear about the lab, and what made you want to work here?
I first heard about ICL through using the LAPACK and PLASMA libraries during one of my summer research courses. ICL also has a strong presence at many conferences, and I have attended many talks and tutorials presented by ICL researchers.
Based on my previous work experience, I thought joining such a productive lab would be a great opportunity for me. So, once I heard about potential ICL job openings at the 2019 SIAM Computational Science and Engineering conference, I applied, and here I am.
What is your focus here at ICL? What are you working on?
I am involved in two interesting projects: SLATE and AsyncIS. In SLATE, we are working on adding more functionality, such as SVD/EVD, to the SLATE library. In AsyncIS, I will be working on a high-performance implementation of the stochastic gradient descent (SGD) algorithm to improve the performance of training deep neural networks.
What are your interests/hobbies outside of work?
I love swimming and walking. Indoors, I like to watch movies, play chess, and I enjoy cooking, too.
Tell us something about yourself that might surprise people.
During my Bachelor’s in Mathematics, I used to skip some difficult math classes (e.g., real analysis) to attend fashion classes at a nearby college.
If you weren’t working at ICL, where would you like to be working and why?
I have a huge appreciation for research in applied mathematics and computational science, so if I were to work elsewhere, I might pursue research with some other research group.