News and Announcements

TOP500 – November 2018

Image courtesy of the US Department of Energy’s Oak Ridge National Laboratory.

In November 2018, the 52nd TOP500 list was unveiled at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18) in Dallas, TX. The United States remains on top with Oak Ridge National Laboratory’s Summit machine. Summit submitted new HPL benchmark results for the November list and achieved 143.5 petaFLOP/s (vs. 122.3 petaFLOP/s in June 2018).

Summit wasn’t the only machine to submit new results, with Lawrence Livermore National Laboratory’s Sierra hitting 94.6 petaFLOP/s on the latest list (vs. 71.6 petaFLOP/s in June 2018). This move puts Sierra at No. 2 on the list—now above China’s Sunway TaihuLight system, which sits at No. 3.

The top five systems on the November 2018 list:

  1. Summit – IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR InfiniBand (IBM)
     DOE/SC/Oak Ridge National Laboratory, United States
     Cores: 2,397,824 | Rmax: 143,500.0 TFLOP/s | Rpeak: 200,794.9 TFLOP/s | Power: 9,783 kW
  2. Sierra – IBM Power System S922LC, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR InfiniBand (IBM / NVIDIA / Mellanox)
     DOE/NNSA/LLNL, United States
     Cores: 1,572,480 | Rmax: 94,640.0 TFLOP/s | Rpeak: 125,712.0 TFLOP/s | Power: 7,438 kW
  3. Sunway TaihuLight – Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway (NRCPC)
     National Supercomputing Center in Wuxi, China
     Cores: 10,649,600 | Rmax: 93,014.6 TFLOP/s | Rpeak: 125,435.9 TFLOP/s | Power: 15,371 kW
  4. Tianhe-2A – TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000 (NUDT)
     National Super Computer Center in Guangzhou, China
     Cores: 4,981,760 | Rmax: 61,444.5 TFLOP/s | Rpeak: 100,678.7 TFLOP/s | Power: 18,482 kW
  5. Piz Daint – Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100 (Cray Inc.)
     Swiss National Supercomputing Centre (CSCS), Switzerland
     Cores: 387,872 | Rmax: 21,230.0 TFLOP/s | Rpeak: 27,154.3 TFLOP/s | Power: 2,384 kW

MAGMA Mentioned in NVIDIA Keynote


Jensen Huang, Founder and CEO of NVIDIA, delivers the NVIDIA Special Address at SC18. At around 08:35, Jensen mentions ICL’s MAGMA project and our work on Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers [1].

[1] Azzam Haidar, Panruo Wu, Stanimire Tomov, and Jack Dongarra. 2017. Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers. In Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA ’17). ACM, New York, NY, USA.

The Editor would like to thank Azzam Haidar for sharing this video.

30 Years of SC

ICL’s Jack Dongarra is among a select few (pictured above) to have attended all 30 SC meetings.

HPCG Results – November 2018

The latest results for the High Performance Conjugate Gradients (HPCG) benchmark were released on November 13th at SC18 in Dallas, TX. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance that is representative of modern HPC capability by simulating the compute and communication patterns of sparse iterative solvers commonly found in science and engineering applications.
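
Because those sparse kernels are bound by memory bandwidth rather than floating-point throughput, HPCG stresses a very different part of a machine than the dense HPL benchmark. As a rough illustration only (this is not code from the benchmark), the sketch below shows a compressed sparse row (CSR) matrix-vector product in C, the kind of irregular, bandwidth-limited operation at the heart of conjugate-gradient-style solvers.

    /*
     * Illustrative only: a CSR sparse matrix-vector product, y = A*x, the kind
     * of memory-bound kernel that dominates conjugate-gradient solvers such as
     * the one HPCG models. This is not taken from the HPCG reference code.
     */
    #include <stddef.h>

    void spmv_csr(size_t n,              /* number of rows               */
                  const size_t *row_ptr, /* row offsets, length n + 1    */
                  const size_t *col_idx, /* column indices, length nnz   */
                  const double *val,     /* nonzero values, length nnz   */
                  const double *x,       /* input vector, length n       */
                  double *y)             /* output vector, length n      */
    {
        for (size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            /* The indirect accesses to x make this kernel bandwidth and
               latency bound, so sustained rates sit far below peak. */
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }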

HPCG results are released twice per year alongside the TOP500 rankings to show how real-world applications might fare on a given machine. The full list of HPCG rankings is available here.

The top five systems on the November 2018 HPCG list:

  1. Summit – IBM, POWER9, NVIDIA Volta V100 (DOE/SC/ORNL, USA)
     HPL: 143.5 PFLOP/s | TOP500 Rank: 1 | HPCG: 2.926 PFLOP/s | %Peak: 1.5%
  2. Sierra – IBM, POWER9, NVIDIA Tesla V100 (DOE/NNSA/LLNL, USA)
     HPL: 94.64 PFLOP/s | TOP500 Rank: 2 | HPCG: 1.796 PFLOP/s | %Peak: 1.4%
  3. K Computer – Fujitsu, SPARC64 (RIKEN/AIST, Japan)
     HPL: 10.51 PFLOP/s | TOP500 Rank: 18 | HPCG: 0.603 PFLOP/s | %Peak: 5.3%
  4. Trinity – Cray XC40, Intel Xeon E5-2698 v3, Xeon Phi 7250 (DOE/NNSA/LANL/SNL, USA)
     HPL: 20.159 PFLOP/s | TOP500 Rank: 6 | HPCG: 0.546 PFLOP/s | %Peak: 1.3%
  5. AI Bridging Cloud Infrastructure – PRIMERGY CX2570 M4, Xeon Gold 6148 20C 2.4GHz, NVIDIA Tesla V100 (AIST, Japan)
     HPL: 16.859 PFLOP/s | TOP500 Rank: 10 | HPCG: 0.509 PFLOP/s | %Peak: 1.7%
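
As a point of reference for the %Peak column: it appears to express each HPCG result as a fraction of the machine’s theoretical peak (Rpeak) rather than of its HPL score; Summit’s 2.926 PFLOP/s against an Rpeak of 200,794.9 TFLOP/s, for example, works out to roughly 1.5%.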

ISC 2019 Workshops

ISC is now accepting proposals for full-day and half-day workshops. The goal of the workshops is to provide attendees with a focused and in-depth platform for presentations, discussion, and interaction in a particular subject area.

Submitted workshop proposals will be reviewed by the ISC 2019 Workshops Committee, which is headed by Dr. Sadaf Alam, Swiss National Supercomputing Centre (CSCS), with Dr. Heike Jagode, University of Tennessee–Knoxville, as Deputy Chair.

The workshops will be held on Thursday, June 20, 2019, and will be either half-day (9:00 a.m. to 1:00 p.m. or 2:00 p.m. to 6:00 p.m.) or full-day (9:00 a.m. to 6:00 p.m.). Attendance will require a Workshop Pass.

Workshop proposals should be submitted via the ISC 2019 submission site by Wednesday, November 28, 2018. Check out the ISC-HPC workshops site for more information.

Conference Reports

SC18

The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), now celebrating 30 years, is a staple of ICL’s November itinerary. SC18 was held in Dallas, TX, on November 11–16.

Four computational science research centers from the University of Tennessee—the Bredesen Center, the Global Computing Laboratory, the Innovative Computing Laboratory, and the SimCenter—represented the university by anchoring a newly minted University of Tennessee booth. As usual, ICL had a significant presence at SC, with faculty, research staff, and students giving talks, presenting papers, and leading “Birds of a Feather” sessions.

In addition to the new booth on the exhibit floor, ICL also maintained an online “virtual booth” through which interested parties could keep tabs on ICL-related events—including a list of attendees, a detailed schedule of talks, and the latest project handouts.

The Editor would like to thank Piotr Luszczek, Jack Dongarra, Terry Moore, and Gerald Ragghianti for their contributions to this article.

Recent Releases

MAGMA 2.5.0 RC1

MAGMA 2.5.0 RC1 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.

MAGMA 2.5.0 RC1 features LAPACK-compliant routines for multi-core CPUs enhanced with NVIDIA GPUs (including the Volta V100). MAGMA now includes more than 400 routines, covering one-sided dense matrix factorizations and solvers, and two-sided factorizations and eigen/singular-value problem solvers, as well as a subset of highly optimized BLAS for GPUs.

Updates and features in MAGMA 2.5.0 RC1 include:

  • New routine: magmablas_Xgemm_batched_strided (X = {s, d, c, z}), the stride-based variant of magmablas_Xgemm_batched;
  • New routine: magma_Xgetrf_native (X = {s, d, c, z}), which performs the LU factorization with partial pivoting using the GPU only. It has the same interface as the hybrid (CPU + GPU) implementation provided by magma_Xgetrf_gpu (see the sketch after this list), and its performance can be tested by running testing_Xgetrf_gpu with the option --version 3;
  • New routine: magma_Xpotrf_native (X = {s, d, c, z}), which performs the Cholesky factorization using the GPU only. It has the same interface as the hybrid (CPU + GPU) implementation provided by magma_Xpotrf_gpu, and its performance can be tested by running testing_Xpotrf_gpu with the option --version 2; and
  • A new benchmark for GEMM in FP16 arithmetic (HGEMM), along with auxiliary functions to cast matrices from FP32 to FP16 storage (magmablas_slag2h) and from FP16 to FP32 (magmablas_hlag2s).
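
To give a feel for the new GPU-only factorizations mentioned in the list above, here is a minimal sketch (not part of the MAGMA distribution) that calls the double-precision LU variant, magma_dgetrf_native. As noted above, it shares its interface with the hybrid magma_dgetrf_gpu, so the two can be swapped freely; the surrounding setup assumes the usual MAGMA 2.x conventions and device 0, and the test matrix is purely illustrative.

    /*
     * Minimal sketch: LU factorization with the new GPU-only path,
     * magma_dgetrf_native, which shares its interface with the hybrid
     * magma_dgetrf_gpu. Assumes standard MAGMA 2.x conventions and device 0;
     * error checking is omitted for brevity.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include "magma_v2.h"

    int main(void)
    {
        magma_init();

        magma_int_t m = 4096, n = 4096, lda = m;
        magma_int_t ldda = magma_roundup(m, 32);   /* padded leading dimension on the GPU */
        magma_int_t info = 0;
        magma_int_t *ipiv = malloc(m * sizeof(magma_int_t));

        double *A;             /* host copy of the matrix   */
        magmaDouble_ptr dA;    /* device copy of the matrix */
        magma_dmalloc_cpu(&A, lda * n);
        magma_dmalloc(&dA, ldda * n);

        /* Fill A with an arbitrary, well-conditioned test matrix. */
        for (magma_int_t j = 0; j < n; ++j)
            for (magma_int_t i = 0; i < m; ++i)
                A[i + j * lda] = (i == j) ? n : rand() / (double) RAND_MAX;

        magma_queue_t queue;
        magma_queue_create(0, &queue);
        magma_dsetmatrix(m, n, A, lda, dA, ldda, queue);   /* host -> device copy */

        /* GPU-only LU with partial pivoting; swapping in magma_dgetrf_gpu
           here would select the hybrid CPU + GPU implementation instead. */
        magma_dgetrf_native(m, n, dA, ldda, ipiv, &info);
        printf("magma_dgetrf_native: info = %lld\n", (long long) info);

        magma_queue_destroy(queue);
        magma_free(dA);
        magma_free_cpu(A);
        free(ipiv);
        magma_finalize();
        return 0;
    }

The Cholesky analogue works the same way: magma_dpotrf_native accepts the same arguments as magma_dpotrf_gpu.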

Click here to download the tarball.

SC18 Project Handouts

The new project handouts from SC18 are available for download in PDF format.

Interview


Joseph Schuchart

Where are you from, originally?

I was born in the Eastern German coastal city of Stralsund but spent most of my early years in a small village in Eastern Thuringia. Even though I have no recollection of life on the coast, I have always been attracted to the sea and water in general (of which there is plenty around here).

Can you summarize your educational background?

I received my German diploma in Computer Science (similar to an M.Sc.) from the Dresden University of Technology (TUD), where I became interested in HPC. As a student, I worked at ZIH with Prof. Nagel and Andreas Knüpfer on their in-house performance analysis tools—Vampir, VampirTrace, and Score-P. I am currently working towards my PhD at the High-Performance Computing Center (HLRS) at the University of Stuttgart.

Tell us how you first learned about ICL.

After graduating from TUD, I was fortunate enough to become the on-site support for Vampir at Oak Ridge National Laboratory (ORNL), where I worked from 2012 to 2013. That was probably also when I first learned about ICL.

What made you want to visit ICL?

In the course of my PhD work, I am cutting across many topics in parallel programming: from task-based parallelization down to the PGAS programming model and, specifically, MPI-3 RMA, which we use as a basis in the DASH project. ICL seemed to be a natural fit with the task-based runtime system PaRSEC and its ties to Open MPI. There is also a certain attraction to Knoxville and the surrounding area that I have felt ever since I first worked at ORNL.

What are your research interests?

My main research interests range from parallel programming models to performance analysis tools, and—even though I am not a computational scientist—I am fascinated by the applications that sit in between the two and drive our research.

What are you working on during your visit with ICL?

So far I have been exploring different techniques for task schedulers to communicate task states across process boundaries, which is relevant for both PaRSEC and DASH. I am also trying to contribute to both Open MPI and the MPI standard to improve certain aspects we noticed during the last two years of working with MPI RMA.

What are your interests/hobbies outside work?

I love being outdoors, and I enjoy rock climbing and hiking. I am also a passionate hobby photographer—one who is happy about every lucky shot he gets.

Tell us something about yourself that might surprise people.

I am a really bad chef, but I enjoy baking breads and cakes.

Recent Papers

  1. Tomov, S., A. Haidar, D. Schultz, and J. Dongarra, “Evaluation and Design of FFT for Distributed Accelerated Systems,” ECP WBS 2.3.3.09 Milestone Report, no. FFT-ECP ST-MS-10-1216, Innovative Computing Laboratory, University of Tennessee, October 2018.
  2. Dorris, J., A. YarKhan, J. Kurzak, P. Luszczek, and J. Dongarra, “Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,” International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018. DOI: 10.1504/IJCSE.2018.095851
  3. Ahrens, J., C. M. Biwer, A. Costan, G. Antoniu, M. S. Pérez, N. Stojanovic, R. Badia, O. Beckstein, G. Fox, S. Jha, et al., “A Collection of White Papers from the BDEC2 Workshop in Bloomington, IN,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-15, University of Tennessee, Knoxville, November 2018.
  4. Cheng, X., A. Soma, E. D'Azevedo, K. Wong, and S. Tomov, “Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision,” ACM Student Research Poster, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, November 2018.
  5. Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018. DOI: 10.1109/JPROC.2018.2841200
  6. Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, “Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018. DOI: 10.1109/JPROC.2018.2868961
  7. Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, “Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018. DOI: 10.1109/SC.2018.00050
  8. Abdelfattah, A., J. Dongarra, A. Haidar, S. Tomov, and I. Yamazaki, “MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines,” Research Poster, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, November 2018.
  9. Dongarra, J., V. Getov, and K. Walsh, “The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer—Supercomputing History and the Immortality of Now,” Computer, vol. 51, issue 10, pp. 74–85, November 2018. DOI: 10.1109/MC.2018.3971352
  10. Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale,” SIAM Review, vol. 60, issue 4, pp. 808–865, November 2018. DOI: 10.1137/17M1117732
  11. Chow, E., H. Anzt, J. Scott, and J. Dongarra, “Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,” Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018. DOI: 10.1016/j.jpdc.2018.04.017

Recent Conferences

  1. OCT – Open MPI Developer Meeting, San Jose, California
     Attendees: George Bosilca, Thananon Patinyasakdikul
  2. OCT
     Attendees: George Bosilca
  3. NOV – The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX
     Attendees: Ahmad Abdelfattah, Anara Kozhokanova, Aurelien Bouteiller, Daniel Barry, George Bosilca, Hartwig Anzt, Jack Dongarra, Piotr Luszczek, Terry Moore, Thomas Herault, Tracy Rafferty, Yaohung Tsai
  4. NOV – BDEC2, Bloomington, IN
     Attendees: Jack Dongarra, Terry Moore, Tracy Rafferty

Upcoming Conferences

  1. DEC – TESSE Meeting, New York, New York
     Attendees: Damien Genet, George Bosilca, Thomas Herault
  2. DEC – MPI Forum, Milpitas, California
     Attendees: Aurelien Bouteiller

Recent Lunch Talks

  1. OCT 5 – Yuechao Lu (Osaka University), “Randomized SVD and its Application”
  2. OCT 12 – Pierre Blanchard (University of Manchester), “Optimizing the Polar Decomposition for Modern Computer Architectures”
  3. OCT 19 – Travis Johnston (ORNL), “167-PetaFLOPs Deep Learning for Electron Microscopy: From Learning Physics to Atomic Manipulation”
  4. OCT 26 – Michael Wyatt (Global Computing Laboratory), “PRIONN: Predicting Runtime and IO using Neural Networks”
  5. NOV 2 – Hartwig Anzt (Karlsruhe Institute of Technology), “Convolutional Neural Networks for the Efficient Preconditioner Generation”
  6. NOV 9 – Damien Genet, “Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System”
  7. NOV 30 – John Levesque (Cray), “Can We get to an EXAFLOP in Sustained Performance and still be Performance Portable?”

Upcoming Lunch Talks

  1. DEC 7 – Jakub Kurzak, “2018 SLATE Recap”
  2. DEC 14 – Heike Jagode, “Software-Defined Events through PAPI”

Dates to Remember

ICL 30th Anniversary

Save the date for ICL’s 30th anniversary gathering, slated for August 8–9, 2019 in Knoxville, TN!