News and Announcements

TOP500 – June 2019

The 53rd TOP500 list was just unveiled at the ISC High Performance 2019 conference in Frankfurt, Germany. The United States has kept the top two spots with the Department of Energy’s Summit (at Oak Ridge National Laboratory) and Sierra (at Lawrence Livermore National Laboratory).

For the first time, every entrant on the TOP500 now exceeds 1 petaFLOP/s on the HPL benchmark, with #500 coming in at 1.022 petaFLOP/s. Also of note is that Piz Daint has been pushed out of the top 5 by Frontera—a new Dell machine installed at the Texas Advanced Computing Center.

1. Summit – IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR InfiniBand, IBM
   DOE/SC/Oak Ridge National Laboratory, United States
   Cores: 2,414,592 | Rmax: 148,600.0 TFLOP/s | Rpeak: 200,794.9 TFLOP/s | Power: 10,096 kW

2. Sierra – IBM Power System S922LC, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR InfiniBand, IBM / NVIDIA / Mellanox
   DOE/NNSA/LLNL, United States
   Cores: 1,572,480 | Rmax: 94,640.0 TFLOP/s | Rpeak: 125,712.0 TFLOP/s | Power: 7,438 kW

3. Sunway TaihuLight – Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway, NRCPC
   National Supercomputing Center in Wuxi, China
   Cores: 10,649,600 | Rmax: 93,014.6 TFLOP/s | Rpeak: 125,435.9 TFLOP/s | Power: 15,371 kW

4. Tianhe-2A – TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000, NUDT
   National Super Computer Center in Guangzhou, China
   Cores: 4,981,760 | Rmax: 61,444.5 TFLOP/s | Rpeak: 100,678.7 TFLOP/s | Power: 18,482 kW

5. Frontera – Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox InfiniBand HDR, Dell EMC
   Texas Advanced Computing Center, United States
   Cores: 448,448 | Rmax: 23,516.4 TFLOP/s | Rpeak: 38,745.9 TFLOP/s
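Since the list reports both Rmax (achieved HPL performance) and Rpeak (theoretical peak), each machine’s HPL efficiency falls out directly as Rmax/Rpeak. A short script over the values above, transcribed from the table, illustrates how the top systems compare:

```python
# HPL efficiency = Rmax / Rpeak, using the June 2019 TFLOP/s values above.
systems = {
    "Summit":            (148600.0, 200794.9),
    "Sierra":            (94640.0,  125712.0),
    "Sunway TaihuLight": (93014.6,  125435.9),
    "Tianhe-2A":         (61444.5,  100678.7),
    "Frontera":          (23516.4,  38745.9),
}

for name, (rmax, rpeak) in systems.items():
    # Fraction of theoretical peak actually achieved on the HPL benchmark.
    print(f"{name}: {100 * rmax / rpeak:.1f}% of peak")
```

The GPU-accelerated IBM systems land around 74–75% of peak, while the two newest entries sit closer to 61%.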

HPCG – June 2019

The latest results for the High Performance Conjugate Gradients (HPCG) benchmark were just released at ISC High Performance in Frankfurt, Germany. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance representative of modern HPC workloads by simulating the compute and communication patterns of the sparse iterative solvers commonly found in science and engineering applications.

HPCG results are released twice per year alongside the TOP500 rankings to show how real-world applications might fare on a given machine. The full list of HPCG rankings is available on the HPCG website.

1. Summit – IBM, POWER9, NVIDIA Volta V100 (DOE/SC/ORNL, USA)
   HPL: 148.6 PFLOP/s | TOP500 Rank: 1 | HPCG: 2.926 PFLOP/s | %Peak: 1.5%

2. Sierra – IBM, POWER9, NVIDIA Tesla V100 (DOE/NNSA/LLNL, USA)
   HPL: 94.64 PFLOP/s | TOP500 Rank: 2 | HPCG: 1.796 PFLOP/s | %Peak: 1.4%

3. K Computer – Fujitsu, SPARC64 (RIKEN/AIST, Japan)
   HPL: 10.51 PFLOP/s | TOP500 Rank: 20 | HPCG: 0.603 PFLOP/s | %Peak: 5.3%

4. Trinity – Cray XC40, Intel Xeon E5-2698 v3, Xeon Phi 7250 (DOE/NNSA/LANL/SNL, USA)
   HPL: 20.159 PFLOP/s | TOP500 Rank: 7 | HPCG: 0.546 PFLOP/s | %Peak: 1.3%

5. AI Bridging Cloud Infrastructure – PRIMERGY CX2570 M4, Xeon Gold 6148 20C 2.4GHz, NVIDIA Tesla V100 (AIST, Japan)
   HPL: 19.880 PFLOP/s | TOP500 Rank: 8 | HPCG: 0.509 PFLOP/s | %Peak: 1.6%
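For readers curious why %Peak sits so far below the HPL numbers, here is a minimal sketch (not HPCG itself) of an unpreconditioned conjugate gradient loop on a 1-D Laplacian in pure Python. The sparse matrix-vector products and dot products it performs are the same memory-bound operations HPCG stresses; the real benchmark adds a multigrid preconditioner and distributed-memory communication on top:

```python
# Illustrative only: unpreconditioned CG on A = tridiag(-1, 2, -1),
# the kind of sparse solver pattern that HPCG is built around.

def matvec(x):
    """Apply the sparse tridiagonal matrix A to vector x."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cg(b, tol=1e-10, max_iter=1000):
    """Solve A x = b by conjugate gradients, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = b[:]          # residual r = b - A x (x is zero initially)
    p = r[:]          # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        if rr_new < tol:
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

x = cg([1.0] * 32)
residual = max(abs(bi - axi) for bi, axi in zip([1.0] * 32, matvec(x)))
print(f"max residual: {residual:.2e}")
```

Because each iteration touches every matrix entry only once, arithmetic intensity is low and performance is bound by memory bandwidth rather than floating-point throughput, which is why HPCG numbers land one to two orders of magnitude below HPL on the same machines.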

The coffee machine is dead. Long live the coffee machine!

Our beloved Saeco has finally ground its last bean and caffeinated its last ICLer. However, with some quick and creative thinking by ICL admin and accounting, we have a new and shiny DeLonghi superautomatic, and the coffee must flow.

Please be kind to the new machine. Our last one was quite the trooper and performed beyond expectations.

—Ed.

Conference Reports

MIT GPU Hackathon

Piotr Luszczek made his way to Cambridge, MA on June 3–7, where he served as a hacking mentor for the MIT GPU Hackathon. These hackathons divide developers into teams, which work to port their code to GPUs or further optimize their applications for the latest and greatest in GPU hardware, all with the help of a team mentor.

One benefit of being so close to MIT was that the Julia Lab software team joined in with the team that was porting Julia-based code for ocean flow models—which involve multiple layers of water columns and mixing between them—to GPUs.

Another team was working on plasma-confinement calculations and leveraging the compatibility interface of the Software for Linear Algebra Targeting Exascale (SLATE), which serves as an easy-to-use API for current ScaLAPACK users.

A team from Bosch was looking at energy potential modeled by neural networks and was mixing GPU code with Python.

Piotr’s team had a large code developed by GE for flow optimization around wind turbines. This code used a higher-order, explicit Runge-Kutta solver with some components from PETSc and hypre (both part of xSDK).

Piotr also continued his push to improve application integration with respect to the Extreme-scale Scientific Software Development Kit (xSDK), and he will continue to do so at several of this year’s 11 GPU Hackathons (a record)—the next of which will be held at Princeton University.

The editor would like to thank Piotr Luszczek for his contributions to this article.

BDEC2: Poznań


The Big Data and Extreme-Scale Computing 2 (BDEC2) workshop series rolls on. The latest meeting, held on May 14–16 in Poznań, Poland, focused on the need for a new cyberinfrastructure platform: one that connects the HPC resources required for intense data analysis, deep learning, and neural networks with the vast number of data generators that are rarely co-located with facilities capable of that scale of computing.

This workshop brings together eminent representatives of the scientific computing community, including members from industry, academia, and government with expertise in algorithms, computer system architecture, operating systems, workflow middleware, compilers, libraries, languages, and applications. Together, they are endeavoring to map out the ways in which big-data challenges are changing the traditional cyberinfrastructure paradigm.

The ICL team played a prominent role in this meeting, with Terry Moore acting as a key BDEC conductor, David Rogers designing the meeting’s website and logo, and Tracy Rafferty, Joan Snoderly, and Sam Crawford coordinating and providing additional support on site.

For more information on the BDEC effort, please check out the most recent BDEC report, “Pathways to Convergence: Towards a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry.”

The next BDEC2 meeting is slated for fall 2019 in San Diego, CA.

The editor would like to thank Tracy Rafferty and Terry Moore for their contributions to this article—and for bringing him along.

Recent Releases

MAGMA 2.5.1 Alpha

MAGMA 2.5.1 Alpha is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.

Updates and features in MAGMA 2.5.1 Alpha include:

  • Updates and improvements in CMakeLists.txt for improved/friendlier CMake and Spack installations;
  • Fixes related to MAGMA installation on GPUs and CUDA versions that do not support FP16 arithmetic;
  • Added support for Turing GPUs;
  • Removed some C++ features from MAGMA Sparse for friendlier compilation (using nvcc and various CPU compilers).

The MAGMA 2.5.1 Alpha tarball is available for download on the MAGMA website.

Interview

Hejer Shaiek

Where are you from, originally?
I was born in Tunisia, which is a small country in North Africa. It was originally occupied by Berbers, but it hosted a lot of different civilizations throughout history, like Carthage, the Roman Empire, and the Ottoman Empire, and it was a French colony from 1881 until 1956. It is the country that started the Arab Spring, and it is now considered a democracy, though it still struggles from serious economic difficulties.

But I cannot answer this question without mentioning my French side. I’m not French (yet), but I have been living in France for several years now, and my love for that country and its people is, for me, stronger than any official paper stating my nationality.

Can you summarize your educational background?
So, I started with “Classes Préparatoires aux Grandes Écoles,” which are a special couple of years in the French educational system where you prepare for a competitive exam to be admitted (or not) to engineering schools. During those two years, I learned a lot of math and physics and a bit of computer science, chemistry, and engineering. I then applied to my current school—ENSEEIHT, in Toulouse, France—to major in computer science and applied mathematics, specializing in HPC and Big Data. I am now finishing my third and final year there, and I will earn my Master’s degree in September.

Tell us how you first learned about ICL.
I told one of my favorite professors at ENSEEIHT that I wanted to do a research internship in HPC in the United States, and he suggested ICL.

What made you want to visit ICL?
I would say the subjects of the papers published by ICL. A lot of them, especially those of the Linear Algebra group, are based on things I saw in class. But at the same time, they target new state-of-the-art challenges. So, I thought I had the tools to learn a lot of things, and being surrounded by the people who wrote those papers is my best chance to do so.

What are your research interests?
Honestly, this question is difficult for me because I’m just starting my research journey. I know that my love for linear algebra is what got me here. But now, I would say, as long as I am learning new things, I’m happy.

What are you working on during your visit with ICL?
I am working on the FFT project, the goal of which is to build an FFT library that targets current machines.

What are your interests/hobbies outside work?
This answer is always subject to change. Sometimes I like practicing sports or reading, but the most consistent things would be hanging out with friends outside or watching series. Currently, most of my free time is about playing Zelda.

Tell us something about yourself that might surprise people.
I speak 4 languages, and my favorite one is when I can mix them up. I also can be very sociable when I’m comfortable with the language I speak, and that, I believe, may surprise people here.

Recent Papers

  1. Antoniu, G., A. Costan, O. Marcu, M. S. Pérez, N. Stojanovic, R. M. Badia, M. Vázquez, S. Girona, M. Beck, T. Moore, et al., “A Collection of White Papers from the BDEC2 Workshop in Poznan, Poland,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-10: University of Tennessee, Knoxville, May 2019.
  2. Ribizel, T., and H. Anzt, “Approximate and Exact Selection on GPUs,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00088
  3. Anzt, H., and G. Flegar, “Are we Doing the Right Thing? – A Critical Analysis of the Academic HPC Community,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00122
  4. Kaya, O., and Y. Robert, “Computing Dense Tensor Decompositions with Optimal Dimension Trees,” Algorithmica, vol. 81, issue 5, pp. 2092–2121, May 2019. DOI: 10.1007/s00453-018-0525-3
  5. Abdelfattah, A., S. Tomov, and J. Dongarra, “Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
  6. Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, “Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
  7. Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, “ParILUT – A Parallel Threshold ILU for GPUs,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPS.2019.00033
  8. Aupy, G., A. Gainaru, V. Honoré, P. Raghavan, Y. Robert, and H. Sun, “Reservation Strategies for Stochastic Jobs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019), Rio de Janeiro, Brazil, IEEE Computer Society Press, May 2019.
  9. Danalis, A., H. Jagode, T. Herault, P. Luszczek, and J. Dongarra, “Software-Defined Events through PAPI,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00069
  10. Zaitsev, D., S. Tomov, and J. Dongarra, “Solving Linear Diophantine Systems on Parallel Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158–1169, May 2019. DOI: 10.1109/TPDS.2018.2873354
  11. Wong, K., S. Tomov, and J. Dongarra, “Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
  12. Danalis, A., H. Jagode, and J. Dongarra, “Is Your Scheduling Good? How Would You Know?,” 14th Scheduling for Large Scale Systems Workshop, Bordeaux, France, June 2019.
  13. Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, “Least Squares Solvers for Distributed-Memory Machines with GPU Accelerators,” ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, pp. 117–126, June 2019. DOI: 10.1145/3330345.3330356
  14. Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, “MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. DOI: 10.1007/978-3-030-34356-9_37
  15. Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour, et al., “PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,” ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019. DOI: 10.1145/3264491
  16. Canon, L-C., A. K. W. Chang, Y. Robert, and F. Vivien, “Scheduling Independent Stochastic Tasks under Deadline and Budget Constraints,” International Journal of High Performance Computing Applications, vol. 34, issue 2, pp. 246–264, June 2019. DOI: 10.1177/1094342019852135
  17. Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, “SLATE Working Note 12: Implementing Matrix Inversions,” SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019.
  18. Anzt, H., Y-C. Chen, T. Cojean, J. Dongarra, G. Flegar, P. Nayak, E. S. Quintana-Orti, Y. M. Tsai, and W. Wang, “Towards Continuous Benchmarking,” Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019. DOI: 10.1145/3324989.3325719

Recent Conferences

  1. MAY – PETTT Annual PPE Review, Vicksburg, Mississippi: Stanimire Tomov
  2. MAY – BDEC2, Poznań, Poland: Joan Snoderly, Sam Crawford, Terry Moore, Tracy Rafferty
  3. MAY – IPDPS, Rio de Janeiro, Brazil: Anthony Danalis, Hartwig Anzt, Ichitaro Yamazaki, Jack Dongarra
  4. MAY – 2019 OLCF User Meeting, Oak Ridge, Tennessee: Stanimire Tomov
  5. MAY – MPI Forum, Chicago, Illinois: Aurelien Bouteiller
  6. MAY – DoE/MEXT, Chicago, Illinois: Mark Gates
  7. JUN – MIT GPU Hackathon, Cambridge, Massachusetts: Piotr Luszczek
  8. JUN – ISC High Performance 2019, Frankfurt, Germany: Hartwig Anzt, Heike Jagode, Jack Dongarra, Stanimire Tomov
  9. JUN – Anthony Danalis, Thomas Herault
  10. JUN – ICS'19, Phoenix, Arizona: Jakub Kurzak

Upcoming Conferences

  1. JUL – Alan Ayala
  2. JUL – Gerald Ragghianti
  3. JUL – Heike Jagode, Jakub Kurzak
  4. JUL – OLCF/ECP OpenMP Hackathon, Knoxville, Tennessee: Ali Charara
  5. JUL – 2019 Scalable Tools Workshop, Tahoe City, California: Anthony Danalis
  6. JUL – Daniel Barry

Recent Lunch Talks

  1. MAY 3 – Stephen Thomas (Global Computing Laboratory): Analytics4MD: In-Situ Data Analytics for Next Generation Molecular Dynamics (MD) Workflows
  2. MAY 10 – Qinglei Cao: Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance
  3. MAY 17 – Ali Charara: SLATE Data Coherence: An Adapted Cache Coherence Model
  4. MAY 24 – Thomas Herault: Comparing the Performance of Rigid, Moldable and Grid-Shaped Applications on Failure-Prone HPC Platforms
  5. MAY 31 – Reazul Hoque: Dynamic Task Discovery in a Data Flow Task-Based Runtime
  6. JUN 6 – Vipin Kumar (University of Minnesota): Physics Guided Machine Learning: A New Paradigm for Modeling Dynamical Systems
  7. JUN 7 – Yves Robert (ENS-Lyon): Replication is More Efficient Than You Think

Upcoming Lunch Talks

  1. JUL 26 – Thananon Patinyasakdikul: Improving Multithreaded MPI for Current Hardware Architecture