News and Announcements

TOP500 – June 2018

In June 2018, the 51st TOP500 list was announced at the ISC-HPC conference in Frankfurt, Germany. The United States is back on top with Oak Ridge National Laboratory’s Summit machine. Summit achieved 122.3 petaFLOP/s on the HPL benchmark, besting China’s Sunway TaihuLight system.

Summit has 4,356 nodes, each one equipped with two 22-core IBM Power9 CPUs and six NVIDIA Tesla V100 GPUs. The nodes are linked with a Mellanox dual-rail EDR InfiniBand network. For a deeper dive into the Summit machine, please see the June 2018 issue of the ICL newsletter.

Summit wasn’t the only new addition near the top of the list, as the new AI Bridging Cloud Infrastructure (ABCI) machine is now No. 5, with an HPL score of 19.9 petaFLOP/s. The Fujitsu-built supercomputer, powered by 20-core Xeon Gold CPUs along with NVIDIA Tesla V100 GPUs, is up and running at Japan’s National Institute of Advanced Industrial Science and Technology (AIST) and pushed Piz Daint to the No. 6 spot.

Top five systems on the June 2018 TOP500 list:

  1. Summit – IBM POWER9, NVIDIA Volta GV100, dual-rail Mellanox EDR InfiniBand
     DOE/SC/Oak Ridge National Laboratory, United States
     Cores: 2,282,544; Rmax: 122,300.0 TFLOP/s; Rpeak: 187,659.3 TFLOP/s; Power: 8,806 kW
  2. Sunway TaihuLight – Sunway SW26010 260C 1.45GHz, Sunway, NRCPC
     National Supercomputing Center in Wuxi, China
     Cores: 10,649,600; Rmax: 93,014.6 TFLOP/s; Rpeak: 125,435.9 TFLOP/s; Power: 15,371 kW
  3. Sierra – IBM POWER9, NVIDIA Volta GV100, dual-rail Mellanox EDR InfiniBand
     DOE/NNSA/LLNL, United States
     Cores: 1,572,480; Rmax: 71,610.0 TFLOP/s; Rpeak: 119,193.6 TFLOP/s; Power: –
  4. Tianhe-2A – TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000, NUDT
     National Super Computer Center in Guangzhou, China
     Cores: 4,981,760; Rmax: 61,444.5 TFLOP/s; Rpeak: 100,678.7 TFLOP/s; Power: 18,482 kW
  5. AI Bridging Cloud Infrastructure (ABCI) – PRIMERGY CX2550 M4, Xeon Gold 6148 20C 2.4GHz, NVIDIA Tesla V100 SXM2, InfiniBand EDR, Fujitsu
     National Institute of Advanced Industrial Science and Technology (AIST), Japan
     Cores: 391,680; Rmax: 19,880.0 TFLOP/s; Rpeak: 32,576.6 TFLOP/s; Power: 1,649 kW

Happy Birthday, Jack!

On July 18, Jack was treated to a surprise party and cake in the coffee room. ICLers huddled around to marvel at Jack and his confectionery likeness.

Notably, this was the first time Jack had been at ICL on his birthday. “Lesson learned.” Happy birthday, Jack!

HPCG Results – June 2018

The latest results for the High Performance Conjugate Gradients (HPCG) benchmark were released on June 25, 2018, at ISC-HPC in Frankfurt, Germany. A joint effort between ICL and Sandia National Laboratories, HPCG is designed to measure performance that is representative of modern HPC capability by simulating the compute and communication patterns of sparse iterative solvers commonly found in science and engineering applications.
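
To give a flavor of what HPCG measures, below is a minimal C sketch (illustrative only, not HPCG’s actual code) of the two kernels that dominate a preconditioned conjugate gradient iteration: a sparse matrix-vector product in compressed sparse row (CSR) format and a global dot product. Both are memory-bandwidth-bound, which is why HPCG scores are small fractions of HPL scores.

    /* Illustrative sketch of the sparse kernels HPCG stresses. */
    #include <stddef.h>

    /* y = A*x for a sparse matrix A stored in CSR form. */
    void spmv_csr(size_t n, const size_t *rowptr, const size_t *col,
                  const double *val, const double *x, double *y)
    {
        for (size_t i = 0; i < n; i++) {
            double sum = 0.0;
            for (size_t k = rowptr[i]; k < rowptr[i + 1]; k++)
                sum += val[k] * x[col[k]];   /* irregular, memory-bound access */
            y[i] = sum;
        }
    }

    /* Dot product; in the distributed benchmark this also implies a
     * global reduction (e.g., MPI_Allreduce) across all processes. */
    double dot(size_t n, const double *x, const double *y)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += x[i] * y[i];
        return sum;
    }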

HPCG results are released twice a year alongside the TOP500 rankings to show how real-world applications might fare on a given machine. As in the TOP500, Summit claimed the No. 1 spot. The full list of HPCG rankings is available here.

Top five systems on the June 2018 HPCG list (the final column gives the HPCG result as a percentage of the machine’s theoretical peak performance):

  1. Summit – IBM POWER9, NVIDIA Volta V100
     DOE/SC/ORNL, USA
     HPL: 122.3 PFLOP/s; TOP500 rank: 1; HPCG: 2.926 PFLOP/s; 1.5% of peak
  2. Sierra – IBM POWER9, NVIDIA Tesla V100
     DOE/NNSA/LLNL, USA
     HPL: 71.61 PFLOP/s; TOP500 rank: 3; HPCG: 1.796 PFLOP/s; 1.5% of peak
  3. K Computer – Fujitsu SPARC64
     RIKEN AICS, Japan
     HPL: 10.51 PFLOP/s; TOP500 rank: 16; HPCG: 0.603 PFLOP/s; 5.3% of peak
  4. Trinity – Cray XC40, Intel Xeon E5-2698 v3
     DOE/NNSA/LANL/SNL, USA
     HPL: 14.137 PFLOP/s; TOP500 rank: 9; HPCG: 0.546 PFLOP/s; 1.8% of peak
  5. Piz Daint – Cray XC50, Intel Xeon E5-2690 v3, NVIDIA Tesla P100
     CSCS, Switzerland
     HPL: 19.59 PFLOP/s; TOP500 rank: 6; HPCG: 0.486 PFLOP/s; 1.9% of peak

Employment Opportunities at ICL

Research Scientist (with MS or PhD) or Postdoctoral Researcher

ICL is seeking full-time research scientists (MS or PhD) or postdoctoral researchers to participate in the design, development, and maintenance of numerical software libraries for solving linear algebra problems on large distributed-memory machines with multi-core processors and hardware accelerators, as well as of performance monitoring capabilities for new and advanced hardware and software technologies. The prospective researcher will coauthor papers to document research findings, present the team’s work at conferences and workshops, and help lead students and other team members in their research on ongoing and future projects. Given the nature of the work, there will be opportunities for publication, travel, and high-profile professional networking and collaboration across academia, labs, and industry.

An MS or PhD in computer science, computational sciences, or math is preferred, as is a background in at least one of the following areas: numerical linear algebra, HPC, performance monitoring, machine learning, or data analytics. The position offers full-time employment for up to 4 years, with the possibility of further extensions based on funding availability and performance.

Joining this team will offer qualified candidates exciting career opportunities as they participate in the US Department of Energy’s Exascale Computing Project (ECP). ICL is involved in several ECP projects, including SLATE (http://icl.utk.edu/slate/), PEEKS (http://icl.utk.edu/peeks/), xSDK (http://www.icl.utk.edu/research/xsdk4ecp), Exa-PAPI (http://icl.utk.edu/exa-papi/), CEED (https://ceed.exascaleproject.org/), Distributed Tasking for Exascale (PaRSEC) (http://icl.utk.edu/dte/), MAGMA (http://icl.cs.utk.edu/magma/), FFT-ECP, and others.

The starting date is July 1, 2018, or later. All qualified candidates, whether fresh MS or PhD graduates or seasoned HPC veterans, are encouraged to apply.

For more information, contact Jack Dongarra (dongarra@icl.utk.edu) or check out ICL’s jobs page: http://www.icl.utk.edu/jobs.

Washington Post Op-Ed

In June, the Washington Post’s Mark Lasswell asked Jack Dongarra to write an op-ed on all things HPC, including the new Summit machine, the impact of HPC on our daily lives, and the future of HPC as a whole.

Click here to read the article on the Washington Post website.

Conference Reports

ISC High Performance

The 2018 ISC High Performance conference (ISC-HPC) kicked off on June 17 in Frankfurt, Germany. ICL’s Jack Dongarra and Piotr Luszczek both attended, along with approximately 3,500 other attendees and contributors, including ICL collaborators Sven Hammarling, Felix Wolf, Pedro Valero Lara, and Mawussi Zounon.

Jack co-presented the TOP500 awards, unveiling Oak Ridge National Laboratory’s Summit as the No. 1 machine on the list. Jack also led a focus session on the NLAFET library, where he described recent developments in task-based algorithms for the solution of dense linear systems and the solution of both symmetric and unsymmetric dense eigenproblems.

Jack, Sven, Piotr, Mawussi, and Pedro also hosted a Birds of a Feather (BoF) session on batched BLAS standardization. The BoF continued ongoing efforts to propose and refine an objective batched BLAS standard, one that does not incur a severe performance penalty on any given architecture, by analyzing the benefits and drawbacks of existing batched BLAS interfaces.
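
For context, the semantics under standardization can be described with a simple loop: many small, independent matrix multiplies over a batch. The sketch below is a reference in plain C using the standard CBLAS interface; a batched interface replaces the whole loop with a single library call so the runtime can schedule all of the small operations together on a GPU or many-core device. The exact naming and argument conventions of the batched routines are precisely what the BoF set out to refine, so this is only an illustration of the semantics.

    /* Semantic reference for a batched GEMM: one small
     * C[i] = alpha*A[i]*B[i] + beta*C[i] per batch entry. */
    #include <cblas.h>

    void gemm_batch_reference(int batch, int m, int n, int k,
                              double alpha, double **A, int lda,
                              double **B, int ldb,
                              double beta, double **C, int ldc)
    {
        /* A batched BLAS interface performs all of these small,
         * independent GEMMs in one library call. */
        for (int i = 0; i < batch; i++)
            cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                        m, n, k, alpha, A[i], lda, B[i], ldb,
                        beta, C[i], ldc);
    }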

Piotr also presented a poster for ICL’s Benchtesting OpeN Software Autotuning Infrastructure (BONSAI) project, which aims to develop a software infrastructure for using parallel hybrid systems at any scale to carry out large, concurrent autotuning sweeps to dramatically accelerate the optimization process of computational kernels for GPU accelerators and many-core coprocessors.
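
At its core, an autotuning sweep enumerates a grid of tuning parameters, times each kernel variant, and keeps the fastest; BONSAI’s contribution is orchestrating such sweeps concurrently at large scale. Below is a small, self-contained illustration of that idea; run_kernel() and wall_time() are stand-ins for a real GPU kernel launch and timer, not part of BONSAI.

    #include <stdio.h>
    #include <time.h>

    static double wall_time(void) { return (double)clock() / CLOCKS_PER_SEC; }

    /* Dummy stand-in for launching a GPU kernel variant. */
    static void run_kernel(int tile_m, int tile_n)
    {
        volatile double sink = 0.0;
        for (long i = 0; i < (long)tile_m * tile_n * 1000; i++)
            sink += i * 1e-9;   /* dummy work proportional to tile size */
        (void)sink;
    }

    int main(void)
    {
        int tiles[] = {16, 32, 64, 128};
        int n = sizeof tiles / sizeof tiles[0];
        double best = 1e30;
        int best_m = 0, best_n = 0;

        /* Sweep the 2-D grid of tile sizes and keep the fastest. */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double t0 = wall_time();
                run_kernel(tiles[i], tiles[j]);
                double dt = wall_time() - t0;
                if (dt < best) { best = dt; best_m = tiles[i]; best_n = tiles[j]; }
            }

        printf("fastest configuration: %d x %d tiles (%.3g s)\n",
               best_m, best_n, best);
        return 0;
    }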

Mawussi and Piotr, on behalf of Azzam Haidar and coauthors, presented a poster on low-precision arithmetic on GPUs, “Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” which went on to win its category’s Best Poster Award. Congratulations to all involved!
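
The idea behind the poster can be shown in a tiny, self-contained sketch: factor the matrix once in low precision, then recover high accuracy by iterating cheap residual corrections in double precision. This is a loose illustration only; float stands in for FP16 tensor-core arithmetic, there is no pivoting (so a diagonally dominant matrix is assumed), and for brevity the triangular substitutions use the float factors in double arithmetic, whereas a real implementation also solves in low precision.

    #include <stdio.h>
    #include <math.h>

    #define N 3

    /* Solve L*U*d = r using factors computed in float. */
    static void lowprec_solve(const float LU[N][N], const double r[N], double d[N])
    {
        double y[N];
        for (int i = 0; i < N; i++) {              /* forward substitution */
            y[i] = r[i];
            for (int j = 0; j < i; j++) y[i] -= LU[i][j] * y[j];
        }
        for (int i = N - 1; i >= 0; i--) {         /* backward substitution */
            d[i] = y[i];
            for (int j = i + 1; j < N; j++) d[i] -= LU[i][j] * d[j];
            d[i] /= LU[i][i];
        }
    }

    int main(void)
    {
        double A[N][N] = {{4, 1, 0}, {1, 4, 1}, {0, 1, 4}};
        double b[N] = {1, 2, 3}, x[N] = {0, 0, 0};

        /* LU factorization without pivoting, entirely in float. */
        float LU[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) LU[i][j] = (float)A[i][j];
        for (int k = 0; k < N; k++)
            for (int i = k + 1; i < N; i++) {
                LU[i][k] /= LU[k][k];
                for (int j = k + 1; j < N; j++) LU[i][j] -= LU[i][k] * LU[k][j];
            }

        /* Refinement: residual and update accumulated in double. */
        for (int it = 0; it < 5; it++) {
            double r[N], d[N];
            for (int i = 0; i < N; i++) {
                r[i] = b[i];
                for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
            }
            printf("iter %d: residual norm %.2e\n", it,
                   fabs(r[0]) + fabs(r[1]) + fabs(r[2]));
            lowprec_solve(LU, r, d);
            for (int i = 0; i < N; i++) x[i] += d[i];
        }
        return 0;
    }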

Finally, Piotr gave the keynote talk at the Approximate and Transprecision Computing on Emerging Technologies (ATCET) workshop. ATCET focuses on applications ranging from big data analytics to deep learning and classical scientific computing and simulation, and it aims to build theoretical and practical understanding of the energy-efficiency boost obtainable when accuracy requirements on data being processed, stored, and communicated can be relaxed for intermediate calculations.

The editor would like to thank Piotr Luszczek for his contribution to this article.

GPU Hackathon

On June 4–8, ICL’s Piotr Luszczek was in Boulder, CO for this year’s aptly named Boulder GPU Hackathon. The Hackathon brought together around 50 people, split into teams of developers who were carefully paired with mentors (including Piotr and ICL alum Kenneth Roche) specializing in high-performance computing, the relevant programming languages, and GPU programming APIs.

Over five days, each team worked through a coding sprint punctuated by daily stand-ups, which promoted cross-team collaboration, enhanced knowledge sharing, and ensured that roadblocks were resolved quickly.

More Hackathons are already planned for Santa Fe, NM and Brookhaven/Upton, NY. See the GPU Hackathon website for more details.

The editor would like to thank Piotr Luszczek for his contributions to this article.

Recent Releases

MAGMA 2.4.0

MAGMA 2.4.0 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.

MAGMA 2.4.0 features LAPACK-compliant routines for multi-core CPUs enhanced with NVIDIA GPUs (including the Volta V100). MAGMA now includes more than 400 routines, covering one-sided dense matrix factorizations and solvers, and two-sided factorizations and eigen/singular-value problem solvers, as well as a subset of highly optimized BLAS for GPUs.
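
As a quick taste of that LAPACK-style interface, here is a minimal sketch that solves a dense linear system with magma_dgesv, which accepts host-resident data while offloading the factorization to the GPU. Error checking is omitted for brevity, and the exact signatures should be checked against the MAGMA documentation for your release.

    #include <stdio.h>
    #include "magma_v2.h"

    int main(void)
    {
        magma_init();

        magma_int_t n = 1000, nrhs = 1, info = 0;
        magma_int_t *ipiv;
        double *A, *B;

        /* Pinned host allocations give the best transfer performance. */
        magma_dmalloc_pinned(&A, n * n);
        magma_dmalloc_pinned(&B, n);
        magma_imalloc_cpu(&ipiv, n);

        /* Fill a diagonally dominant test matrix (column-major). */
        for (magma_int_t j = 0; j < n; j++) {
            for (magma_int_t i = 0; i < n; i++)
                A[i + j * n] = (i == j) ? (double)n : 1.0;
            B[j] = 1.0;
        }

        magma_dgesv(n, nrhs, A, n, ipiv, B, n, &info);  /* B overwritten with x */
        printf("info = %lld, x[0] = %f\n", (long long)info, B[0]);

        magma_free_pinned(A);
        magma_free_pinned(B);
        magma_free_cpu(ipiv);
        magma_finalize();
        return 0;
    }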

Other updates and features in MAGMA 2.4.0:

  • Added constrained least-squares routines (magma_[sdcz]gglse) and their dependencies: magma_zggrqf (generalized RQ factorization) and magma_zunmrq (multiply by orthogonal Q as returned by zgerqf).
  • Improved performance across many batched routines, including batched TRSM, batched LU, batched LU-nopiv, and batched Cholesky.
  • Fixed compilation issues involving inf, nan, and nullptr.

Additional MAGMA-sparse features:

  • Changed how data from an external application is handled;
    there is now a clear distinction between memory allocated/used/freed by MAGMA and by the user application.
  • Added the functions magma_zvcopy and magma_zvpass, which do not allocate memory; instead, they copy values from/to application-allocated memory.
  • The examples (in example/example_sparse.c) demonstrate how these routines should be used.

Click here to download the tarball.

Interview

Daniel Barry

Where are you from, originally?

I’m from Karns, a town in West Knoxville, Tennessee.

Can you summarize your educational background?

I earned my Bachelor of Science in Computer Engineering and Mathematics at the University of Tennessee, Knoxville, graduating in the spring of 2018. I started as a Computer Engineering major but decided to double major in Mathematics because I feel that math is a useful tool set in engineering.

How did you first hear about ICL, and what made you want to work here?

I first heard about ICL during the 2014 Student Cluster Competition. I was a member of the University of Tennessee, Knoxville team. As part of the competition, teams must compile and run the HPL software on their cluster. Since ICL develops HPL, we naturally heard of ICL.

What is your focus here? What are you working on?

I am helping to develop a toolkit, the central aim of which is to define an accurate mapping between particular high-level concepts of performance metrics, such as L1 data cache misses, and their underlying, corresponding low-level hardware events.
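
For readers who want a concrete flavor of that mapping, ICL’s PAPI library exposes exactly such high-level preset events and resolves them to the native hardware counters of whatever CPU the code runs on. A minimal sketch (error checking omitted) that counts L1 data cache misses over a code region:

    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int eventset = PAPI_NULL;
        long long count;

        PAPI_library_init(PAPI_VER_CURRENT);
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_L1_DCM);   /* high-level preset event */

        PAPI_start(eventset);
        /* ... code region to measure ... */
        PAPI_stop(eventset, &count);             /* reads the mapped counter */

        printf("L1 data cache misses: %lld\n", count);
        return 0;
    }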

What would you consider the most valuable “lesson” you have learned so far at ICL?

I have learned that consistent communication with the other students and research scientists of ICL is greatly advantageous. My mentors have introduced me to great opportunities in which to get involved, such as serving as a student volunteer at the annual Supercomputing Conference. One of my fellow student research assistants helped me sign up for a machine learning course that I needed for my core curriculum. I also found out about the Interdisciplinary Graduate Minor in Computational Science (IGMCS) from talking to my research student peers. Additionally, a few fellow ICLers regularly invite me to lunch, which was a great introduction for the new guy! Staying involved at ICL is highly beneficial.

What are your interests/hobbies outside of work?

I like to spend time outside by walking, hiking, caving, and swimming. Indoors, I like to program, read about news in technology and computer science, and play video games. I thoroughly enjoy eating; as such, I enjoy cooking, too.

Tell us something about yourself that might surprise people.

I can speak Mandarin Chinese.

If you weren’t working at ICL, where would you like to be working and why?

This is a tough question. Since my freshman year of undergrad, I have wanted to work at ICL. If I were not working here, then I certainly would like to be working here. I have a huge appreciation for research in computer science and engineering, so if I were to work elsewhere, I might pursue a research assistantship with some other research group in the Department of Electrical Engineering and Computer Science at UTK.

Recent Papers

  1. Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018. DOI: 10.1145/3208040.3208054
  2. Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Distributed Termination Detection for HPC Task-Based Environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.
  3. Abdelfattah, A., M. Gates, J. Kurzak, P. Luszczek, and J. Dongarra, “Implementation of the C++ API for Batch BLAS,” SLATE Working Notes, no. 07, ICL-UT-18-04: Innovative Computing Laboratory, University of Tennessee, June 2018.
  4. YarKhan, A., G. Ragghianti, J. Dongarra, M. Cawkwell, D. Perez, and A. Voter, “Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-07: Innovative Computing Laboratory, University of Tennessee, June 2018.
  5. Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Luszczek, J. Finney, and J. Dongarra, “Parallel Norms Performance Report,” SLATE Working Notes, no. 06, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.
  6. Anzt, H., I. Yamazaki, M. Hoemmen, E. Boman, and J. Dongarra, “Solver Interface & Performance on Cori,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.
  7. Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, “The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,” International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018. DOI: 10.1007/978-3-319-93698-7_45
  8. Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, “Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster Award, Frankfurt, Germany, June 2018.
  9. Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540–551, July 2018. DOI: 10.1177/1094342016672543
  10. Dongarra, J., I. Duff, M. Gates, A. Haidar, S. Hammarling, N. J. Higham, J. Hogg, P. Valero Lara, P. Luszczek, M. Zounon, et al., “Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification,” July 2018.
  11. Asch, M., T. Moore, R. M. Badia, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. R. de Supinski, et al., “Big Data and Extreme-Scale Computing: Pathways to Convergence – Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018. DOI: 10.1177/1094342018778123
  12. Casanova, H., J. Herrmann, and Y. Robert, “Computing the Expected Makespan of Task Graphs in the Presence of Silent Errors,” Parallel Computing, vol. 75, pp. 41–60, July 2018. DOI: 10.1016/j.parco.2018.03.004
  13. Anzt, H., E. Chow, and J. Dongarra, “ParILUT – A New Parallel Threshold ILU,” SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018. DOI: 10.1137/16M1079506
  14. Danalis, A., H. Jagode, and J. Dongarra, “Software-Defined Events through PAPI for In-Depth Analysis of Application Performance,” 5th Platform for Advanced Scientific Computing Conference (PASC18), Basel, Switzerland, July 2018.

Recent Conferences

  1. JUN – GPU Hackathon, Boulder, Colorado: Piotr Luszczek
  2. JUN – Heike Jagode
  3. JUN – SC Conference Meeting, Dallas, Texas: Gerald Ragghianti, Jack Dongarra
  4. JUN – HPDC, Tempe, Arizona: Xi Luo
  5. JUN – Thomas Herault
  6. JUN – ISC High Performance 2018, Frankfurt, Germany: Piotr Luszczek
  7. JUN – PMAA18 and CLay-GPU Workshop, Zurich, Switzerland: George Bosilca
  8. JUL – PASC 2018, Basel, Switzerland: Anthony Danalis, Heike Jagode
  9. JUL – 2018 SIAM Annual Meeting, Portland, Oregon: Azzam Haidar, Jakub Kurzak
  10. JUL – Ichitaro Yamazaki

Upcoming Conferences

  1. AUG – CEED 2nd Annual Meeting, Boulder, Colorado: Stanimire Tomov
  2. AUG – Piotr Luszczek
  3. AUG – EuroPar 2018, Turin, Italy: George Bosilca
  4. AUG – Ichitaro Yamazaki, Stanimire Tomov

Recent Lunch Talks

  1. JUN 1 – Anne Benoit (Georgia Tech), “Combining Checkpointing and Replication for Reliable Execution of Linear Workflows”
  2. JUN 8 – Lou Gross (NIMBioS), “A Rational Basis for Hope: Human Behavior Modeling and Climate Change”

Upcoming Lunch Talks

  1. AUG 24 – Tracy Rafferty, ICL Meeting Space
  2. AUG 31 – Arm Patinyasakdikul, “One-Sided MPI Implementation”

People

  1. Hejer Shaiek
    Hejer Shaiek, a visiting scholar from ENSEEIHT, Toulouse, is working with Stan on the MAGMA project.
  2. Yuechao Lu
    Yuechao Lu, a PhD student from Osaka University by way of Shanghai, is visiting ICL through the fall and will be working with Ichi and Stan.
  3. Azzam Haidar
    Azzam Haidar left ICL in July to join the math libraries team at NVIDIA. Congratulations and good luck, Azzam!

Dates to Remember

ICL Retreat

The 2018 ICL retreat has been set for August 20–21 at the RT Lodge. Mark your calendars!