News and Announcements

TOP500 – June 2015

The 45th TOP500 rankings were presented at this year’s ISC High Performance conference in Frankfurt, Germany. For the 5th consecutive time, China’s Tianhe-2 has remained at the top of the ranking with 33.863 petaflop/s on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors for a combined total of 3,120,000 computing cores.
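The combined core count follows directly from the per-node configuration (12-core Ivy Bridge CPUs and 57-core Xeon Phi parts, as listed in the HPCG table later in this issue):

```python
# Tianhe-2 core count: 16,000 nodes, each with two 12-core Intel Xeon
# Ivy Bridge processors and three 57-core Xeon Phi coprocessors.
nodes = 16_000
cores_per_node = 2 * 12 + 3 * 57   # 24 CPU cores + 171 coprocessor cores = 195
total_cores = nodes * cores_per_node
print(total_cores)  # 3120000
```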

As for the rest of the list, the top 5 machines remain unchanged, but there is a new system at No. 7. Shaheen II, a Cray XC40 system installed at King Abdullah University of Science and Technology (KAUST), achieved 5.536 petaflop/s on the LINPACK benchmark, making it the highest-ranked Middle East system in the 22-year history of the list and the first to reach the Top 10.

More details on the 45th edition of the TOP500 are available in the official press release.

Rank Site System Rmax (TFlop/s)
1 National Super Computer Center in Guangzhou, China Tianhe-2 (MilkyWay-2) – TH-IVB-FEP Cluster, NUDT 33,862.7
2 DOE/SC/Oak Ridge National Laboratory, United States Titan – Cray XK7, Cray Inc. 17,590.0
3 DOE/NNSA/LLNL, United States Sequoia – BlueGene/Q, IBM 17,173.2
4 RIKEN Advanced Institute for Computational Science (AICS), Japan K computer, SPARC64 VIIIfx, Fujitsu 10,510.0
5 DOE/SC/Argonne National Laboratory, United States Mira – BlueGene/Q, IBM 8,586.6

See the full list at TOP500.org.


In the interview above, Jack Dongarra and Erich Strohmaier discuss possible reasons for the performance plateau evident in recent TOP500 lists. They also share their expectations, including a leap in performance in the near future and the arrival of Exascale in 2022-2023.

DARE Funded

ICL’s Data-driven Autotuning for Runtime Execution (DARE) project was recently funded by the NSF. DARE will investigate techniques for empirical autotuning of execution schedules for parallel applications on scalable multicore-plus-accelerator hybrid systems using a software workbench—the DARE framework. The DARE framework will optimize data and work placement, granularity, and scheduling decisions for maximum performance of a given application on a given hardware system. Congratulations to the DARE team!
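The empirical-autotuning loop that DARE will investigate can be illustrated with a toy example: time a kernel under several candidate configurations and keep the fastest. This is a minimal sketch of the general idea, not the DARE framework itself; the kernel and candidate block sizes are hypothetical.

```python
import time
import numpy as np

def time_kernel(block_size, n=256, trials=3):
    """Time a blocked matrix multiply for one candidate block size."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        c = np.zeros((n, n))
        for i in range(0, n, block_size):
            for j in range(0, n, block_size):
                c[i:i+block_size, j:j+block_size] = (
                    a[i:i+block_size, :] @ b[:, j:j+block_size])
        best = min(best, time.perf_counter() - t0)
    return best

def autotune(candidates=(16, 32, 64, 128)):
    """Empirically pick the fastest candidate on this machine."""
    timings = {bs: time_kernel(bs) for bs in candidates}
    return min(timings, key=timings.get)

best_block = autotune()
```

A real autotuner searches a much larger space (data placement, granularity, schedules) and caches results per machine, but the measure-and-select loop is the same.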

HPCG June 2015 Results

The June 2015 results for the High Performance Conjugate Gradients (HPCG) benchmark were released on July 15th at the ISC-HPC conference. Intended as a new HPC metric, HPCG is designed to measure performance representative of modern HPC capability by simulating computation and data-access patterns commonly found in real science and engineering applications.

To keep pace with the changing hardware and software infrastructures, HPCG results will be used to augment the TOP500 rankings to show how real world applications might fare on a given machine. In the table below, you can see how the HPCG benchmark would have ranked its top 10 machines, and where those machines ranked on the LINPACK-based TOP500 list. The full list of rankings is available here.
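HPCG times a preconditioned conjugate gradient iteration dominated by sparse matrix-vector products and global reductions. A minimal NumPy sketch of that pattern (with a Jacobi preconditioner on a 1-D Laplacian; an illustration only, not the benchmark's reference implementation):

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=500):
    """Preconditioned CG with a Jacobi (diagonal) preconditioner."""
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p                    # the matrix-vector product HPCG stresses
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Example: 1-D Laplacian (SPD tridiagonal), the kind of stencil HPCG models.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b)
```

Because the iteration is bound by memory bandwidth and communication rather than floating-point throughput, HPCG fractions of peak (see the %Peak column below) are far smaller than LINPACK's.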

Site Computer HPL (Pflop/s) TOP500 Rank HPCG (Pflop/s) HPCG Rank %Peak
NSCC / Guangzhou Tianhe-2 NUDT, Xeon 12C 2.2GHz + Intel Xeon Phi 57C + Custom 33.863 1 0.5800 1 1.1%
RIKEN Advanced Institute for Computational Science K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect 10.510 4 0.4608 2 4.1%
DOE/SC/Oak Ridge Nat Lab Titan – Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x 17.590 2 0.3223 3 1.2%
DOE/SC/Argonne National Laboratory Mira – BlueGene/Q, Power BQC 16C 1.60GHz, Custom 8.587 5 0.1670 4 1.7%
NASA / Mountain View Pleiades – SGI ICE X, Intel E5-2680, E5-2680V2, E5-2680V3, Infiniband FDR 4.089 11 0.1319 5 2.7%
Swiss National Supercomputing Centre (CSCS) Piz Daint – Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x 6.271 6 0.1246 6 1.6%
KAUST / Jeddah Shaheen II – Cray XC40, Intel Haswell 2.3 GHz 16C, Cray Aries 5.537 7 0.1139 7 1.6%
Texas Advanced Computing Center/Univ. of Texas Stampede – PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P 5.168 8 0.0968 8 1.0%
Leibniz Rechenzentrum SuperMUC – iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR 2.897 20 0.0833 9 2.6%
EPSRC/University of Edinburgh ARCHER – Cray XC30, Intel Xeon E5 v2 12C 2.700GHz, Aries interconnect 1.643 34 0.0808 10 3.2%

Conference Reports

ISC High Performance 2015

This year’s ISC meeting, now known as ISC High Performance (ISC-HPC), was held on July 12-16 in Frankfurt, Germany. ICL was well represented at ISC-HPC with Jack Dongarra, Stan Tomov, Jakub Kurzak, Terry Moore, and Tracy Rafferty all making their way to Frankfurt for the conference.

Jack was in high demand as usual. He started off the week by presenting the June 2015 TOP500 awards on Monday, and then gave a talk, “Anatomy of Optimizing an Algorithm for Exascale,” as a distinguished speaker on Wednesday. Jack also presented the HPCG results for June 2015 and chaired the workshop on Big Data and Extreme-scale Computing (BDEC).

Stan and Jakub, along with ICL collaborator Mike Heroux, gave a linear algebra tutorial during Sunday’s tutorial session. Stan stayed busy as well, presenting two papers, “Framework for Batched & GPU-resident Factorization Algorithms Applied to Block Householder Transformations” and “On the Design, Development & Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” and giving a talk on MAGMA MIC at the Intel booth.

As mentioned above, the one-day workshop on Big Data and Extreme-scale Computing (BDEC)—premised on the need to systematically map out the ways in which the major issues associated with Big Data intersect and interact with plans for achieving Exascale computing—was co-located with ISC-HPC and featured 15 individual talks and 25 participants.

ISC-HPC wasn’t all paper presentations and talks, however; the ICL team also met with representatives from Cavium, who will be giving ICL access to a dual-socket, 96-core (48+48) ARM system. All in all, ISC-HPC—even under a different name—was as productive as ever for ICL.

Recent Releases

MAGMA MIC 1.4.0 Released

MAGMA MIC 1.4.0 is now available. MAGMA (Matrix Algebra on GPU and Multicore Architectures) is a collection of next generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards, e.g., LAPACK and BLAS, to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures. MAGMA allows applications to fully exploit the power of current heterogeneous systems of multi/many-core CPUs and multi-GPUs/co-processors to deliver the fastest possible time to accurate solution within given energy constraints.

MAGMA MIC provides implementations for MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations, as well as linear and eigenproblem solvers for Intel Xeon Phi Coprocessors. More information on the approach is given in this presentation.

The MAGMA MIC 1.4.0 release adds:

  • Added a port of MAGMA Sparse, including CG, GMRES, and BiCGSTAB (with both hybrid and native versions), auxiliary routines, and preconditioned versions.
  • Added mixed-precision iterative refinement auxiliary routines and a solver for symmetric positive definite matrices: {zc|ds}posv_mic.
  • Improved dsymv and dgemv in the expert interface.
  • Added auxiliary bulge chasing routines used in two-stage eigensolvers.
  • Accelerated reductions to tridiagonal (dsytrd) and upper Hessenberg form (dgehrd) using the expert dsymv and dgemv, respectively.
  • Added test drivers and benchmarking routines.
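The mixed-precision refinement added above follows a standard pattern: factor once in fast single precision, then correct the solution with double-precision residuals. Below is a NumPy/SciPy sketch of the idea on the CPU; the function name is hypothetical, and this is not MAGMA's {zc|ds}posv_mic interface.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def dsposv_sketch(A, b, tol=1e-12, max_refine=30):
    """Solve the SPD system A x = b: one cheap float32 Cholesky
    factorization, then iterative refinement with float64 residuals."""
    c32 = cho_factor(A.astype(np.float32))               # low-precision factor
    x = cho_solve(c32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_refine):
        r = b - A @ x                                    # residual in double
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # correction solved with the already-computed float32 factor
        x += cho_solve(c32, r.astype(np.float32)).astype(np.float64)
    return x

# Well-conditioned SPD test matrix: Gram matrix plus a diagonal shift.
rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = dsposv_sketch(A, b)
```

The payoff on accelerators is that the O(n³) factorization runs at single-precision speed while the O(n²) refinement steps restore double-precision accuracy.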

Visit the MAGMA software page to download the tarball.

HPCC 1.5.0b Released

HPCC 1.5.0 beta is now available. The HPCC (HPC Challenge) benchmark suite is designed to establish, through rigorous testing and measurement, the bounds of performance on many real-world applications for computational science at extreme scale. To this end, the benchmark includes a suite of tests for sustained floating point operations, memory bandwidth, rate of random memory updates, interconnect latency, and interconnect bandwidth.
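Of those kernels, the memory-bandwidth test (the STREAM triad) is the easiest to illustrate. The following is a rough NumPy sketch of the access pattern, not the benchmark's MPI reference code; NumPy evaluates the triad in two passes, so the reported figure is only an approximation.

```python
import time
import numpy as np

def stream_triad(n=5_000_000, scalar=3.0, trials=5):
    """Approximate the STREAM triad a = b + scalar*c; return GB/s.
    The byte count assumes 2 reads + 1 write of 8-byte doubles, a
    lower bound since NumPy needs two passes instead of a fused loop."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty(n)
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)   # a = scalar * c
        a += b                          # a = b + scalar * c
        best = min(best, time.perf_counter() - t0)
    gb_moved = 3 * n * 8 / 1e9
    return gb_moved / best

bandwidth_gbs = stream_triad()
```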

This latest release consists of minor additions and bug fixes:

  • Added new targets to the main make(1) file.
  • Fixed bug introduced while updating to MPI STREAM 1.7 with spurious global communicator (reported by NEC).
  • Added make(1) file for OpenMPI from MacPorts.
  • Fixed bug introduced while updating to MPI STREAM 1.7 that caused some ranks to use NULL communicator.
  • Fixed bug introduced while updating to MPI STREAM 1.7 that caused syntax errors.

Visit the HPCC software page to download the tarball.

Interview

Chongxiao Cao

Where are you from, originally?

I’m from Hangzhou, a city located in eastern China.

Can you summarize your educational background?

I spent 7 years in Xi’an, an ancient city in China, for undergraduate and graduate studies at Xi’an Jiaotong University. I earned my Bachelor’s degree in Automation Engineering in 2008 and my Master’s degree in System Engineering in 2011. In August 2011, I came to the other side of the world to pursue my PhD in Computer Science here at UTK.

Tell us how you first learned about ICL.

In the summer of 2010, while interning as a software engineer for Alibaba.com, I was impressed by the distributed system design and development work behind the largest B2B and C2C website in China, and I decided to study abroad in this area. I talked with professors at my university, and they recommended a book for me to read: the “Sourcebook of Parallel Computing” (the Chinese translation, which is still on the shelf of my office). From this classic book, I first learned about Jack and his lab – ICL.

What made you want to work for ICL?

Jack is a famous professor, and ICL is a leading lab in parallel and distributed computing. The projects here impressed me and convinced me to come here.

What are you working on while at ICL?

I’m working with the DisCo group, focusing on developing fault-tolerant features in PaRSEC.

If you weren’t working at ICL, where would you like to be working and why?

I would probably work as a software engineer for Alibaba.com in China, because the headquarters of this company is located in Hangzhou, China—my hometown.

What are your interests/hobbies outside work?

I like swimming, hiking, and playing tennis. I also play computer games; my favorite one is “Hearthstone.”

Tell us something about yourself that might surprise people.

I have a twin brother, but his career path is very different from mine. I chose engineering as my major, and he majored in business. He is now an accountant at an IT company in China.

Recent Papers

  1. Reed, D., and J. Dongarra, “Exascale Computing and Big Data,” Communications of the ACM, vol. 58, no. 7, ACM, pp. 56-68, July 2015. DOI: 10.1145/2699414
  2. Chow, E., H. Anzt, and J. Dongarra, “Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs,” International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
  3. Benoit, A., S. K. Raina, and Y. Robert, “Efficient Checkpoint/Verification Patterns,” International Journal on High Performance Computing Applications, July 2015. DOI: 10.1177/1094342015594531
  4. Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms Applied to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.
  5. Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” ISC High Performance 2015, Frankfurt, Germany, July 2015.
  6. YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Cholesky Across Accelerators,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Newark, NJ, IEEE, August 2015.
  7. Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Newark, NJ, August 2015.
  8. Anzt, H., E. Chow, and J. Dongarra, “Iterative Sparse Triangular Solves for Preconditioning,” Euro-Par 2015, Vienna, Austria, Springer Berlin, August 2015. DOI: 10.1007/978-3-662-48096-0_50

Recent Conferences

  1. JUL – ISC High Performance 2015, Frankfurt, Germany: Jack Dongarra, Jakub Kurzak, Stanimire Tomov, Terry Moore, Tracy Rafferty
  2. AUG – OpenSHMEM Workshop, Annapolis, Maryland: Aurelien Bouteiller
  3. AUG – EuroPar 2015, Vienna, Austria: Hartwig Anzt
  4. AUG – HPCC 2015, Newark, New Jersey: Jack Dongarra, Piotr Luszczek
  5. AUG – Stanimire Tomov

Upcoming Conferences

  1. SEP – 9th Parallel Tools Workshop, Dresden, Germany: Heike McCraw
  2. SEP – PPAM 2015, Krakow, Poland: Heike McCraw
  3. SEP – IEEE Cluster 2015, Chicago, Illinois: Anthony Danalis
  4. SEP – ORNL scientific seminar series, Oak Ridge, Tennessee: Azzam Haidar
  5. SEP – Piotr Luszczek
  6. SEP – EuroMPI 2015, Bordeaux, France: George Bosilca
  7. SEP – Intel Big Data Retreat, Hillsboro, Oregon: Piotr Luszczek, Thomas Herault
  8. SEP – Jack Dongarra

Recent Lunch Talks

  1. JUL 1 – Ed Valeev (Virginia Tech): Tensor Computation for Chemistry: Sparsity and More
  2. JUL 1 – Torsten Hoefler (ETH Zürich): Towards Fully Automated Interpretable Performance Models
  3. JUL 17 – Sangamesh Ragate: PC Sampling in GPU
  4. JUL 31 – Joseph Schuchart (TU Dresden): HPC energy-efficiency research at ZIH, Or: What the HAEC is HDEEM?
  5. AUG 7 – Ian Masliah (University of Paris-Sud): Towards C++ and Beyond
  6. AUG 21 – Yaohung Tsai: Convolutional Layers in RaPyDLI
  7. AUG 28 – Tingxing Dong: Batched Linear Algebra Problems on Hardware Accelerators Based on GPUs

Upcoming Lunch Talks

  1. SEP 4 – Mathieu Faverge (Inria): Blocking Strategy Optimizations for Sparse Direct Linear Solver on Heterogeneous Architectures
  2. SEP 11 – Asim YarKhan: OpenMP Tasks and PLASMA
  3. SEP 18 – Mark Gates: Accelerating Collaborative Filtering Using Concepts from High Performance Computing
  4. SEP 25 – Ichitaro Yamazaki: Random Sampling to Update Truncated SVD

Visitors

  1. Valentin Le Fevre
    Valentin Le Fevre from ENS de Lyon will be visiting from May 17 through August 8. Valentin is visiting ICL for several months and will be working with the Distributed Computing group.
  2. Ian Masliah
    Ian Masliah from University of Paris-Sud will be visiting from July 27 through August 17. Ian will be working with the Linear Algebra Group.

People

  1. Mathieu Faverge
    ICL alumni Mathieu Faverge will be visiting ICL and working with the Linear Algebra Group for a month beginning in August. Welcome back, Mathieu!
  2. Ian Masliah
    Ian Masliah from the University of Paris-Sud will be visiting from July 27 through August 17. Ian will be working with the Linear Algebra Group.
  3. Joe Dorris
    Joe Dorris will be joining ICL this fall semester as an MS student working with the Linear Algebra Group. Welcome aboard, Joe!
  4. Phil Vaccaro
    Phil Vaccaro will be joining ICL this fall semester as an MS student working with the Performance Analysis Group. Welcome to ICL, Phil!


Congratulations

Mr. and Mrs. Gates

On July 25th, ICL’s Mark Gates married Elaine Gates née Davis. Congratulations to the bride and groom!

Dates to Remember

ICL Annual Retreat

The 2015 ICL Annual Retreat, the 16th such retreat in the lab’s history, has been set for August 13th-14th at the Tremont Lodge in Townsend, TN. Dinner on Thursday will be held at Miss Lily’s Cafe. Mark your calendars.