News and Announcements
TOP500 – June 2015
The 45th TOP500 rankings were presented at this year’s ISC High Performance conference in Frankfurt, Germany. For the 5th consecutive time, China’s Tianhe-2 has remained at the top of the ranking with 33.863 petaflop/s on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors for a combined total of 3,120,000 computing cores.
As for the rest of the list, the top 5 machines remain unchanged, but there is a new system at No. 7. Shaheen II—a Cray XC40 system installed at King Abdullah University of Science and Technology (KAUST)—achieved 5.536 petaflop/s on the LINPACK benchmark, making it the highest-ranked Middle East system in the 22-year history of the list and the first to reach the Top 10.
More details on the 45th edition of the TOP500 are available in the official press release.
| Rank | Site | System | Rmax (TFlop/s) |
|---|---|---|---|
| 1 | National Super Computer Center in Guangzhou (NSCC) | Tianhe-2 (MilkyWay-2) – NUDT | 33,862.7 |
| 2 | DOE/SC/Oak Ridge National Laboratory | Titan – Cray XK7 | 17,590.0 |
| 3 | DOE/NNSA/LLNL | Sequoia – BlueGene/Q | 17,173.2 |
| 4 | RIKEN Advanced Institute for Computational Science (AICS) | K computer, SPARC64 VIIIfx | 10,510.0 |
| 5 | DOE/SC/Argonne National Laboratory | Mira – BlueGene/Q | 8,586.6 |

See the full list at TOP500.org.
In the interview above, Jack Dongarra and Erich Strohmaier discuss the possible reasons for the performance plateau evident in recent TOP500 lists. They also share their expectations, including a leap in performance in the near future, and the arrival of Exascale in 2022-2023.
DARE Funded
ICL’s Data-driven Autotuning for Runtime Execution (DARE) project was recently funded by the NSF. DARE will investigate techniques for empirical autotuning of execution schedules for parallel applications on scalable multicore-plus-accelerator hybrid systems using a software workbench—the DARE framework. The DARE framework will optimize data and work placement, granularity, and scheduling decisions for maximum performance of a given application on a given hardware system. Congratulations to the DARE team!
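The “empirical” part of this approach can be illustrated with a toy example: run a set of candidate configurations on the actual hardware, time each one, and keep the winner. The C sketch below does this for a made-up tiled kernel; the kernel, candidate tile sizes, and timing harness are all hypothetical and are not part of the DARE framework.

```c
/* Toy empirical autotuning sketch (not the DARE framework): time a blocked
 * stand-in kernel for several candidate tile sizes and keep the fastest. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

/* Hypothetical kernel whose performance depends on the tile size. */
static void scale_tiled(double *x, int n, int tile) {
    for (int i = 0; i < n; i += tile)
        for (int j = i; j < i + tile && j < n; j++)
            x[j] *= 1.000001;
}

int main(void) {
    double *x = malloc(N * sizeof *x);
    for (int i = 0; i < N; i++) x[i] = 1.0;

    int candidates[] = { 64, 256, 1024, 4096 };   /* the search space */
    int best_tile = candidates[0];
    double best_time = 1e30;

    /* Empirical search: run each candidate on the real machine, keep the fastest. */
    for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
        double t0 = now();
        scale_tiled(x, N, candidates[c]);
        double dt = now() - t0;
        printf("tile %5d: %.4f s\n", candidates[c], dt);
        if (dt < best_time) { best_time = dt; best_tile = candidates[c]; }
    }
    printf("selected tile size: %d\n", best_tile);
    free(x);
    return 0;
}
```

A real autotuner searches a much larger space (data placement, granularity, schedule) and stores the results for reuse, but the measure-and-select loop is the core idea.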
HPCG June 2015 Results
The June 2015 results for the High Performance Conjugate Gradients (HPCG) benchmark were released on July 15th at the ISC-HPC conference. Intended to be a new HPC metric, HPCG is designed to measure performance that is representative of modern HPC capability by exercising computation and data-access patterns commonly found in real science and engineering applications.
To keep pace with the changing hardware and software infrastructures, HPCG results will be used to augment the TOP500 rankings to show how real world applications might fare on a given machine. In the table below, you can see how the HPCG benchmark would have ranked its top 10 machines, and where those machines ranked on the LINPACK-based TOP500 list. The full list of rankings is available here.
| Site | Computer | HPL (Pflop/s) | TOP500 Rank | HPCG (Pflop/s) | HPCG Rank | %Peak |
|---|---|---|---|---|---|---|
| NSCC / Guangzhou | Tianhe-2 NUDT, Xeon 12C 2.2GHz + Intel Xeon Phi 57C + Custom | 33.863 | 1 | 0.5800 | 1 | 1.1% |
| RIKEN Advanced Institute for Computational Science | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | 10.510 | 4 | 0.4608 | 2 | 4.1% |
| DOE/SC/Oak Ridge National Laboratory | Titan – Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | 17.590 | 2 | 0.3223 | 3 | 1.2% |
| DOE/SC/Argonne National Laboratory | Mira – BlueGene/Q, Power BQC 16C 1.60GHz, Custom | 8.587 | 5 | 0.1670 | 4 | 1.7% |
| NASA / Mountain View | Pleiades – SGI ICE X, Intel E5-2680, E5-2680V2, E5-2680V3, Infiniband FDR | 4.089 | 11 | 0.1319 | 5 | 2.7% |
| Swiss National Supercomputing Centre (CSCS) | Piz Daint – Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x | 6.271 | 6 | 0.1246 | 6 | 1.6% |
| KAUST / Jeddah | Shaheen II – Cray XC40, Intel Haswell 2.3 GHz 16C, Cray Aries | 5.537 | 7 | 0.1139 | 7 | 1.6% |
| Texas Advanced Computing Center/Univ. of Texas | Stampede – PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P | 5.168 | 8 | 0.0968 | 8 | 1.0% |
| Leibniz Rechenzentrum | SuperMUC – iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR | 2.897 | 20 | 0.0833 | 9 | 2.6% |
| EPSRC/University of Edinburgh | ARCHER – Cray XC30, Intel Xeon E5 v2 12C 2.700GHz, Aries interconnect | 1.643 | 34 | 0.0808 | 10 | 3.2% |
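Note that the %Peak column is measured against each machine’s theoretical peak. Another way to read the table is to compare HPCG directly against HPL, which shows how small a fraction of LINPACK performance these application-like patterns achieve. The short C sketch below (values copied from the table above, program purely illustrative) computes that ratio for three of the systems.

```c
/* Compute the HPCG/HPL ratio for a few systems; the numbers are copied from
 * the HPCG table above, and the program itself is only an illustration. */
#include <stdio.h>

int main(void) {
    const char *system[] = { "Tianhe-2", "K computer", "Titan" };
    double hpl[]  = { 33.863, 10.510, 17.590 };   /* HPL, Pflop/s  */
    double hpcg[] = { 0.5800, 0.4608, 0.3223 };   /* HPCG, Pflop/s */

    for (int i = 0; i < 3; i++)
        printf("%-10s  HPCG/HPL = %.2f%%\n", system[i], 100.0 * hpcg[i] / hpl[i]);
    /* e.g., Tianhe-2: 0.58 / 33.863 is roughly 1.7% of its LINPACK rate */
    return 0;
}
```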
Conference Reports
ISC High Performance 2015
This year’s ISC meeting, now known as ISC High Performance (ISC-HPC), was held on July 12-16 in Frankfurt, Germany. ICL was well represented at ISC-HPC with Jack Dongarra, Stan Tomov, Jakub Kurzak, Terry Moore, and Tracy Rafferty all making their way to Frankfurt for the conference.
Jack was in high demand as usual. He started off the week by presenting the June 2015 TOP500 awards on Monday, and then gave a talk, “Anatomy of Optimizing an Algorithm for Exascale,” as a distinguished speaker on Wednesday. Jack also presented the HPCG results for June 2015 and served as Chair for the workshop on Big Data and Extreme-scale Computing (BDEC).
Stan and Jakub, along with ICL collaborator Mike Heroux, gave a linear algebra tutorial during Sunday’s tutorial session. Stan stayed busy as well and presented two papers, “Framework for Batched & GPU-resident Factorization Algorithms Applied to Block Householder Transformations” and “On the Design, Development & Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” and gave a talk on MAGMA MIC at the Intel booth.
As mentioned above, the one-day Big Data and Extreme-scale Computing (BDEC) workshop—premised on the need to systematically map out the ways in which the major issues associated with Big Data intersect and interact with plans for achieving Exascale computing—was co-located with ISC-HPC and featured 15 individual talks and 25 participants.
ISC-HPC wasn’t all paper presentations and talks, however; the ICL team also met with representatives from Cavium, who will be giving ICL access to a dual-socket, 96-core (48+48) ARM system. All in all, ISC-HPC—even under a different name—was as productive as ever for ICL.
Recent Releases
MAGMA MIC 1.4.0 Released
MAGMA MIC 1.4.0 is now available. MAGMA (Matrix Algebra on GPU and Multicore Architectures) is a collection of next generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards, e.g., LAPACK and BLAS, to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures. MAGMA allows applications to fully exploit the power of current heterogeneous systems of multi/many-core CPUs and multi-GPUs/co-processors to deliver the fastest possible time to accurate solution within given energy constraints.
MAGMA MIC provides implementations for MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations, as well as linear and eigenproblem solvers for Intel Xeon Phi Coprocessors. More information on the approach is given in this presentation.
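To give a flavor of the LAPACK-style interface, here is a minimal sketch that LU-factors a small matrix through MAGMA’s CPU interface. The routine names (magma_init, magma_dgetrf, magma_finalize) follow the standard MAGMA API; the header name and the MIC port’s exact routine names and build setup are assumptions, so treat this as an illustration rather than MAGMA MIC code. The appeal is that a code already calling LAPACK’s dgetrf can switch to the accelerated routine with essentially the same arguments.

```c
/* Minimal sketch of MAGMA's LAPACK-style interface: LU-factor a small matrix.
 * Routine names follow the standard MAGMA API; the MIC port may differ. */
#include <stdio.h>
#include "magma.h"            /* assumed header name for the 1.x series */

int main(void) {
    magma_init();                               /* initialize the library */

    magma_int_t n = 4, lda = 4, info = 0;
    magma_int_t ipiv[4];
    double A[16];
    for (int i = 0; i < 16; i++)                /* fill a diagonally dominant 4x4 matrix */
        A[i] = (i % (lda + 1) == 0) ? 10.0 : 1.0;

    magma_dgetrf(n, n, A, lda, ipiv, &info);    /* drop-in analogue of LAPACK dgetrf */
    printf("magma_dgetrf returned info = %d\n", (int)info);

    magma_finalize();                           /* release library resources */
    return 0;
}
```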
The MAGMA MIC 1.4.0 release adds:
- Added a port of MAGMA Sparse, including CG, GMRES, and BiCGSTAB (with support for both hybrid and native versions), auxiliary routines, and preconditioned versions.
- Added mixed-precision iterative refinement auxiliary routines and a solver for symmetric positive definite matrices, {zc|ds}posv_mic (the underlying idea is sketched after this list).
- Improved dsymv and dgemv in expert interface.
- Added auxiliary bulge chasing routines used in two-stage eigensolvers.
- Accelerated reductions to tridiagonal (dsytrd) and upper Hessenberg form (dgehrd) using the expert dsymv and dgemv, respectively.
- Added test drivers and benchmarking routines.
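The mixed-precision {zc|ds}posv-style solvers mentioned above rest on a simple idea: do the expensive factorization and solves in the faster, lower precision, then recover full double-precision accuracy with a few cheap refinement steps. The sketch below shows that structure in plain C for a diagonal system so the loop is easy to follow; it is not MAGMA code, and the helper routine is hypothetical.

```c
/* Sketch of mixed-precision iterative refinement (the idea behind ds/zc "posv"-style
 * solvers): solve in single precision, then refine to double-precision accuracy.
 * A diagonal system keeps the "solve" trivial; this is not MAGMA code. */
#include <stdio.h>
#include <math.h>

#define N 4

/* "Factor + solve" in single precision (trivial here because A is diagonal). */
static void solve_single(const double *a, const double *rhs, double *x) {
    for (int i = 0; i < N; i++)
        x[i] = (double)((float)rhs[i] / (float)a[i]);
}

int main(void) {
    double a[N] = { 3.0, 7.0, 11.0, 19.0 };          /* diagonal of A */
    double b[N] = { 1.0, 2.0, 3.0,  4.0 };           /* right-hand side */
    double x[N], r[N], dx[N];

    solve_single(a, b, x);                           /* initial low-precision solution */

    for (int iter = 0; iter < 5; iter++) {
        double rnorm = 0.0;
        for (int i = 0; i < N; i++) {                /* residual in double precision */
            r[i] = b[i] - a[i] * x[i];
            rnorm += r[i] * r[i];
        }
        printf("iter %d: ||r|| = %.3e\n", iter, sqrt(rnorm));
        if (sqrt(rnorm) < 1e-14) break;
        solve_single(a, r, dx);                      /* correction in single precision */
        for (int i = 0; i < N; i++) x[i] += dx[i];   /* update in double precision */
    }
    return 0;
}
```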
Visit the MAGMA software page to download the tarball.
HPCC 1.5.0b Released
HPCC 1.5.0 beta is now available. The HPCC (HPC Challenge) benchmark suite is designed to establish, through rigorous testing and measurement, the bounds of performance on many real-world applications for computational science at extreme scale. To this end, the benchmark includes a suite of tests for sustained floating point operations, memory bandwidth, rate of random memory updates, interconnect latency, and interconnect bandwidth.
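For a sense of what the memory-bandwidth component measures, the kernel at the heart of STREAM is the “triad” a[i] = b[i] + scalar*c[i], timed over arrays large enough to defeat the caches. The standalone C sketch below illustrates that pattern; it is not HPCC’s embedded STREAM code, and the array size and timing harness are arbitrary choices.

```c
/* STREAM-triad-style memory bandwidth measurement, in the spirit of the STREAM
 * component of HPCC (standalone illustration, not HPCC's embedded code). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)                 /* ~16M doubles per array, well beyond cache */

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void) {
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b), *c = malloc(N * sizeof *c);
    const double scalar = 3.0;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = now();
    for (long i = 0; i < N; i++)    /* the triad: a = b + scalar * c */
        a[i] = b[i] + scalar * c[i];
    double dt = now() - t0;

    /* Three arrays of N doubles cross the memory bus: two reads plus one write. */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("triad: %.3f s, %.2f GB/s\n", dt, gbytes / dt);
    free(a); free(b); free(c);
    return 0;
}
```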
This latest release consists of minor additions and bug fixes:
- Added new targets to the main make(1) file.
- Fixed a bug, introduced while updating to MPI STREAM 1.7, that created a spurious global communicator (reported by NEC).
- Added make(1) file for OpenMPI from MacPorts.
- Fixed bug introduced while updating to MPI STREAM 1.7 that caused some ranks to use NULL communicator.
- Fixed bug introduced while updating to MPI STREAM 1.7 that caused syntax errors.
Visit the HPCC software page to download the tarball.
Interview

Chongxiao Cao
Where are you from, originally?
I’m from Hangzhou, a city located in eastern China.
Can you summarize your educational background?
I spent 7 years in Xi’an, an ancient city in China, for my undergraduate and graduate studies at Xi’an Jiaotong University. I earned my Bachelor’s degree in Automation Engineering in 2008 and my Master’s degree in System Engineering in 2011. In August 2011, I came to the other side of the world to pursue my PhD in Computer Science here at UTK.
Tell us how you first learned about ICL.
In the summer of 2010, when I was doing an internship as a software engineer at Alibaba.com, I became interested in and impressed by the distributed systems design and development work of the largest B2B and C2C website in China. I then decided to study abroad in this area. I talked with professors at my university, and they recommended a book for me to read: the “Sourcebook of Parallel Computing” (a Chinese translation, which is still on the shelf in my office). From this classic book, I first learned about Jack and his lab, ICL.
What made you want to work for ICL?
Jack is a famous professor, and ICL is a leading lab in parallel and distributed computing. The projects here impressed me and persuaded me to join.
What are you working on while at ICL?
I’m working with the DisCo group, focusing on developing fault-tolerant features in PaRSEC.
If you weren’t working at ICL, where would you like to be working and why?
I would probably work as a software engineer for Alibaba.com in China, because the headquarters of this company is located in Hangzhou, China—my hometown.
What are your interests/hobbies outside work?
I like swimming, hiking, and playing tennis. I also play computer games; my favorite one is “Hearthstone.”
Tell us something about yourself that might surprise people.
I have a twin brother, but his career path is very different from mine. I chose engineering as my major and he majored in business. He is now an accountant at an IT company in China.