News and Announcements
TOP500 – November 2015
The 46th TOP500 rankings were presented at SC15. For the 6th consecutive time, China’s Tianhe-2 has remained at the top of the ranking with 33.863 petaflop/s on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors for a combined total of 3,120,000 computing cores.
As for the rest of the list, the top 5 machines remain unchanged, but two systems joined the TOP10: DOE’s new Cray machine Trinity (#6) and HLRS’s Cray machine Hazel Hen (#8). The November list also sees China nearly triple its number of machines in the TOP500, while the US drops to its lowest number of systems since the inception of the list in 1993.
Nages Sieslack of ISC Group caught up with Jack Dongarra to get his thoughts on the latest TOP500 list and more. You can read the entire interview here.
More details on the 46th edition of the TOP500 are available in the official press release.
| Rank | Site | System | Rmax (TFlop/s) |
|---|---|---|---|
| 1 | National Super Computer Center in Guangzhou | Tianhe-2 (MilkyWay-2) | 33,862.7 |
| 2 | DOE/SC/Oak Ridge National Laboratory | Titan – Cray XK7 | 17,590.0 |
| 3 | DOE/NNSA/LLNL | Sequoia – BlueGene/Q | 17,173.2 |
| 4 | RIKEN Advanced Institute for Computational Science (AICS) | K computer, SPARC64 VIIIfx | 10,510.0 |
| 5 | DOE/SC/Argonne National Laboratory | Mira – BlueGene/Q | 8,586.6 |

See the full list at TOP500.org.
HPCG Results – November 2015
The November 2015 results for the High Performance Conjugate Gradients (HPCG) benchmark were released on November 18th at the SC15 HPCG BoF in Austin, TX. Intended to be a new HPC metric, HPCG is designed to measure performance that is representative of modern HPC capability by simulating patterns commonly found in real science and engineering applications.
To keep pace with changing hardware and software infrastructures, HPCG results will be used to augment the TOP500 rankings and show how real-world applications might fare on a given machine. In the table below, you can see how the HPCG benchmark ranks its top 5 machines and where those machines rank on the LINPACK-based TOP500 list. The full list of rankings is available here.
| Site | Computer | HPL (Pflop/s) | TOP500 Rank | HPCG (Pflop/s) | HPCG Rank | % of Peak |
|---|---|---|---|---|---|---|
| NSCC / Guangzhou | Tianhe-2 NUDT, Xeon 12C 2.2GHz + Intel Xeon Phi 57C + Custom | 33.863 | 1 | 0.5800 | 1 | 1.1% |
| RIKEN Advanced Institute for Computational Science | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | 10.510 | 4 | 0.4608 | 2 | 4.1% |
| DOE/SC/Oak Ridge Nat Lab | Titan – Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | 17.590 | 2 | 0.3223 | 3 | 1.2% |
| DOE/NNSA/LANL/SNL | Trinity – Cray XC40, Intel E5-2698v3, Aries custom | 8.101 | 6 | 0.1826 | 4 | 1.6% |
| DOE/SC/Argonne National Laboratory | Mira – BlueGene/Q, Power BQC 16C 1.60GHz, Custom | 8.587 | 5 | 0.1670 | 5 | 1.7% |
Jack Dongarra Receives HPCwire Readers’ Choice Award

Jack Dongarra received the HPCwire Readers’ Choice Award for Outstanding Leadership in HPC at SC15 in Austin, TX. These awards, which are nominated and voted on by the HPC community at large, recognize the best and brightest developments in HPC over the last 12 months. Congratulations, Jack!
Conference Reports
SC15
This year’s International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) was held in Austin, TX on November 15 – 20. As usual, ICL had a significant presence at SC15, with faculty, research staff, and students giving talks, presenting papers, and leading BoF sessions.
For the fourth consecutive year, ICL was active in the University of Tennessee’s SC booth. The booth, which was organized and led by the National Institute for Computational Sciences (NICS), was visually designed with the help of ICL/CITR staff, staffed with support from ICL researchers attending SC, and featured some of the lab’s research projects.
ICLers both past and present who attended SC15 were invited to the traditional ICL Alumni Dinner. Austin’s Z’Tejas hosted this year’s dinner, and there were plenty of conversations shared between old friends and colleagues, as the ideas and drinks flowed freely. In the end, everyone enjoyed the mini-reunion as they capped off the last major conference of the year.
Recent Releases
HPCG 3.0 Released
HPCG 3.0 is now available! The HPCG (High Performance Conjugate Gradients) benchmark is designed to measure performance that is representative of modern scientific applications. It does so by exercising the computational and communication patterns that are commonly found in real science and engineering codes, which are often based on sparse iterative solvers. HPCG exhibits the same irregular accesses to memory and fine-grain recursive computations that dominate large-scale scientific workloads used to simulate complex physical phenomena. Intended as a candidate for a new HPC metric, HPCG implements the preconditioned conjugate gradient algorithm with a local symmetric Gauss-Seidel as the preconditioner. Additionally, the essential components of the geometric multigrid method are present in the code as a way to represent execution patterns of modern multigrid solvers.
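For readers unfamiliar with the kernel at the heart of the benchmark, the following minimal sketch shows a preconditioned conjugate gradient iteration in C on a small dense symmetric positive definite system, using a simple Jacobi (diagonal) preconditioner as a stand-in; HPCG itself applies a symmetric Gauss-Seidel preconditioner with multigrid components to a large sparse problem distributed across MPI processes.

```c
/* Minimal PCG sketch: solves A*x = b for a small dense SPD matrix with a
 * Jacobi (diagonal) preconditioner. Illustrative only; HPCG uses a sparse
 * problem and a symmetric Gauss-Seidel preconditioner with multigrid. */
#include <stdio.h>
#include <math.h>

#define N 4

static double dot(const double *x, const double *y) {
    double s = 0.0;
    for (int i = 0; i < N; i++) s += x[i] * y[i];
    return s;
}

static void matvec(const double A[N][N], const double *x, double *y) {
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
    }
}

int main(void) {
    /* Symmetric positive definite test matrix and right-hand side. */
    double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
    double b[N] = {1,2,3,4}, x[N] = {0,0,0,0};
    double r[N], z[N], p[N], Ap[N];

    matvec(A, x, r);                                   /* r = b - A*x   */
    for (int i = 0; i < N; i++) r[i] = b[i] - r[i];
    for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i]; /* z = M^{-1} r  */
    for (int i = 0; i < N; i++) p[i] = z[i];
    double rz = dot(r, z);

    for (int it = 0; it < 50 && sqrt(dot(r, r)) > 1e-12; it++) {
        matvec(A, p, Ap);
        double alpha = rz / dot(p, Ap);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];  /* apply preconditioner */
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %f\n", i, x[i]);
    return 0;
}
```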
The HPCG 3.0 Reference Code includes updates for the following:
- Problem generation is a timed portion of the benchmark. This time is now added to any time spent optimizing data structures and counted as overhead when computing the official GFLOP/s rating. The total overhead time is divided by 500 to amortize its cost over 500 iterations.
- Added memory usage counting and reporting.
- Added memory bandwidth measurement and reporting.
- Added a “Quick Path” option to make obtaining results on production systems easier. With this option, obtaining a rating takes only a few minutes, which also makes profiling and debugging easier. The Quick Path option is invoked by setting the run time to zero, either in hpcg.dat or by using the --rt=0 option (see the example after this list).
- Added a command line option (--rt=) to specify the run time.
- Made a few small changes to support easier builds on MS Windows.
- Changed the way the residual variance is computed to make sure it is zero if all residual values are identical.
- Changed the order of array allocation in the reference code in order to improve performance.
- Set the minimum iteration count for the optimized run to be the same as what is used in the reference run.
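As a rough illustration of the Quick Path option, the sketch below shows what an hpcg.dat input file might look like with the run time (the fourth line) set to zero; the local subgrid dimensions on the third line are illustrative values, not a recommendation. Passing --rt=0 on the command line has the same effect.

```
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
0
```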
Optimized HPCG 3.0 releases for Intel Xeon CPUs, Intel Xeon Phi coprocessors, and NVIDIA GPUs are also available.
Visit the HPCG software page to download the tarballs.
LAPACK 3.6.0 Released
LAPACK 3.6.0 has been released! LAPACK (the Linear Algebra PACKage) is a widely used library for efficiently solving dense linear algebra problems, and ICL has been a major contributor to the development and maintenance of LAPACK since its inception. LAPACK itself is sequential; it relies on the BLAS library and benefits from a multithreaded BLAS implementation on multicore machines.
LAPACK 3.6.0 includes Level 3 BLAS routines for the generalized SVD; new routines for computing the full or partial (subset) SVD of a bidiagonal matrix through an associated eigenvalue problem, as well as the full or partial (subset) SVD of a general matrix; new complex Jacobi SVD routines; recursive Cholesky routines; and improvements to the QR algorithm for the nonsymmetric eigenvalue problem (NEP).
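As a small, hedged illustration of calling LAPACK from C, the sketch below computes the SVD of a 3x2 matrix through the LAPACKE interface using the long-standing DGESVD driver; it is not one of the new 3.6.0 subset-SVD routines, but the calling pattern is similar. The link flags shown are typical, not universal.

```c
/* Sketch: SVD of a small row-major matrix via LAPACKE_dgesvd.
 * Typical build: cc svd.c -llapacke -llapack -lblas */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double a[3 * 2] = { 1.0, 2.0,     /* 3x2 matrix, row-major */
                        3.0, 4.0,
                        5.0, 6.0 };
    double s[2], u[3 * 3], vt[2 * 2], superb[1];

    lapack_int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A',
                                     3, 2, a, 2,    /* m, n, A, lda     */
                                     s,             /* singular values  */
                                     u, 3, vt, 2,   /* U, ldu, VT, ldvt */
                                     superb);       /* workspace        */
    if (info != 0) {
        fprintf(stderr, "LAPACKE_dgesvd failed: info = %d\n", (int)info);
        return 1;
    }
    printf("singular values: %f %f\n", s[0], s[1]);
    return 0;
}
```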
Visit the LAPACK website to download the tarball.
Open MPI 1.10.1 Released
Open MPI 1.10.1 is now available! Open MPI is an open source MPI implementation that is developed and maintained by a consortium of academic, research, and industry partners. MPI primarily addresses the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process. Open MPI integrates technologies and resources from several other projects (HARNESS/FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best MPI library available. A completely new MPI-3.1 compliant implementation, Open MPI offers advantages for system and software vendors, application developers, and computer science researchers.
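To make the message-passing model concrete, here is a minimal C sketch in which rank 0 moves an integer from its address space into rank 1's with a matched MPI_Send/MPI_Recv pair. It uses only standard MPI calls and should build with Open MPI's mpicc; the run command shown in the comment is just one example.

```c
/* Minimal point-to-point example. Build with mpicc; run with,
 * e.g., mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            int value = 42;
            /* Move data from rank 0's address space to rank 1's. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }
    }
    MPI_Finalize();
    return 0;
}
```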
Visit the Open MPI project page to download the tarball.
ULFM 1.1 Released
ULFM 1.1 is now available! This release includes bug fixes identified by the user/developer community. ULFM (User Level Failure Mitigation) is a set of new interfaces for MPI that enables message passing applications to restore MPI functionality affected by process failures. The MPI implementation is spared the expense of internally taking protective and corrective automatic actions against failures. Instead, it can prevent any fault-related deadlock situation by reporting operations whose completions were rendered impossible by failures.
This release focuses on improved stability, broader feature coverage for intercommunicators, and compliance with the updated specification for MPI_ERR_PROC_FAILED_PENDING. It includes the following enhancements and fixes:
- Added the MPI_ERR_PROC_FAILED_PENDING error code, as per the newer specification revision. It is properly returned from point-to-point, non-blocking ANY_SOURCE operations.
- Aliased MPI_ERR_PROC_FAILED, MPI_ERR_PROC_FAILED_PENDING, and MPI_ERR_REVOKED to the corresponding standard-blessed extension names MPIX_ERR_xxx.
- Support for intercommunicators:
  - Support for the blocking version of the agreement, MPI_COMM_AGREE, on intercommunicators.
  - MPI_COMM_REVOKE tested on intercommunicators.
- Completely disabled (.ompi_ignore) many untested components.
- Changed the default ORTE failure notification propagation aggregation delay from 1s to 25ms.
- Added an OMPI internal failure propagator; failure propagation between SM domains is now immediate.
- Bugfixes:
  - Sendrecv would not always report MPI_ERR_PROC_FAILED correctly.
  - Sendrecv could incorrectly update the status with errors pertaining to the send portion of the operation.
  - Revoked send operations are now always completed or remotely cancelled, and can no longer deadlock.
  - Cancelled send operations to a dead peer will no longer trigger an assert when the BTL reports that same failure.
  - Repeated calls to operations returning MPI_ERR_PROC_FAILED will eventually return MPI_ERR_REVOKED when another process revokes the communicator.
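For context, the hedged sketch below shows the basic recovery pattern these interfaces enable: errors are returned rather than aborting the job, and when an operation reports MPIX_ERR_PROC_FAILED or MPIX_ERR_REVOKED, the communicator is revoked and shrunk so the surviving processes can continue. Real recovery protocols involve more bookkeeping (for example, agreeing on application state); this is only an outline, and the mpi-ext.h header is the Open MPI convention for the MPIX_ prototypes.

```c
/* ULFM recovery outline: return errors, then revoke and shrink the
 * damaged communicator. Simplified for illustration. */
#include <stdio.h>
#include <mpi.h>
#include <mpi-ext.h>   /* MPIX_ prototypes (Open MPI / ULFM extension header) */

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Comm world = MPI_COMM_WORLD, shrunk;

    MPI_Init(&argc, &argv);
    /* Report failures through return codes rather than aborting the job. */
    MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);
    MPI_Comm_rank(world, &rank);

    int rc = MPI_Bcast(&value, 1, MPI_INT, 0, world);
    if (rc != MPI_SUCCESS) {
        int eclass;
        MPI_Error_class(rc, &eclass);
        if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED) {
            /* Interrupt operations still pending on the damaged communicator,
             * then build a new communicator containing only live processes. */
            MPIX_Comm_revoke(world);
            MPIX_Comm_shrink(world, &shrunk);
            MPI_Comm_rank(shrunk, &rank);
            printf("recovered: continuing as rank %d of the shrunk communicator\n", rank);
            MPI_Comm_free(&shrunk);
        }
    }
    MPI_Finalize();
    return 0;
}
```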
Visit the ULFM website to download the tarball.
Interview

Erika Fuentes
Where are you from, originally?
I was born in Mexico City.
Can you summarize your educational background?
I earned a college degree in Computer Engineering from the Monterrey Institute of Technology and Higher Education (Tecnológico de Monterrey). I then earned my MS and PhD in Computer Science from the University of Tennessee, Knoxville.
How did you get introduced to ICL?
I first learned about MPI during undergrad, and after finishing my bachelor’s degree I moved to the US to pursue graduate education. I then came to learn that a research group at UT worked on MPI—and many other cool projects. I visited UT after being accepted into the CS graduate program; of all days it was Good Friday and I wasn’t aware that the school was closed! However, I met Terry Moore, who introduced me to ICL (in fact, most of the group was there, including Jack of course).
What did you work on during your time at ICL?
I started out as a member of ICL help, but once I began my PhD studies I worked on the Self-Adaptive Large Scale Solver Architecture (SALSA).
What are some of your favorite memories from your time at ICL?
Friday lunches! Great talks, conversation, and of course free food for poor students. But generally speaking, memories of learning from and working with knowledgeable and talented colleagues who were also supportive and friendly, making the research group feel like a family. It doesn’t matter how long you’ve been away, you always feel welcomed and comfortable going back. For instance, the gatherings at SC remind me of such times.
Tell us where you are and what you’re doing now.
I am currently working as full-time faculty at the University of Washington, Bothell campus, where I started just this fall. As part of the School of STEM, one of my current priorities is to grow and develop a computer science graduate curriculum that includes HPC courses. Our graduate program is growing tremendously, as our campus supports working students from local high-tech companies, including Microsoft, Google, Amazon, Intel, and Boeing. Previously, I worked for Microsoft for 5 years, but I was drawn back to academia; I was faculty at Washington State University (Mechanical and Materials Engineering) and at Everett Community College before joining the University of Washington.
In what ways did working at ICL prepare you for what you do now, if at all?
Countless ways… not just the technical knowledge and experience. I had great advising from colleagues at different levels, and I learned how to push myself to be confident about working independently on research and to be innovative. I also learned how to creatively solve challenging problems. Another aspect, which I use every day, is how to communicate effectively in order to collaborate with other faculty, present ideas, and, of course, advise my current students.
Tell us something about yourself that might surprise some people.
Although I was born in Mexico, I do not consider myself Mexican. I grew up in a family and community of mostly European immigrants and attended an American/British grade school. I grew up eating British pasties and French pastries (an odd combo), and part of my heritage comes from Cornish mining engineers. I always wanted to come to the US, where the culture matches my diverse background.