News and Announcements

TOP500 – November 2015

The 46th TOP500 rankings were presented at SC15. For the 6th consecutive time, China’s Tianhe-2 has remained at the top of the ranking with 33.863 petaflop/s on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors for a combined total of 3,120,000 computing cores.

As for the rest of the list, the top five machines remain unchanged, but two new systems entered the TOP10: DOE’s new Cray machine Trinity (#6) and HLRS’s Cray machine Hazel Hen (#8). The November list also sees China nearly triple its number of machines in the TOP500, while the US has dropped to its lowest number of machines since the list’s inception in 1993.

Nages Sieslack of ISC Group caught up with Jack Dongarra to get his thoughts on the latest TOP500 list and more. You can read the entire interview here.

More details on the 46th edition of the TOP500 are available in the official press release.

Rank Site System Rmax (TFlop/s)
1 National Super Computer Center in Guangzhou, China Tianhe-2 (MilkyWay-2) – TH-IVB-FEP Cluster, NUDT 33,862.7
2 DOE/SC/Oak Ridge National Laboratory, United States Titan – Cray XK7, Cray Inc. 17,590.0
3 DOE/NNSA/LLNL, United States Sequoia – BlueGene/Q, IBM 17,173.2
4 RIKEN Advanced Institute for Computational Science (AICS), Japan K computer, SPARC64 VIIIfx, Fujitsu 10,510.0
5 DOE/SC/Argonne National Laboratory, United States Mira – BlueGene/Q, IBM 8,586.6

See the full list at TOP500.org.

HPCG Results – November 2015

The November 2015 results for the High Performance Conjugate Gradients (HPCG) benchmark were released on November 18th at the SC15 HPCG BoF in Austin, TX. Intended to be a new HPC metric, HPCG is designed to measure performance that is representative of modern HPC capability by simulating computational and data-access patterns commonly found in real science and engineering applications.

To keep pace with changing hardware and software infrastructures, HPCG results will be used to augment the TOP500 rankings and show how real-world applications might fare on a given machine. In the table below, you can see how the HPCG benchmark would have ranked its top 5 machines, and where those machines ranked on the LINPACK-based TOP500 list. The full list of rankings is available here.

Site Computer HPL (Pflop/s) TOP500 Rank HPCG (Pflop/s) HPCG Rank %Peak
NSCC / Guangzhou Tianhe-2 NUDT, Xeon 12C 2.2GHz + Intel Xeon Phi 57C + Custom 33.863 1 0.5800 1 1.1%
RIKEN Advanced Institute for Computational Science K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect 10.510 4 0.4608 2 4.1%
DOE/SC/Oak Ridge Nat Lab Titan – Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x 17.590 2 0.3223 3 1.2%
DOE/NNSA/LANL/SNL Trinity – Cray XC40, Intel E5-2698v3, Aries custom 8.101 6 0.1826 4 1.6%
DOE/SC/Argonne National Laboratory Mira – BlueGene/Q, Power BQC 16C 1.60GHz, Custom 8.587 5 0.1670 5 1.7%

Jack Dongarra Receives HPCwire Readers’ Choice Award

Jack Dongarra received the HPCwire Readers’ Choice Award for Outstanding Leadership in HPC at SC15 in Austin, TX. These awards, whose recipients are nominated and voted on by the HPC community at large, recognize the best and brightest developments in HPC over the last 12 months. Congratulations, Jack!

Conference Reports

SC15

This year’s International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) was held in Austin, TX on November 15 – 20. As usual, ICL had a significant presence at SC15, with faculty, research staff, and students giving talks, presenting papers, and leading BoF sessions.

For the fourth consecutive year, ICL was active in the University of Tennessee’s SC booth. The booth, which was organized and led by the National Institute for Computational Sciences (NICS), was visually designed with the help of ICL/CITR staff, staffed by ICL researchers attending SC, and featured some of the lab’s research projects.

ICLers both past and present who attended SC15 were invited to the traditional ICL Alumni Dinner. Austin’s Z’Tejas hosted this year’s dinner, and there were plenty of conversations shared between old friends and colleagues, as the ideas and drinks flowed freely. In the end, everyone enjoyed the mini-reunion as they capped off the last major conference of the year.

Recent Releases

HPCG 3.0 Released

HPCG 3.0 is now available! The HPCG (High Performance Conjugate Gradients) benchmark is designed to measure performance that is representative of modern scientific applications. It does so by exercising the computational and communication patterns that are commonly found in real science and engineering codes, which are often based on sparse iterative solvers. HPCG exhibits the same irregular accesses to memory and fine-grain recursive computations that dominate large-scale scientific workloads used to simulate complex physical phenomena. Intended as a candidate for a new HPC metric, HPCG implements the preconditioned conjugate gradient algorithm with a local symmetric Gauss-Seidel as the preconditioner. Additionally, the essential components of the geometric multigrid method are present in the code as a way to represent execution patterns of modern multigrid solvers.
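For readers unfamiliar with the algorithm being timed, the sketch below shows the basic preconditioned CG iteration in C. It is only an illustration of the method, not HPCG’s actual implementation: the spmv and precond callbacks, the helper names, and the stopping test are placeholders standing in for the benchmark’s sparse matrix, symmetric Gauss-Seidel/multigrid preconditioner, and convergence logic.

    #include <math.h>
    #include <stddef.h>

    /* Placeholder operator type: out = Op(in) for a length-n vector. In HPCG the
     * two operators are the sparse matrix-vector product (y = A*x) and one
     * symmetric Gauss-Seidel / multigrid sweep approximating z = M^{-1} r. */
    typedef void (*op_fn)(const double *in, double *out, size_t n);

    static double dot(const double *x, const double *y, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += x[i] * y[i];
        return s;
    }

    /* Preconditioned CG for A*x = b; returns the iteration count at convergence.
     * r, z, p, Ap are caller-provided work vectors of length n. */
    int pcg(op_fn spmv, op_fn precond, const double *b, double *x,
            double *r, double *z, double *p, double *Ap,
            size_t n, int max_iter, double tol) {
        spmv(x, Ap, n);
        for (size_t i = 0; i < n; i++) r[i] = b[i] - Ap[i];   /* r = b - A*x  */
        precond(r, z, n);                                     /* z = M^{-1} r */
        for (size_t i = 0; i < n; i++) p[i] = z[i];
        double rz = dot(r, z, n);
        double norm0 = sqrt(dot(r, r, n));

        for (int k = 1; k <= max_iter; k++) {
            spmv(p, Ap, n);
            double alpha = rz / dot(p, Ap, n);
            for (size_t i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            if (sqrt(dot(r, r, n)) <= tol * norm0) return k;  /* relative residual test */
            precond(r, z, n);
            double rz_new = dot(r, z, n);
            double beta = rz_new / rz;
            for (size_t i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
            rz = rz_new;
        }
        return max_iter;
    }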

The HPCG 3.0 Reference Code includes updates for the following:

  • Problem generation is a timed portion of the benchmark. This time is now added to any time spent optimizing data structures and counted as overhead when computing the official GFLOP/s rating. The total overhead time is divided by 500 to amortize its cost over 500 iterations.
  • Added memory usage counting and reporting.
  • Added memory bandwidth measurement and reporting.
  • Added a “Quick Path” option to make obtaining results on production systems easier. With this option, obtaining a rating will take only a few minutes. This option also makes profiling and debugging easier. The Quick Path option is invoked by setting the run time to zero, either in hpcg.dat or by using the --rt=0 option (see the example after this list).
  • Added a command-line option (--rt=) to specify the run time in seconds.
  • Made a few small changes to support easier builds on MS Windows.
  • Changed the way the residual variance is computed to make sure it is zero if all residual values are identical.
  • Changed the order of array allocation in the reference code in order to improve performance.
  • Set the minimum iteration count for the optimized run to be the same as what is used in the reference run.

Optimized HPCG 3.0 releases for Intel Xeon CPUs, Intel Xeon Phi coprocessors, and NVIDIA GPUs are also available.

Visit the HPCG software page to download the tarballs.

LAPACK 3.6.0 Released

LAPACK 3.6.0 has been released! LAPACK (the Linear Algebra PACKage) is a widely used library for efficiently solving dense linear algebra problems, and ICL has been a major contributor to the development and maintenance of LAPACK since its inception. LAPACK itself is sequential; it relies on the BLAS library and benefits from multithreaded BLAS implementations on multicore machines.

LAPACK 3.6.0 includes BLAS3 routines for the generalized SVD; new routines to compute a subset of the singular value decomposition, the full or partial (subset) SVD of a bidiagonal matrix through an associated eigenvalue problem, and the full or partial (subset) SVD of a general matrix; new complex Jacobi SVD routines; recursive Cholesky routines; and some improvements to the QR algorithm for the nonsymmetric eigenvalue problem (NEP).
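Since most of the additions center on the SVD, here is a minimal sketch of computing an SVD from C through the LAPACKE interface. It uses the long-standing dgesvd driver as a stand-in; the new subset SVD routines have their own calling sequences, which are documented on the LAPACK site.

    #include <stdio.h>
    #include <lapacke.h>

    int main(void) {
        /* 3x2 matrix in row-major order */
        double a[6] = { 1.0, 2.0,
                        3.0, 4.0,
                        5.0, 6.0 };
        lapack_int m = 3, n = 2;
        double s[2];        /* singular values                          */
        double u[9];        /* left singular vectors, 3x3               */
        double vt[4];       /* right singular vectors, 2x2 (transposed) */
        double superb[1];   /* workspace of size min(m,n)-1             */

        /* Full SVD: A = U * Sigma * V^T */
        lapack_int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A',
                                         m, n, a, n, s, u, m, vt, n, superb);
        if (info != 0) {
            fprintf(stderr, "LAPACKE_dgesvd failed: %d\n", (int)info);
            return 1;
        }
        printf("singular values: %f %f\n", s[0], s[1]);
        return 0;
    }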

Visit the LAPACK website to download the tarball.

Open MPI 1.10.1 Released

Open MPI 1.10.1 is now available! Open MPI is an open source MPI implementation that is developed and maintained by a consortium of academic, research, and industry partners. MPI primarily addresses the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process. Open MPI integrates technologies and resources from several other projects (HARNESS/FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best MPI library available. A completely new MPI-3.1 compliant implementation, Open MPI offers advantages for system and software vendors, application developers, and computer science researchers.
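For readers new to the model described above, the minimal sketch below shows a message moving from the address space of rank 0 to that of rank 1 via a matching send/receive pair; it is plain MPI and not specific to Open MPI.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) { MPI_Finalize(); return 0; }

        int value;
        if (rank == 0) {
            value = 42;
            /* data leaves rank 0's address space... */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ...and arrives in rank 1's address space via the matching receive */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }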

Visit the Open MPI project page to download the tarball.

ULFM 1.1 Released

ULFM 1.1 is now available! This release includes bug fixes identified by the user/developer community. ULFM (User Level Failure Mitigation) is a set of new interfaces for MPI that enables message passing applications to restore MPI functionality affected by process failures. The MPI implementation is spared the expense of internally taking protective and corrective automatic actions against failures. Instead, it can prevent any fault-related deadlock situation by reporting operations whose completions were rendered impossible by failures.
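As a rough sketch of how an application typically uses these interfaces, the fragment below revokes and then shrinks a communicator after an operation reports a failure. It assumes the MPIX_ extensions shipped with ULFM (declared in mpi-ext.h) and that MPI_ERRORS_RETURN has been set on the communicator; the actual recovery policy (retry, data restoration) is left out.

    #include <mpi.h>
    #include <mpi-ext.h>   /* ULFM extensions: MPIX_Comm_revoke, MPIX_Comm_shrink, ... */

    /* Typical ULFM recovery step, assuming MPI_ERRORS_RETURN is set on *comm:
     * when a collective reports a failure, revoke the communicator so every
     * surviving rank observes the error, then shrink it to obtain a working
     * communicator that excludes the dead processes. */
    static int allreduce_with_recovery(MPI_Comm *comm, int *data, int count)
    {
        int rc = MPI_Allreduce(MPI_IN_PLACE, data, count, MPI_INT, MPI_SUM, *comm);
        if (rc != MPI_SUCCESS) {
            int eclass;
            MPI_Error_class(rc, &eclass);
            if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED) {
                MPI_Comm shrunk;
                MPIX_Comm_revoke(*comm);            /* interrupt operations on all ranks    */
                MPIX_Comm_shrink(*comm, &shrunk);   /* agree on a communicator of survivors */
                MPI_Comm_free(comm);
                *comm = shrunk;
                return 1;                           /* caller decides whether to retry      */
            }
        }
        return 0;
    }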

This release focuses on improved stability, feature coverage for intercommunicators, and compliance with the updated specification for MPI_ERR_PROC_FAILED_PENDING. It includes the following enhancements and fixes:

  • Added the MPI_ERR_PROC_FAILED_PENDING error code, per the newer specification revision. It is now properly returned from non-blocking, point-to-point ANY_SOURCE operations (a sketch follows this list).
  • Aliased MPI_ERR_PROC_FAILED, MPI_ERR_PROC_FAILED_PENDING, and MPI_ERR_REVOKED to the corresponding standard-blessed extension names (MPIX_ERR_xxx).
  • Support for Intercommunicators:
    • Support for the blocking version of the agreement, MPI_COMM_AGREE on Intercommunicators.
    • MPI_COMM_REVOKE tested on intercommunicators.
  • Completely disabled (.ompi_ignore) many untested components.
  • Changed the default ORTE failure notification propagation aggregation delay from 1s to 25ms.
  • Added an OMPI internal failure propagator; failure propagation between SM domains is now immediate.
  • Bugfixes:
    • SendRecv would not always report MPI_ERR_PROC_FAILED correctly.
    • SendRecv could incorrectly update the status with errors pertaining to the Send portion of the Sendrecv.
  • Revoked send operations are now always completed or remotely cancelled and can no longer deadlock.
  • Cancelled send operations to a dead peer will not trigger an assert when the BTL reports that same failure.
  • Repeat calls to operations returning MPI_ERR_PROC_FAILED will eventually return MPI_ERR_REVOKED when another process revokes the communicator.

Visit the ULFM website to download the tarball.

SC15 Handouts

The new project handouts from SC15 are available for download in PDF format.

  • beast-sc15
  • dplasma-sc15
  • icl-sc15
  • lapack-sc15
  • magma-sc15
  • papi-ex-sc15
  • papi-sc15
  • parsec-sc15
  • plasma-sc15
  • pulsar-sc15
  • top500-november-2015
  • ulfm-sc15

Interview

Erika Fuentes

Where are you from, originally?

I was born in Mexico City.

Can you summarize your educational background?

I earned a college degree in Computer Engineering from the Monterrey Institute of Technology and Higher Education (Tecnológico de Monterrey). I then earned my MS and PhD in Computer Science from the University of Tennessee, Knoxville.

How did you get introduced to ICL?

I first learned about MPI during undergrad, and after finishing my bachelor’s degree I moved to the US to pursue graduate education. I then came to learn that a research group at UT worked on MPI—and many other cool projects. I visited UT after being accepted into the CS graduate program; of all days it was Good Friday and I wasn’t aware that the school was closed! However, I met Terry Moore, who introduced me to ICL (in fact, most of the group was there, including Jack of course).

What did you work on during your time at ICL?

I first started as a member of ICL help, but when I started my PhD studies, I worked on the Self-Adaptive Large Scale Solver Architecture (SALSA).

What are some of your favorite memories from your time at ICL?

Friday lunches! Great talks, conversation, and of course free food for poor students. But generally speaking, memories of learning from and working with knowledgeable and talented colleagues who were also supportive and friendly, making the research group feel like a family. It doesn’t matter how long you’ve been away, you always feel welcomed and comfortable going back. For instance, the gatherings at SC remind me of such times.

Tell us where you are and what you’re doing now.

I am currently working as full-time faculty at the University of Washington, Bothell campus, where I started just this fall. As part of the School of STEM, one of my current priorities is to grow and develop a computer science graduate curriculum that includes HPC courses. Our graduate program is growing tremendously, as our campus supports working students from local high-tech companies, including Microsoft, Google, Amazon, Intel, and Boeing. Previously, I worked for Microsoft for 5 years, but I was drawn back to academia; before joining the University of Washington, I was faculty at Washington State University (Mechanical and Materials Engineering) and at Everett Community College.

In what ways did working at ICL prepare you for what you do now, if at all?

Countless ways… not just the technical knowledge and experience. I had great advising from colleagues at different levels, and I learned how to push myself to be confident about working independently on research and being innovative. I also learned how to creatively solve challenging problems. Another aspect, which I use every day, is how to communicate effectively: to collaborate with other faculty, present ideas, and, of course, advise my current students.

Tell us something about yourself that might surprise some people.

Although I was born in Mexico, I do not consider myself Mexican. I grew up in a family and community of mostly European immigrants and attended an American/British grade school. I grew up eating British pasties and French pastries (an odd combo), and part of my heritage comes from Cornish mining engineers. I always wanted to come to the US, where the culture matches my diverse background.

Recent Papers

  1. Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, “Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.
  2. Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Adaptive Precision Solvers for Sparse Linear Systems,” 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.
  3. Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, “Efficient Implementation of Quantum Materials Simulations on Distributed CPU-GPU Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
  4. Anzt, H., E. Ponce, G. D. Peterson, and J. Dongarra, “GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.
  5. Kurzak, J., H. Anzt, M. Gates, and J. Dongarra, “Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,” IEEE Transactions on Parallel and Distributed Systems, November 2015.
  6. Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow, “Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.
  7. Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,” Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015. DOI: 10.1016/j.jpdc.2015.06.007
  8. Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
  9. Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
  10. Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
  11. Strohmaier, E., H. Meuer, J. Dongarra, and H. D. Simon, “The TOP500 List and Progress in High-Performance Computing,” IEEE Computer, vol. 48, issue 11, pp. 42-49, November 2015. DOI: 10.1109/MC.2015.338
  12. Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Tuning Stationary Iterative Solvers for Fault Resilience,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.
  13. Haugen, B., S. Richmond, J. Kurzak, C. A. Steed, and J. Dongarra, “Visualizing Execution Traces with Task Dependencies,” 2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
  14. Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), Austin, TX, ACM, November 2015.
  15. Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015. DOI: 10.1002/cpe.3516

Recent Conferences

  1. NOV – SC15, Austin, Texas: Aurelien Bouteiller, George Bosilca, Hartwig Anzt, Ichitaro Yamazaki, Jack Dongarra, Jakub Kurzak, Phil Mucci, Piotr Luszczek, Terry Moore, Thomas Herault, Tracy Rafferty
  2. NOV – Blake Haugen
  3. DEC – MPI Forum, San Jose, California: Aurelien Bouteiller
  4. DEC – TESSE Meeting, Blacksburg, Virginia: Amina Guermouche, George Bosilca, Thomas Herault

Recent Lunch Talks

  1. NOV 3 – Takeshi Fukaya, Hokkaido University: CholeskyQR2: Cholesky QR factorization with reorthogonalization (PDF)
  2. NOV 3 – Toshiyuki Imamura, RIKEN AICS: ASPEN.K2 + MUBLAS: level-2 CUDA BLAS kernels (PDF)
  3. NOV 6 – Moritz Kreutzer, Friedrich-Alexander University Erlangen-Nürnberg: Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems (PDF)
  4. NOV 13 – Sticks Mabakane, University of Cape Town: Novel Visualizations for Optimization of Parallel Programs (PDF)
  5. DEC 4 – Azzam Haidar: Batched Matrix Computations on Hardware Accelerators (PDF)
  6. DEC 11 – Kalyan Perumalla, ORNL

Upcoming Lunch Talks

  1. JAN 8 – Yves Robert, INRIA: Which Verification for Silent Error Detection? (PDF)
  2. JAN 14 – David Keffer, UTK Department of Materials Science and Engineering: Algorithms for 3D-3D Registration with Known and Unknown References: Applications to Materials Science (PDF)
  3. JAN 22 – Aurelien Bouteiller: Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery (PDF)
  4. JAN 29 – Joe Dorris: PLASMA OpenMP on Xeon Phi and A Case Study with Cholesky Decomposition (PDF)

Visitors

  1. Moritz Kreutzer
    Moritz Kreutzer from Regionales RechenZentrum Erlangen (RRZE) will be visiting from October 17 through December 18. Moritz, a PhD student, will be working with Hartwig during his visit to ICL.

People

  1. Mathieu Faverge
    ICL alumnus Mathieu Faverge is working his way up to ICL Frequent Visitor status, as he once again joins us at the lab beginning in January 2016 and stays with us through June. Welcome back, Mathieu!

Dates to Remember

Holiday Schedule

Just a reminder that the University is closed from December 21st through December 25th and again on January 1st, 2016.