News and Announcements

ICL to Participate in Seven DOE Exascale Computing Project Awards


ICL will have a hand in six of the 35 software development awards and one of the four co-design center awards for the US Department of Energy’s Exascale Computing Project (ECP).

ECP is focused on developing systems at least 50 times faster than the nation’s most powerful current supercomputers.

For its role, ICL will receive about $3.3 million in funding in the first year and more than $3.4 million in each of the next two years, for an overall total of approximately $10.2 million.

First-year funding for all the ECP software development awards totals $34 million.

Read more in Tennessee Today.

TOP500 — November 2016


The 48th edition of the TOP500 list, presented November 14 at SC16 in Salt Lake City, Utah, reflected the degree to which China and the United States continue to jockey for supercomputing preeminence: each nation claims 171 systems in the November 2016 rankings, with China holding onto dominance by keeping the first two slots.

China’s Sunway TaihuLight (93 petaflops) and Tianhe-2 (34 petaflops) were number one and two, respectively. Together, those two machines provide the TOP500 with nearly 19 percent of its total FLOPS.
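As a quick arithmetic check, using the 672-petaflop list total reported below:

    \[ \frac{93\,\text{PF} + 34\,\text{PF}}{672\,\text{PF}} \approx 0.189 \approx 19\% \]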

After the US and China, Germany claims the most systems with 31, followed by Japan with 27, France with 20, and the UK with 13. A year ago the US was the clear leader with 200 systems, while China had 108, Japan had 37, Germany had 33, and France and the UK had 18 each.

In addition to matching each other in system count in the latest rankings, China and the US are running neck and neck in aggregate Linpack performance. The US holds the narrowest of leads, with 33.9 percent of the total; China is second with 33.3 percent. The total performance of all 500 computers on the list is now 672 petaflops, a 60 percent increase from a year ago.

Concerning other highlights from the list, two new additions appeared in the top 10. The National Energy Research Scientific Computing Center (NERSC) came in at number five with its Cori supercomputer (14 petaflops), while Japan’s new Oakforest-PACS machine (13.6 petaflops) captured the sixth slot. Both of those machines employ the Intel “Knights Landing” Xeon Phi 7250, a 68-core processor that delivers just under 3 peak teraflops of performance.
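As a back-of-the-envelope check on that peak figure (a sketch, assuming the standard peak-performance formula, 32 double-precision FLOPs per core per cycle from two AVX-512 fused multiply-add units, and Intel's published clocks of roughly 1.4 GHz nominal and about 1.2 GHz when all vector units are active; treat the clock figures as approximate):

    \[ P_{\text{peak}} = N_{\text{cores}} \times \frac{\text{FLOPs}}{\text{cycle}} \times f = 68 \times 32 \times 1.2\,\text{GHz} \approx 2.6\,\text{TFLOP/s} \]

At the 1.4 GHz nominal clock, the same formula gives roughly 3.05 TFLOP/s, bracketing the "just under 3 teraflops" figure above.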

More details and perspective on the 48th edition of the TOP500 are available in the official press release.

HPCG Results — November 2016


The results of the High Performance Conjugate Gradients (HPCG) benchmark, released at SC16, showed considerable differences compared with the rankings from the traditional High Performance LINPACK (HPL) metric.

HPCG diverged from HPL most notably at the top of the list: Japan took the first and third slots with RIKEN’s K computer and the Joint Center for Advanced High Performance Computing’s Oakforest-PACS machine, respectively, while China’s TaihuLight and the US’s Titan held those positions on the TOP500.

HPCG is designed to measure performance that is representative of modern HPC capability by simulating patterns commonly found in real science and engineering applications. While the latest HPCG list contains entries for more than half of the top 50 systems from the TOP500, its shuffling of the HPL rankings indicates that HPCG features are exposing different and complementary system characteristics.
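To make that concrete, here is a minimal conjugate gradient solver in C for a 1-D Laplacian stencil. This is an illustrative sketch only, not HPCG itself (which operates on a 3-D 27-point stencil with a multigrid preconditioner), but it exercises the same memory-bound pattern the benchmark stresses: a sparse matrix-vector product plus dot products and vector updates in every iteration.

    /* Minimal conjugate gradient on a 1-D Laplacian stencil -- a toy
       stand-in for the sparse, memory-bound pattern HPCG measures. */
    #include <stdio.h>
    #include <math.h>

    #define N 1000

    /* y = A*x for the tridiagonal SPD matrix A = tridiag(-1, 2, -1) */
    static void spmv(const double *x, double *y) {
        for (int i = 0; i < N; i++) {
            y[i] = 2.0 * x[i];
            if (i > 0)     y[i] -= x[i-1];
            if (i < N - 1) y[i] -= x[i+1];
        }
    }

    static double dot(const double *a, const double *b) {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    int main(void) {
        static double x[N], r[N], p[N], Ap[N];
        /* b = 1, x0 = 0  =>  r0 = b and p0 = r0 */
        for (int i = 0; i < N; i++) { x[i] = 0.0; r[i] = 1.0; p[i] = r[i]; }
        double rr = dot(r, r);
        for (int k = 0; k < N && sqrt(rr) > 1e-10; k++) {
            spmv(p, Ap);                        /* dominant kernel: SpMV */
            double alpha = rr / dot(p, Ap);
            for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rr_new = dot(r, r);
            double beta = rr_new / rr;          /* update search direction */
            for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
            rr = rr_new;
        }
        printf("final residual: %.3e\n", sqrt(rr));
        return 0;
    }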

This is the sixth list produced for HPCG. The first, announced at ISC14 two and a half years ago, contained 15 entries; the SC15 list had 60; and the ISC16 list had 80. The current HPCG list features 101 entries, most from the upper portion of the TOP500 list.

The full list of the latest HPCG rankings is available on the HPCG website.

Conference Reports

SC16

The annual International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) took place in Salt Lake City, Utah, on November 13–18. In typical fashion, ICL faculty, research staff, and students were deeply engaged in the event, giving talks, presenting papers and a poster, and leading Birds of a Feather (BoF) sessions. Plus, ICL garnered an award.

Clint Whaley and Jack Dongarra were honored with the SC16 Test of Time Award for their paper on Automatically Tuned Linear Algebra Software, or ATLAS, an autotuned, optimized implementation of the Basic Linear Algebra Subprograms (BLAS). The paper has received hundreds of citations, and new citations continue to appear. Beyond the portable performance that ATLAS provides, its autotuning strategies have inspired other research teams doing similar work.

Another highlight from SC16 for ICL was the Big Data and Extreme-scale Computing (BDEC) Community Report BoF, led by Jack Dongarra. Participants in the international workshop series on BDEC are systematically mapping out the ways in which the major issues associated with data-intensive science interact with plans for achieving exascale computing. The BoF presented an overview of this road-mapping effort and elicited community input on the development of plans for the convergence of currently bifurcated software ecosystems on a common software infrastructure.

Since the University of Tennessee did not have a booth at the conference this time around, ICL provided its schedule of SC16 activities with links for more information, a list of its attendees, and project information via a virtual booth online.

ICLers, both past and present, who attended SC16 were invited to the traditional ICL Alumni Dinner, this year hosted by Caffé Molise in downtown Salt Lake City. As is always the case, long-time friends and colleagues shared plenty of conversations during this enjoyable conclusion to the last major conference of the year.

Linux Plumbers Conference

On November 1–4, high-altitude Santa Fe, capital of the state of New Mexico, was the scene for a gathering of plumbers—not the kind concerned with pipes and water, but with the “plumbing” of Linux, which involves its kernel subsystems, core libraries, windowing systems, and such. The event was called the Linux Plumbers Conference.

George Bosilca presented the work ICL has been doing in Open MPI to provide efficient support for different checkpoint/restart techniques, including optimizations for uncoordinated checkpoint/restart with message logging.

Recent Releases

MAGMA 2.2 Released

MAGMA 2.2 is now available! MAGMA (Matrix Algebra on GPU and Multicore Architectures) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures. MAGMA allows applications to fully exploit the power of current heterogeneous systems of multi/many-core CPUs and multi-GPUs/coprocessors to deliver the fastest possible time to accurate solution within given energy constraints.
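For a flavor of how MAGMA's LAPACK-style interface is used, the sketch below calls the double-precision Cholesky factorization through MAGMA's CPU interface (magma_dpotrf, which mirrors LAPACK's dpotrf). This is a minimal, illustrative example rather than an authoritative one; consult the MAGMA documentation for the exact signatures in your release, including the new *_vbatched variants listed below.

    /* Minimal sketch: Cholesky factorization of a symmetric positive
       definite matrix via MAGMA's LAPACK-style CPU interface.
       Signatures follow MAGMA 2.x conventions; check your release's
       headers, as this is illustrative rather than definitive. */
    #include <stdio.h>
    #include "magma_v2.h"

    int main(void) {
        magma_init();                             /* start up MAGMA (and CUDA) */

        magma_int_t n = 1000, info = 0;
        double *A;
        magma_dmalloc_pinned(&A, (size_t)n * n);  /* pinned host memory helps
                                                     asynchronous transfers */

        /* Build a diagonally dominant (hence SPD) test matrix,
           stored column-major as in LAPACK. */
        for (magma_int_t j = 0; j < n; j++)
            for (magma_int_t i = 0; i < n; i++)
                A[i + j*n] = (i == j) ? 2.0 * n : 1.0;

        /* Factor A = L * L^T; MAGMA uses the GPU internally where profitable. */
        magma_dpotrf(MagmaLower, n, A, n, &info);
        printf("magma_dpotrf info = %lld\n", (long long) info);

        magma_free_pinned(A);
        magma_finalize();
        return 0;
    }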

The 2.2.0 release has the following updates:

  • Added variable-size batched Cholesky factorization, magma_[sdcz]potrf_vbatched
  • Added new fixed-size batched BLAS routines: {hemm, symm, hemv, symv, trmm}_batched
  • Added new variable-size batched BLAS routines: {hemm, symm, hemv, symv, trmm, trsm}_vbatched
  • Fixed memory leaks in {sy,he}evdx_2stage and getri_outofplace_batched
  • Fixed a bug for small matrices in {symm, hemm}_mgpu and updated the tester
  • Fixed libraries in the make.inc examples for MKL with gcc
  • Added more robust error checking for batched BLAS routines

MAGMA-sparse

  • Added the Incomplete Sparse Approximate Inverse (ISAI) preconditioner for sparse triangular solves, including batched generation
  • Added block-Jacobi triangular solves, including variable block size (based on supervariable amalgamation)
  • Added ParILUT, a parallel threshold ILU based on OpenMP
  • Added the CSR5 format and a CSR5 SpMV kernel, a sparse matrix-vector product that often outperforms the cuSPARSE CSR and HYB SpMV kernels (a sketch of the baseline CSR product appears after this list)
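For context on the CSR5 item above, this is the baseline CSR (compressed sparse row) matrix-vector product that formats like CSR5 are designed to accelerate. It is a generic C sketch, not MAGMA-sparse's actual kernel:

    /* y = A*x with A stored in CSR (compressed sparse row) format.
       A plain reference implementation; optimized kernels such as
       CSR5 reorganize this loop for SIMD- and GPU-friendliness. */
    typedef struct {
        int           nrows;
        const int    *rowptr;  /* nrows+1 entries; row i spans
                                  [rowptr[i], rowptr[i+1]) in colind/vals */
        const int    *colind;  /* column index of each stored nonzero */
        const double *vals;    /* value of each stored nonzero */
    } csr_matrix;

    void csr_spmv(const csr_matrix *A, const double *x, double *y) {
        for (int i = 0; i < A->nrows; i++) {
            double sum = 0.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i+1]; k++)
                sum += A->vals[k] * x[A->colind[k]];
            y[i] = sum;
        }
    }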

Visit the MAGMA software page to download the tarball.

PAPI 5.5.1 Released

PAPI 5.5.1 has been released! PAPI (the Performance API) provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.

In addition, PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.
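As a small illustration of the PAPI low-level API described above, the following C program counts total cycles and instructions around a simple loop (event availability varies by machine, and error handling is abbreviated for brevity):

    /* Count cycles and instructions around a region of interest
       using PAPI's low-level API. */
    #include <stdio.h>
    #include <papi.h>

    int main(void) {
        int eventset = PAPI_NULL;
        long long counts[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI init failed\n");
            return 1;
        }
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_CYC);  /* total cycles */
        PAPI_add_event(eventset, PAPI_TOT_INS);  /* total instructions */

        PAPI_start(eventset);
        volatile double s = 0.0;                 /* region of interest */
        for (long i = 0; i < 10000000L; i++)
            s += (double) i * 0.5;
        PAPI_stop(eventset, counts);

        printf("cycles: %lld  instructions: %lld  IPC: %.2f\n",
               counts[0], counts[1], (double) counts[1] / counts[0]);
        return 0;
    }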

PAPI 5.5.1 is a point release intended primarily to add support for uncore performance-monitoring events on the Intel Xeon Phi Knights Landing (KNL). Several other major bug fixes are also included.

Users can find specific and detailed information on the changes made in this release by searching ChangeLogP551.txt for keywords of interest or by going directly to the PAPI git repository.

Visit the PAPI software page to download the tarball.

Interview

Camille Coti

Where are you from originally?

I grew up in the northern suburbs of Paris. Although my family originally comes from many different parts of France, I have always lived in Paris or its suburbs.

Can you summarize your background?

After high school, I first completed a two-year pre-engineering program that prepares students for the national competitive exams for what we call in France the Grandes Écoles. I then went to a telecommunications engineering school, where I earned my master’s degree, majoring in distributed computing and minoring in high-frequency transmissions. During my master’s of science, I had to do an internship abroad, which I did in the Department of Mathematics at King’s College London. I really enjoyed research, so I decided to pursue a PhD in computer science at Université Paris-Sud XI.

Tell us how you first learned about ICL.

While pursuing my PhD, I worked with Thomas and Franck Cappello who, of course, told me about ICL.

What made you want to work for ICL?

It is a great lab that has originated so many advances in the state of the art of distributed computing, and the research done at ICL has a huge impact on the scientific community. More specifically, it is one of the leading actors in the world of MPI implementations, which were a core part of my PhD project. Working at ICL let me see “how Open MPI was made,” directly in a group involved in implementing it.

What did you do at ICL?

I was part of the MPI group: I worked on the scalability and the resilience of Open MPI’s runtime environment.

If you could have a different job role from the one you have now, where would that be and why?

I had a couple of experiences in private companies, large and small. Based on those experiences, I know that if I weren’t working where I am now, I would be part of another research group.

What are your interests/hobbies outside work?

I love many types of sports. I play tennis, and I used to run a little bit (15 km every evening), but my knees did not enjoy it as much as I did, so I switched to cycling, which I really love. I ride my bike to work for exercise, and on vacations I explore the area by bike.

Tell us something about yourself that might surprise people.

I am afraid of rabbits?

Recent Papers

  1. Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,” Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016. DOI: 10.1109/ScalA.2016.11
  2. Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1–27:11, November 2016.
  3. Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016. DOI: 10.1016/j.jocs.2016.11.013
  4. Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016. DOI: 10.1002/cpe.4012
  5. Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.
  6. Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” Third Workshop on Accelerator Programming Using Directives (WACCPD), at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.

Recent Conferences

  1. NOV: Linux Plumbers and PPoPP PC, Santa Fe, New Mexico (George Bosilca)
  2. NOV: SC16, Salt Lake City, Utah (Anthony Danalis, Aurelien Bouteiller, Chongxiao Cao, George Bosilca, Hartwig Anzt, Jack Dongarra, Jakub Kurzak, Phil Mucci, Piotr Luszczek, Reazul Hoque, Terry Moore, Thananon Patinyasakdikul, Thomas Herault, Tracy Rafferty, Yaohung Tsai, Yves Robert)
  3. NOV: ECP PI Meeting, Lemont, Illinois (George Bosilca, Heike Jagode, Jack Dongarra)
  4. DEC: PEEKS Kick-off Meeting, Albuquerque, New Mexico (Ichitaro Yamazaki)
  5. DEC: MPI Forum, Dallas, Texas (Aurelien Bouteiller)
  6. DEC: TESSE Workgroup Meeting, New York, New York (Damien Genet, George Bosilca, Thomas Herault)

Upcoming Conferences

  1. JAN: (Piotr Luszczek)
  2. JAN: ECP CEED Meeting, Argonne, Illinois (Azzam Haidar, Stanimire Tomov)
  3. JAN: Open MPI Developers Meeting, San Jose, California (George Bosilca)
  4. JAN: ECP All-Hands Meeting, Knoxville, Tennessee (Anthony Danalis, Asim YarKhan, Aurelien Bouteiller, Azzam Haidar, Damien Genet, George Bosilca, Hartwig Anzt, Heike Jagode, Ichitaro Yamazaki, Jack Dongarra, Jakub Kurzak, Mark Gates, Piotr Luszczek, Stanimire Tomov, Terry Moore, Thomas Herault)

Recent Lunch Talks

  1. NOV 4: Reazul Hoque, “Dynamic Task Discovery in PaRSEC” (PDF)
  2. NOV 11: Thananon Patinyasakdikul, “Multithreaded MPI” (PDF)
  3. DEC 2: Stephen Richmond, “UCX as Communication Backend for PaRSEC”
  4. DEC 9: Wei Wu, “Topology-aware Collective of CUDA-aware Open MPI”
  5. DEC 16: Chongxiao Cao

Upcoming Lunch Talks

  1. JAN 6: Kwai Wong (JICS), “An Interoperable Workflow Platform for Multidisciplinary Simulations—openDIEL” (PDF)
  2. JAN 13: David Eberius, “Profiling PaRSEC and Using Software Events in PAPI” (PDF)
  3. JAN 20: Yves Robert, “Bidiagonalization with Parallel Tiled Algorithms” (PDF)
  4. JAN 27: Ryan Glasby (JICS), “Results from HPCMP CREATE™-AV Kestrel Component COFFE for 3-D Aircraft Configurations with Higher-Order Meshes Generated by Pointwise, Inc.” (PDF)

Dates to Remember

Holiday Schedule

Just a reminder that the university is closed from December 26 through December 30 and again on January 2, 2017.