News and Announcements

ICL to Participate in Seven DOE Exascale Computing Project Awards


ICL will have a hand in six of the 35 software development awards and one of the four co-design center awards for the US Department of Energy’s Exascale Computing Project (ECP).

ECP is focused on developing systems at least 50 times faster than the nation’s most powerful current supercomputers.

For its role, ICL will receive about $3.3 million in funding in the first year and more than $3.4 million in each of the next two years, for an overall total of approximately $10.2 million.

First-year funding for all the ECP software development awards totals $34 million.

Read more in Tennessee Today.

TOP500 — November 2016


The 48th edition of the TOP500 list, presented November 14 at SC16 in Salt Lake City, Utah, reflected the degree to which China and the United States continue to jockey for supercomputing preeminence: each nation claims 171 systems in the November 2016 rankings, with China holding onto dominance by keeping the first two slots.

China’s Sunway TaihuLight (93 petaflops) and Tianhe-2 (34 petaflops) were number one and two, respectively. Together, those two machines provide the TOP500 with nearly 19 percent of its total FLOPS.
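As a quick arithmetic check, using the 672-petaflop list total reported below:

    \[ \frac{93\,\text{PF} + 34\,\text{PF}}{672\,\text{PF}} \approx 0.189 \approx 19\% \]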

After the US and China, Germany claims the most systems with 31, followed by Japan with 27, France with 20, and the UK with 13. A year ago the US was the clear leader with 200 systems, while China had 108, Japan had 37, Germany had 33, and France and the UK had 18 each.

In addition to matching each other in system count in the latest rankings, China and the US are running neck and neck in aggregate Linpack performance. The US holds the narrowest of leads, with 33.9 percent of the total; China is second with 33.3 percent. The total performance of all 500 computers on the list is now 672 petaflops, a 60 percent increase from a year ago.

Concerning other highlights from the list, two new additions appeared in the top 10. The National Energy Research Scientific Computing Center (NERSC) came in at number five with its Cori supercomputer (14 petaflops), while Japan’s new Oakforest-PACS machine (13.6 petaflops) captured the sixth slot. Both of those machines employ the Intel “Knights Landing” Xeon Phi 7250, a 68-core processor that delivers just under 3 peak teraflops of performance.
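As a back-of-the-envelope check on that peak figure (a sketch, assuming the standard peak-performance formula, 32 double-precision FLOPs per core per cycle from two AVX-512 fused multiply-add units, and Intel's published clocks of roughly 1.4 GHz nominal and about 1.2 GHz when all vector units are active; treat the clock figures as approximate):

    \[ P_{\text{peak}} = N_{\text{cores}} \times \frac{\text{FLOPs}}{\text{cycle}} \times f = 68 \times 32 \times 1.2\,\text{GHz} \approx 2.6\,\text{TFLOP/s} \]

At the 1.4 GHz nominal clock, the same formula gives roughly 3.05 TFLOP/s, bracketing the "just under 3 teraflops" figure above.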

More details and perspective on the 48th edition of the TOP500 are available in the official press release.

HPCG Results — November 2016


The results of the High Performance Conjugate Gradients (HPCG) benchmark, released at SC16, showed considerable differences compared with the rankings from the traditional High Performance LINPACK (HPL) metric.

HPCG diverged from HPL most notably at the top of the list: Japan took the first and third slots with RIKEN’s K computer and the Joint Center for Advanced High Performance Computing’s Oakforest-PACS machine, respectively, while China’s TaihuLight and the US’s Titan held those positions on the TOP500.

HPCG is designed to measure performance that is representative of modern HPC capability by simulating patterns commonly found in real science and engineering applications. While the latest HPCG list contains entries for more than half of the top 50 systems from the TOP500, its shuffling of the HPL rankings indicates that HPCG features are exposing different and complementary system characteristics.
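To make that concrete, here is a minimal conjugate gradient solver in C for a 1-D Laplacian stencil. This is an illustrative sketch only, not HPCG itself (which operates on a 3-D 27-point stencil with a multigrid preconditioner), but it exercises the same memory-bound pattern the benchmark stresses: a sparse matrix-vector product plus dot products and vector updates in every iteration.

    /* Minimal conjugate gradient on a 1-D Laplacian stencil -- a toy
       stand-in for the sparse, memory-bound pattern HPCG measures. */
    #include <stdio.h>
    #include <math.h>

    #define N 1000

    /* y = A*x for the tridiagonal SPD matrix A = tridiag(-1, 2, -1) */
    static void spmv(const double *x, double *y) {
        for (int i = 0; i < N; i++) {
            y[i] = 2.0 * x[i];
            if (i > 0)     y[i] -= x[i-1];
            if (i < N - 1) y[i] -= x[i+1];
        }
    }

    static double dot(const double *a, const double *b) {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    int main(void) {
        static double x[N], r[N], p[N], Ap[N];
        /* b = 1, x0 = 0  =>  r0 = b and p0 = r0 */
        for (int i = 0; i < N; i++) { x[i] = 0.0; r[i] = 1.0; p[i] = r[i]; }
        double rr = dot(r, r);
        for (int k = 0; k < N && sqrt(rr) > 1e-10; k++) {
            spmv(p, Ap);                        /* dominant kernel: SpMV */
            double alpha = rr / dot(p, Ap);
            for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rr_new = dot(r, r);
            double beta = rr_new / rr;          /* update search direction */
            for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
            rr = rr_new;
        }
        printf("final residual: %.3e\n", sqrt(rr));
        return 0;
    }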

This is the sixth list produced for HPCG. The first, announced at ISC14 two and a half years ago, contained 15 entries; the SC15 list had 60; and the ISC16 list had 80. The current HPCG list features 101 entries, most from the upper portion of the TOP500 list.

The full list of the latest HPCG rankings is available on the HPCG website.

Conference Reports

SC16

The annual International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) took place in Salt Lake City, Utah, on November 13–18. In typical fashion, ICL faculty, research staff, and students were deeply engaged in the event, giving talks, presenting papers and a poster, and leading Birds of a Feather (BoF) sessions. Plus, ICL garnered an award.

Clint Whaley and Jack Dongarra were honored with the SC16 Test of Time Award for their paper on Automatically Tuned Linear Algebra Software, or ATLAS, an autotuned, optimized implementation of the Basic Linear Algebra Subprograms (BLAS). The paper has received hundreds of citations, and new citations continue to appear. Beyond the portable performance that ATLAS provides, its autotuning strategies have inspired other research teams doing similar work.

Another highlight from SC16 for ICL was the Big Data and Extreme-scale Computing (BDEC) Community Report BoF, led by Jack Dongarra. Participants in the international workshop series on BDEC are systematically mapping out the ways in which the major issues associated with data-intensive science interact with plans for achieving exascale computing. The BoF presented an overview of this road-mapping effort and elicited community input on the development of plans for the convergence of currently bifurcated software ecosystems on a common software infrastructure.

Since the University of Tennessee did not have a booth at the conference this time around, ICL provided its schedule of SC16 activities with links for more information, a list of its attendees, and project information via a virtual booth online.

ICLers, both past and present, who attended SC16 were invited to the traditional ICL Alumni Dinner, this year hosted by Caffé Molise in downtown Salt Lake City. As is always the case, long-time friends and colleagues shared plenty of conversations during this enjoyable conclusion to the last major conference of the year.

Linux Plumbers Conference

On November 1–4, high-altitude Santa Fe, capital of the state of New Mexico, was the scene for a gathering of plumbers—not the kind concerned with pipes and water, but with the “plumbing” of Linux, which involves its kernel subsystems, core libraries, windowing systems, and such. The event was called the Linux Plumbers Conference.

George Bosilca presented the work ICL has been doing in Open MPI to provide efficient support for different checkpoint/restart techniques, including optimizations for uncoordinated checkpoint/restart with message logging.

Recent Releases

MAGMA 2.2 Released

MAGMA 2.2 is now available! MAGMA (Matrix Algebra on GPU and Multicore Architectures) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures. MAGMA allows applications to fully exploit the power of current heterogeneous systems of multi/many-core CPUs and multi-GPUs/coprocessors to deliver the fastest possible time to accurate solution within given energy constraints.
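For a flavor of how MAGMA's LAPACK-style interface is used, the sketch below calls the double-precision Cholesky factorization through MAGMA's CPU interface (magma_dpotrf, which mirrors LAPACK's dpotrf). This is a minimal, illustrative example rather than an authoritative one; consult the MAGMA documentation for the exact signatures in your release, including the new *_vbatched variants listed below.

    /* Minimal sketch: Cholesky factorization of a symmetric positive
       definite matrix via MAGMA's LAPACK-style CPU interface.
       Signatures follow MAGMA 2.x conventions; check your release's
       headers, as this is illustrative rather than definitive. */
    #include <stdio.h>
    #include "magma_v2.h"

    int main(void) {
        magma_init();                             /* start up MAGMA (and CUDA) */

        magma_int_t n = 1000, info = 0;
        double *A;
        magma_dmalloc_pinned(&A, (size_t)n * n);  /* pinned host memory helps
                                                     asynchronous transfers */

        /* Build a diagonally dominant (hence SPD) test matrix,
           stored column-major as in LAPACK. */
        for (magma_int_t j = 0; j < n; j++)
            for (magma_int_t i = 0; i < n; i++)
                A[i + j*n] = (i == j) ? 2.0 * n : 1.0;

        /* Factor A = L * L^T; MAGMA uses the GPU internally where profitable. */
        magma_dpotrf(MagmaLower, n, A, n, &info);
        printf("magma_dpotrf info = %lld\n", (long long) info);

        magma_free_pinned(A);
        magma_finalize();
        return 0;
    }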

The 2.2.0 release has the following updates:

  • Added variable-size batched Cholesky factorization, magma_[sdcz]potrf_vbatched
  • Added new fixed-size batched BLAS routines: {hemm, symm, hemv, symv, trmm}_batched
  • Added new variable-size batched BLAS routines: {hemm, symm, hemv, symv, trmm, trsm}_vbatched
  • Fixed memory leaks in {sy,he}evdx_2stage and getri_outofplace_batched
  • Fixed a bug for small matrices in {symm, hemm}_mgpu and updated the tester
  • Fixed libraries in the make.inc examples for MKL with gcc
  • Added more robust error checking for batched BLAS routines

MAGMA-sparse

  • Added the Incomplete Sparse Approximate Inverse (ISAI) preconditioner for sparse triangular solves, including batched generation
  • Added block-Jacobi triangular solves, including variable block size (based on supervariable amalgamation)
  • Added ParILUT, a parallel threshold ILU based on OpenMP
  • Added the CSR5 format and a CSR5 SpMV kernel, a sparse matrix-vector product that often outperforms the cuSPARSE CSR and HYB SpMV kernels (a sketch of the baseline CSR product appears after this list)
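For context on the CSR5 item above, this is the baseline CSR (compressed sparse row) matrix-vector product that formats like CSR5 are designed to accelerate. It is a generic C sketch, not MAGMA-sparse's actual kernel:

    /* y = A*x with A stored in CSR (compressed sparse row) format.
       A plain reference implementation; optimized kernels such as
       CSR5 reorganize this loop for SIMD- and GPU-friendliness. */
    typedef struct {
        int           nrows;
        const int    *rowptr;  /* nrows+1 entries; row i spans
                                  [rowptr[i], rowptr[i+1]) in colind/vals */
        const int    *colind;  /* column index of each stored nonzero */
        const double *vals;    /* value of each stored nonzero */
    } csr_matrix;

    void csr_spmv(const csr_matrix *A, const double *x, double *y) {
        for (int i = 0; i < A->nrows; i++) {
            double sum = 0.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i+1]; k++)
                sum += A->vals[k] * x[A->colind[k]];
            y[i] = sum;
        }
    }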

Visit the MAGMA software page to download the tarball.

PAPI 5.5.1 Released

PAPI 5.5.1 has been released! PAPI (the Performance API) provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.

In addition, PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.
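As a small illustration of the PAPI low-level API described above, the following C program counts total cycles and instructions around a simple loop (event availability varies by machine, and error handling is abbreviated for brevity):

    /* Count cycles and instructions around a region of interest
       using PAPI's low-level API. */
    #include <stdio.h>
    #include <papi.h>

    int main(void) {
        int eventset = PAPI_NULL;
        long long counts[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI init failed\n");
            return 1;
        }
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_CYC);  /* total cycles */
        PAPI_add_event(eventset, PAPI_TOT_INS);  /* total instructions */

        PAPI_start(eventset);
        volatile double s = 0.0;                 /* region of interest */
        for (long i = 0; i < 10000000L; i++)
            s += (double) i * 0.5;
        PAPI_stop(eventset, counts);

        printf("cycles: %lld  instructions: %lld  IPC: %.2f\n",
               counts[0], counts[1], (double) counts[1] / counts[0]);
        return 0;
    }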

PAPI 5.5.1 is a point release intended primarily to add support for uncore performance-monitoring events on the Intel Xeon Phi Knights Landing (KNL). Several other major bug fixes are also included.

Users can find specific and detailed information on the changes made in this release by searching ChangeLogP551.txt for keywords of interest or by going directly to the PAPI git repository.

Visit the PAPI software page to download the tarball.

Interview

Camille Coti

Where are you from originally?

I grew up in the northern suburbs of Paris. Although my family originally comes from many different parts of France, I have always lived in Paris or its suburbs.

Can you summarize your background?

After high school, I first completed a two-year pre-engineering program that prepares students for the national competitive exams for what we call in France the Grandes Écoles. I then went to a telecommunications engineering school, where I earned my master’s degree, majoring in distributed computing and minoring in high-frequency transmissions. During my master’s of science, I had to do an internship abroad, which I did in the Department of Mathematics at King’s College London. I really enjoyed research, so I decided to pursue a PhD in computer science at Université Paris-Sud XI.

Tell us how you first learned about ICL.

While pursuing my PhD, I worked with Thomas and Franck Cappello who, of course, told me about ICL.

What made you want to work for ICL?

It is a great lab that has originated so many advances in the state of the art of distributed computing, and the research done at ICL has a huge impact on the scientific community. More specifically, it is one of the leading actors in the world of MPI implementations, which were a core part of my PhD project. Working at ICL let me see “how Open MPI was made,” directly in a group involved in implementing it.

What did you do at ICL?

I was part of the MPI group: I worked on the scalability and the resilience of Open MPI’s runtime environment.

If you could have a different job role from the one you have now, where would that be and why?

I had a couple of experiences in private companies, large and small. Based on those experiences, I know that if I weren’t working where I am now, I would be part of another research group.

What are your interests/hobbies outside work?

I love many types of sports. I play tennis, and I used to run a little bit (15 km every evening), but my knees did not enjoy it as much as I did, so I switched to cycling, which I really love. I ride my bike to work for exercise, and on vacations I explore the area by bike.

Tell us something about yourself that might surprise people.

I am afraid of rabbits?

Recent Papers

  1. Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,” Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016. DOI: 10.1109/ScalA.2016.11
  2. Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1–27:11, November 2016.
  3. Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016. DOI: 10.1016/j.jocs.2016.11.013
  4. Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016. DOI: 10.1002/cpe.4012
  5. Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.
  6. Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” Third Workshop on Accelerator Programming Using Directives (WACCPD), at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.

Recent Conferences

  1. NOV: Linux Plumbers and PPoPP PC, Santa Fe, New Mexico (George Bosilca)
  2. NOV: SC16, Salt Lake City, Utah (Anthony Danalis, Aurelien Bouteiller, Chongxiao Cao, George Bosilca, Hartwig Anzt, Jack Dongarra, Jakub Kurzak, Phil Mucci, Piotr Luszczek, Reazul Hoque, Terry Moore, Thananon Patinyasakdikul, Thomas Herault, Tracy Rafferty, Yaohung Tsai, Yves Robert)
  3. NOV: ECP PI Meeting, Lemont, Illinois (George Bosilca, Heike Jagode, Jack Dongarra)
  4. DEC: PEEKS Kick-off Meeting, Albuquerque, New Mexico (Ichitaro Yamazaki)
  5. DEC: MPI Forum, Dallas, Texas (Aurelien Bouteiller)
  6. DEC: TESSE Workgroup Meeting, New York, New York (Damien Genet, George Bosilca, Thomas Herault)

Upcoming Conferences

  1. JAN: (Piotr Luszczek)
  2. JAN: ECP CEED Meeting, Argonne, Illinois (Azzam Haidar, Stanimire Tomov)
  3. JAN: Open MPI Developers Meeting, San Jose, California (George Bosilca)
  4. JAN: ECP All-Hands Meeting, Knoxville, Tennessee (Anthony Danalis, Asim YarKhan, Aurelien Bouteiller, Azzam Haidar, Damien Genet, George Bosilca, Hartwig Anzt, Heike Jagode, Ichitaro Yamazaki, Jack Dongarra, Jakub Kurzak, Mark Gates, Piotr Luszczek, Stanimire Tomov, Terry Moore, Thomas Herault)

Recent Lunch Talks

  1. NOV 4: Reazul Hoque, “Dynamic Task Discovery in PaRSEC” (PDF)
  2. NOV 11: Thananon Patinyasakdikul, “Multithreaded MPI” (PDF)
  3. DEC 2: Stephen Richmond, “UCX as Communication Backend for PaRSEC”
  4. DEC 9: Wei Wu, “Topology-aware Collective of CUDA-aware Open MPI”
  5. DEC 16: Chongxiao Cao

Upcoming Lunch Talks

  1. JAN 6: Kwai Wong (JICS), “An Interoperable Workflow Platform for Multidisciplinary Simulations—openDIEL” (PDF)
  2. JAN 13: David Eberius, “Profiling PaRSEC and Using Software Events in PAPI” (PDF)
  3. JAN 20: Yves Robert, “Bidiagonalization with Parallel Tiled Algorithms” (PDF)
  4. JAN 27: Ryan Glasby (JICS), “Results from HPCMP CREATE™-AV Kestrel Component COFFE for 3-D Aircraft Configurations with Higher-Order Meshes Generated by Pointwise, Inc.” (PDF)

Dates to Remember

Holiday Schedule

Just a reminder that the university is closed from December 26 through December 30 and again on January 2, 2017.