ICL Newsletter

News and Announcements

SC16

When this year’s International Conference for High Performance Computing, Networking, Storage, and Analysis, SC16, happens November 13–18 in Salt Lake City, Utah, ICL will again be a substantial contributor to the activities, presenting BoFs, papers, tutorials, and a poster.

Although the University of Tennessee will not have a booth this year, ICL has created its own virtual booth where you can keep up with what ICLers will be doing throughout the conference.

Dongarra Becomes a Foreign Member of the Russian Academy of Sciences

Vladimir Fortov, president of the Russian Academy of Sciences (RAS), informed Jack on October 31 that the organization has elected him as a foreign member.

In making this esteemed list of well-known scientists from various countries and disciplines, Dongarra joins other notable computer scientists elected earlier, such as Americans Don Knuth, professor emeritus at Stanford, and Mike Stonebraker, adjunct professor at the Massachusetts Institute of Technology. The RAS roster of foreign members also includes seven Nobel laureates, among them the famous American statesman Henry Kissinger.

“The academy’s prestige stems from its long-standing role as a global network of scientists and scholars from an array of institutes and laboratories dedicated to advancing science for the betterment of humanity,” Dongarra said. “So, being elected to the RAS community is not only an honor but also another effective avenue for sharing what we learn from our experimental computer science work.”

Established in 1724 in St. Petersburg, Russia, the RAS is that country’s highest scientific society and principal coordinating body for research in natural and social sciences, technology, and production. Membership is by election and is divided into three ranks: academician, corresponding member, and foreign member. The organization has more than 1,500 members, with some 800 corresponding members, 500 academicians, and 200 foreign members.

The RAS trains students, publicizes scientific achievements and knowledge, maintains ties with many international scientific institutions, and collaborates with foreign academies. It also directs the research of other scientific institutions and institutions of higher learning.

More on the RAS elections is available via a translated webpage. The complete list of foreign members of the organization is available on Wikipedia.

PULSE and BONSAI Grants Awarded

During the summer, ICL won two grants from the National Science Foundation, under the Software Infrastructure for Sustained Innovation (SI²) program. The awarded projects are called PULSE and BONSAI, and each is funded for approximately $500,000 over three years.

PULSE — the PAPI Unifying Layer for Software-defined Events project — aims to enable cross-layer and integrated modeling and analysis of the entire hardware system by extending PAPI (Performance API) with the capability to expose performance metrics for key software components found in the HPC software stack.

PULSE will extend the abstraction and unification layer that PAPI provides for hardware events so that it also encompasses software events from MPI, OpenMP, LAPACK, MAGMA, and task-based runtimes.

As for BONSAI — the BEAST OpeN Software Attuning Infrastructure project — it uses the knowledge and software that resulted from ICL’s research to create a modular, easy-to-use software toolkit that will enable domain scientists and library developers to efficiently explore and optimize the execution of a wide variety of computational kernels on current and future hybrid platforms.

Dongarra Shares Insights on the TOP500 and More

Comments from Jack Dongarra are featured in an article published in the October 19 issue of Computerworld delving into the history and characteristics of the TOP500 list and related topics.

The article conveys how the speed of supercomputers has progressed over time, industry’s prominent use of simulations and big data, China’s rise as a formidable competitor of the US in high-performance computing, and the status of progress in the quest for exascale computing.

Also, back in June when the TOP500 list was published at the International Supercomputing Conference, Jack was interviewed by the Wall Street Journal and the New York Times concerning the latest rankings and China’s prominence in them. See related stories in Tennessee Today and on the SC16 website.

Conference Reports

CCDSC 2016

The Workshop on Clusters, Clouds, and Data for Scientific Computing (CCDSC) continued its tradition of addressing numerous themes for developing and using both clusters and computational clouds when it convened on October 3–6, 2016, at La Maison des Contes, 427 Chemin de Chanzé, France.

Held every two years and alternating between the US and France, these by-invitation-only workshops evaluate the state of the art and future trends in cluster computing and the use of computational clouds for scientific computing. The meeting is a continuation of a series of workshops titled “Workshop on Environments and Tools for Parallel Scientific Computing” that began in 1992.

ICL’s Jack Dongarra and former ICLer Bernard Tourancheau (now a professor at the Université Grenoble Alpes) co-chaired, while Yves Robert and Anthony Danalis gave invited talks. Yves’ talk was on “Failure Detection and Propagation in HPC Systems,” and Anthony’s was on “Dataflow programming: Do we need it for exascale?”

The meeting hosted more than 50 attendees and featured 50 individual talks.

ASCR

About 150 representatives from the U.S. Department of Energy (DOE) national laboratories and a number of academic institutions met in Rockville, MD, September 27–29 for the Exascale Requirements Review for Advanced Scientific Computing Research (ASCR), one of a series of workshops DOE’s Office of Science conducts to determine the exascale requirements of application scientists and computer scientists.

The workshops address the computation, data analysis, software, workflows, HPC services, and the complete range of computer requirements necessary to support advanced computing research through 2025.

Attending the meeting from ICL were Asim YarKhan and George Bosilca, both of whom participated in breakout sessions focused on system software requirements for production systems and emerging systems. In addition, George took part in the session on system software for early-access machines.

HPEC ’16

ICL’s Stanimire Tomov attended the 20th annual IEEE High Performance Extreme Computing (HPEC) Conference, September 13–15 at the Westin Hotel in Waltham, MA.

Two ICL papers were presented at HPEC ’16.

One of the ICL papers, “LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” resulted from work on numerical libraries with Intel® collaborators on Intel’s newest Xeon Phi™ processor, Knights Landing.

The other paper, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” featured physics collaborators from Oak Ridge National Laboratory and the University of Tennessee. This publication incorporates some of ICL’s newest batched processing techniques to accelerate applications from core-collapse supernovae to realistic atmospheric simulations.

HPEC is the largest computing conference in New England and addresses the convergence of high-performance and embedded computing. Held each year in the Boston area, HPEC brings together academic, industry, and federal DOE/DOD researchers in high-performance computing, computing hardware, software, systems, and applications in which performance matters.

EuroMPI

The 23rd annual EuroMPI conference took place on September 25–28 in Edinburgh, Scotland, providing participants the opportunity for networking, discussion, and skills building under the theme “Modern Challenges to MPI’s Dominance in HPC.”

ICL’s George Bosilca was there. “This looked like a rebirth of the conference, with the number of participants going up for the first time in the last few years,” he says. “The paper acceptance rate also improved; it is now below 45 percent. So EuroMPI is getting back its leadership role in MPI-related developments. And, as a side note, Edinburgh in September was an excellent choice.”

George presented a tutorial titled “Survival in an MPI World,” in which he explained a holistic approach to fault tolerance. His tutorial introduced multiple fault-management techniques while maintaining the focus on what is called User Level Failure Mitigation (ULFM) — a minimal extension of the MPI specification that aims to provide users with the basic building blocks and tools to construct higher-level abstractions and introduce resilience in their applications.

Recent Releases

PAPI 5.5.0

PAPI 5.5.0 has been released!

The Performance API (PAPI) provides simultaneous access to performance counters on CPUs, GPUs, and other components of interest (e.g., network and I/O systems). Provided as a linkable library of shared objects, PAPI can be called directly in a user program, or used transparently through a variety of third-party tools, making it a de facto standard for hardware counter analysis.

The PAPI 5.5.0 release includes a new component that provides read and write access to the information and controls exposed via the Linux powercap interface. The PAPI powercap component supports measuring and capping power usage on recent Intel® architectures.

ICL has added core support for Knights Landing as well as power monitoring via the RAPL and powercap components. Uncore support will be provided later.
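For readers curious what the Linux powercap interface looks like underneath, here is a minimal Python sketch that reads the kernel's sysfs files directly rather than going through PAPI. The zone path and the helper names (read_counter, average_power_watts, measure) are assumptions made for illustration; the files actually present depend on the kernel and hardware, and reading them may require elevated privileges.

```python
import time
from pathlib import Path

# Assumed sysfs location of the first RAPL package zone; the actual
# path varies by kernel and hardware.
ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(zone: Path, name: str) -> int:
    """Read one integer attribute (e.g. energy_uj) from a powercap zone."""
    return int((zone / name).read_text().strip())


def average_power_watts(e_start_uj: int, e_end_uj: int,
                        max_range_uj: int, seconds: float) -> float:
    """Average power between two energy_uj readings.

    energy_uj is a cumulative counter in microjoules. This sketch assumes
    it wraps at max_energy_range_uj, so a smaller second reading is
    treated as exactly one wraparound.
    """
    delta = e_end_uj - e_start_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1e6 / seconds


def measure(zone: Path = ZONE, seconds: float = 1.0) -> float:
    """Sample the zone twice and return average watts (needs read access)."""
    max_range = read_counter(zone, "max_energy_range_uj")
    start = read_counter(zone, "energy_uj")
    time.sleep(seconds)
    end = read_counter(zone, "energy_uj")
    return average_power_watts(start, end, max_range, seconds)
```

On a machine that exposes the zone, calling measure() returns the average package power over one second. Capping is done by writing to a zone's constraint files, which this sketch does not attempt.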

Visit the PAPI website for more information and to download the software.

MAGMA 2.1

MAGMA 2.1 has been released!

Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.

MAGMA enables applications to fully exploit the power of current heterogeneous systems of multi/many-core CPUs and multi-GPUs/coprocessors to deliver the fastest possible time to accurate solution within given energy constraints.

Following are the new features and updates included in MAGMA 2.1:

Variable size batched routines (gemm, gemv, syrk, syr2k)
Improved SVD performance for tall (m >> n) or wide (m << n) matrices
Preconditioned QMR
Expanded Doxygen documentation
For MAGMA v1 compatibility, initializes the default queue for each GPU on first use, instead of in magma_init.
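To illustrate what "variable size batched" means in the first item of the list above, the following pure-Python sketch multiplies a batch of matrix pairs in which every pair has its own dimensions. It shows only the semantics; it is not MAGMA's API or its GPU implementation, and the names gemm and gemm_vbatched are hypothetical.

```python
def gemm(a, b):
    """Plain matrix product of two row-major nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def gemm_vbatched(pairs):
    """Multiply every (A, B) pair; each pair may have different sizes."""
    return [gemm(a, b) for a, b in pairs]


batch = [
    ([[1, 2], [3, 4]], [[5], [6]]),    # 2x2 times 2x1
    ([[1, 0, 2]], [[1], [1], [1]]),    # 1x3 times 3x1
]
results = gemm_vbatched(batch)
```

A fixed-size batched routine would require every pair to share the same dimensions; the variable size form lifts that restriction, so many small, differently shaped problems can be submitted in one call.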

Visit the MAGMA website to download the software.

Open MPI 2.0.1

Open MPI 2.0.1 has been released!

The Open MPI project is an open source message passing interface (MPI) implementation that is developed and maintained by a consortium of academic, research, and industry partners.

MPI primarily addresses the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process.

Open MPI integrates technologies and resources from several other projects (HARNESS/FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) to build the best MPI library available.

The list of changes to Open MPI implemented in version 2.0.1 is provided on GitHub.

Visit the Open MPI website to download the software.

Interview

Nuria Losada

Where are you from originally?

I am from Spain. I was born in Vigo, and I moved to A Coruña to study; both are cities in Galicia. It is a green land in the northwest of Spain, with a beautiful coastline full of beaches, islands, and cliffs.

Can you summarize your background?

I earned a college degree in computer engineering in 2013, then a master’s degree in HPC in 2014, both from the University of A Coruña, where I am currently doing my PhD and working as a researcher.

Tell us how you first learned about ICL.

ICL is a world reference in research in high-performance computing and has developed many software packages that are well-known in the scientific community. My research in fault tolerance has led me to use ULFM; however, the first ICL projects I learned about were Open MPI, PAPI, and the TOP500 list.

What made you want to work for ICL?

I want to improve my knowledge of HPC and fault tolerance, and this research visit gives me the perfect opportunity.

What are you working on at ICL?

I’m working on resilience. We intend to integrate ULFM and the checkpointing tool CPPC (developed by the Computer Architecture Group of A Coruña) to obtain a local recovery fault tolerance solution by using message logging.

What are your interests/hobbies outside work?

I like spending time with my friends, watching movies and TV series, music, traveling, and painting.

Tell us something about yourself that might surprise people.

I grew up fishing with my dad and I ended up being a pretty good fisher.

Recent Papers

Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “High Performance Realtime Convex Solver for Embedded Systems,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-745, October 2016.
Anzt, H., S. Tomov, and J. Dongarra, “On the performance and energy efficiency of sparse linear algebra on GPUs,” International Journal of High Performance Computing Applications, October 2016. DOI: 10.1177/1094342016672081
Yamazaki, I., S. Tomov, and J. Dongarra, “Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU,” ACM Transactions on Mathematical Software (TOMS), vol. 43, issue 2, October 2016.
Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,” Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016. DOI: 10.1109/ScalA.2016.11
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.
Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016. DOI: 10.1016/j.jocs.2016.11.013
Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016. DOI: 10.1002/cpe.4012
Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.
Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.

Recent Conferences

OCT 3: CCDSC, Lyon, France
Anthony Danalis, Jack Dongarra

NOV 1: Linux Plumbers and PPoPP PC, Santa Fe, New Mexico
George Bosilca

NOV 12-13: SC16, Salt Lake City, Utah
Anthony Danalis, Aurelien Bouteiller, George Bosilca, Hartwig Anzt, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, Reazul Hoque, Terry Moore, Thananon Patinyasakdikul, Thomas Herault, Tracy Rafferty, Yaohung Tsai

NOV 29: ECP PI Meeting, Lemont, IL
George Bosilca, Heike Jagode, Jack Dongarra

Upcoming Conferences

DEC 5: PEEKS Kick-off Meeting, Albuquerque, New Mexico
Ichitaro Yamazaki

DEC 5: MPI Forum, Dallas, Texas
Aurelien Bouteiller

DEC 12: TESSE Workgroup Meeting, New York, New York
Damien Genet, George Bosilca, Thomas Herault

Recent Lunch Talks

OCT 7: Piotr Luszczek
What Deep Learning?!?

OCT 14: Yves Robert (INRIA)
Failure Detection and Propagation in HPC systems

OCT 21: Harry Hughes
A Simulation-based System to Optimize Tile Size Parameters in PLASMA

OCT 28: Frank Winkler (ORNL)
Performance Analysis at Scale: The Score-P Tools Infrastructure

NOV 4: Reazul Hoque
Dynamic Task Discovery in PaRSEC

NOV 11: Thananon Patinyasakdikul
Multithreaded MPI

Upcoming Lunch Talks

DEC 2: Stephen Richmond
UCX as Communication Backend for PaRSEC

DEC 9: Wei Wu
Topology-aware Collective of CUDA-aware Open MPI

DEC 16: Chongxiao Cao

Visitors

Khairul Kabir from NVIDIA will be visiting from October 24 through November 5, working on research with the Linear Algebra group.
Camille Coti from Université Paris 13 will be visiting from November 7 through November 8. Camille will be working with the Distributed Computing group.

People

Congratulations

Amina Guermouche

As a member of the Distributed Computing group led by George Bosilca, Amina worked mainly on the Task-based Environment for Scientific Simulation at Extreme Scale (TESSE). She now has a new role as associate professor at Telecom SudParis. Best wishes, Amina!

Sam Crawford

All the very best to Sam, formerly an information specialist at ICL, in his new role as a technical editor and writer at ORNL.

Sofia Tomov

At the age of 12, Sofia Tomov, daughter of ICL’s Stanimire Tomov, is already taking on one of the great challenges of modern medicine: the prevention of adverse or life-threatening reactions to medication. To address the problem, Sofia has devised a computer algorithm that would allow doctors to determine if their patients have genetic mutations indicative of a high risk of dangerous reactions. Her outstanding work has garnered her a spot as a finalist in the 2016 Discovery Education 3M Young Scientist Challenge, a prestigious science competition for middle school students. Sofia says she hopes that one day her algorithm will help save lives around the world and that she believes its use will become extremely widespread. For more about this amazing, mighty girl, see her Facebook page.

Dates to Remember

ICL Dinner at SC16

Caffé Molise in downtown Salt Lake City will be the locale for the ICL SC16 dinner, which takes place on November 16 at 7:00pm. The restaurant is located at 55 West 100 South.

To confirm your attendance, contact Tracy Rafferty (rafferty@icl.utk.edu).

November 2016
