News and Announcements

A Note from Jack

I assume you are feeling as I am: we are ready for this to be over and to get back to some kind of normal. I suspect the new “normal” will be different from what we had before, but I’m sure we can work with it and figure it out.

Looking over the past 6 weeks, we are still here, our research is still progressing, we all have jobs, and we are working for the future with new proposals being funded and new ones in the planning stages. Remember: it is normal to be anxious and stressed in these times, and we will make it out the other end.

Stay safe, be smart, and wash your hands. We will be back in the office soon.

–Jack

Employment Opportunities at ICL

ICL is seeking full-time Research Scientists (MS or PhD) to participate in the design, development, and maintenance of numerical software libraries for solving linear algebra problems on large, distributed-memory machines with multi-core processors and hardware accelerators, as well as performance monitoring capabilities for new and advanced hardware and software technologies.

The prospective researcher will coauthor papers to document research findings, present the team’s work at conferences and workshops, and help lead students and other team members in their research endeavors in ongoing and future projects. Given the nature of the work, there will be opportunities for publication, travel, and high-profile professional networking and collaboration across academia, labs, and industry.

An MS or PhD in computer science, computational sciences, or math is preferred. Background in at least one of the following areas is also preferred: numerical linear algebra, HPC, performance monitoring, machine learning, or data analytics.

For more information check out ICL’s jobs page: http://www.icl.utk.edu/jobs.

Gonzalez Family Award for Excellence in Research

The Min H. Kao Department of Electrical Engineering & Computer Science just announced Jack Dongarra as the winner of the 2020 Gonzalez Family Award for Excellence in Research! Congratulations, Jack!

Staying “in touch” during a Pandemic: a Primer

Well, we are in it now. I don’t know about you guys, but this is my first global pandemic. Despite the many challenges this situation has presented us with, ICL is very fortunate to have the infrastructure and support of the University of Tennessee, which is encouraging students, faculty, and staff to learn, teach, and work remotely.

Working from home, or “telecommuting” as managers call it, certainly isn’t a new thing, but this kind of en masse offsite productivity is unprecedented. To help with what is hopefully a temporary situation, we have a few pointers listed below.

Stay Connected

Meetings that used to be 5-minute “door knocks” now require an app, emails/scheduling, and your calendar. Rather than be discouraged, take another tack. Many groups within ICL are now holding more regular—but less formal—staff meetings in addition to the regular weekly or monthly discussions. These regularly scheduled “chats” can help bolster some of the communication channels that are otherwise suffering or altogether absent. Alternatively, you can schedule regular one-on-one chats with your teammates if your staff meetings are somewhat infrequent.

Be Available

Scheduled meetings can’t cover all the bases, so it is also important to be available and engaged. ICL doesn’t have an official chat or collaboration app, but many of us are using Zoom for video conferencing and Slack for chatting. It’s a good idea to keep these apps open and running during your designated business hours. Keep the app open, be signed in, and work as you usually would. Click here to sign into Zoom using your NetID. Send a request to icl-help@icl.utk.edu to be added to the ICL Slack channel.

Make a Space

It is easier said than done in many households, but if you can carve out a corner of your place for a workstation, you will be much happier. Many of you are sharing workspaces with spouses and kids—all of whom have their own work to do—so this is a difficult task. To make it easier, you can request equipment (e.g., peripherals, monitor stands, standing desks) through your Director. Some of it can be acquired from ICL’s offices, and some of it can be ordered and shipped to your home. In any case, having a space to work and avoid distractions is vital.

Designate Time

Transitioning from home to work and back again is made more difficult when it all happens in the same place. Having a dedicated workstation that you can walk away from certainly helps, but it is also critical that we have and adhere to some kind of regular schedule. Normal business hours can remain normal business hours. Though easier said than done, try to maintain a normal morning schedule, take a break for lunch and coffee, and then finish up for the day before dinner. It seems so simple, but the delineation between work time and home time is essential if the space has to remain the same.

Go, get out!

Go outside and get some Vitamin D. Do the social distancing, wear the mask—whatever—but go outside. A walk will do. Take the dog if you have one. Ride your bike. Go for a drive. There are still a lot of things to see out there, and you can avoid people if you think about it creatively (e.g., business parks, subdivisions, and wide-open parks instead of crowded greenways).

Just Ask

If you need help—whether it’s with your equipment, your work arrangement, or you simply want to talk to another human being—just ask. Ask your Director, ask your colleague, or ask The Editor. We’re all here for the duration.

Recent Releases

New ICL PowerPoint Template

A new ICL PowerPoint template is now available for download. Feel free to use it for an upcoming presentation.

A Keynote version of this template will be available soon.

MAGMA 2.5.3 Released

MAGMA 2.5.3 is now available. Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next-generation linear algebra (LA) libraries for heterogeneous architectures. The MAGMA package supports interfaces for current LA packages and standards (e.g., LAPACK and BLAS) to allow computational scientists to easily port any LA-reliant software components to heterogeneous architectures.
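To give a flavor of the LAPACK-style interface mentioned above, here is a rough sketch of solving a double-precision linear system with MAGMA. This is an illustrative fragment, not an official example: it assumes a working MAGMA installation, and the matrix-fill step is elided.

```c
// Sketch only: assumes MAGMA and its headers/libraries are installed.
#include <stdio.h>
#include <magma_v2.h>

int main(void)
{
    magma_init();                        // initialize the MAGMA runtime

    magma_int_t n = 1000, nrhs = 1, info = 0;
    double *A, *B;
    magma_int_t *ipiv;

    // Pinned host memory lets MAGMA overlap CPU-GPU transfers with compute.
    magma_dmalloc_pinned(&A, n * n);
    magma_dmalloc_pinned(&B, n * nrhs);
    magma_imalloc_cpu(&ipiv, n);

    /* ... fill A and B here ... */

    // Same calling convention as LAPACK's dgesv: LU-factorize A and
    // solve A*X = B, with the heavy lifting offloaded to the GPU.
    magma_dgesv(n, nrhs, A, n, ipiv, B, n, &info);
    if (info != 0)
        printf("magma_dgesv failed: info = %lld\n", (long long) info);

    magma_free_pinned(A);
    magma_free_pinned(B);
    magma_free_cpu(ipiv);
    magma_finalize();
    return 0;
}
```

Because the interface mirrors LAPACK, porting existing LAPACK-based code is often a matter of swapping call names and memory allocators.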

Changes for MAGMA 2.5.3 include:

  • Enhancements to enable hipMAGMA generation from MAGMA to support AMD GPUs through single (MAGMA) sources;
  • New routine: added syrk in all precisions (needed for hipMAGMA);
  • New routine: added hemm/symm in all precisions (needed for hipMAGMA);
  • New routine: added GEMM-based herk and her2k in all precisions (needed for hipMAGMA);
  • Bug fixed in cmake when USE_FORTRAN is OFF;
  • Bug fixed in example_sparse.c; and
  • Fixed support for half computation in magmablas_hgemm_batched tester for CUDA < 9.2.

Click here to download the tarball.

hipMAGMA 1.0.0 Released

The new hipMAGMA 1.0.0 library is now available. This initial release, based on MAGMA 2.5.2, provides MAGMA functionalities for AMD GPUs. All CUDA-based sources in MAGMA 2.5.2 have been converted to HIP and are provided in hipMAGMA 1.0.0. Note that this initial release does not include all BLAS functionalities.

Features include:

  • Specific tuning for AMD GPUs added for some BLAS routines; and
  • SYMM, SYRK, and SYR2K for all precisions (previously not available in MAGMA and not yet in rocBLAS) to enable solvers for generalized symmetric/Hermitian-definite eigenproblems.

Click here to download the tarball or check out the hipMAGMA branch on BitBucket.

PAPI 6.0 Released

PAPI 6.0 is now available. This release includes a new API for software-defined events (SDEs), a major revision of the “high-level API,” and several new components, including ROCM and ROCM_SMI (for AMD GPUs), powercap_ppc and sensors_ppc (for IBM Power9 and later), SDE, and the IO component (exposes I/O statistics exported by the Linux kernel). Furthermore, PAPI 6.0 ships with a new Counter Analysis Toolkit (CAT) that assists with native performance counter disambiguation through micro benchmarks.

Major changes for PAPI 6.0 include:

  • Added the rocm component to support performance counters on AMD GPUs.
  • Added the rocm_smi component; the System Management Interface (SMI) monitors power usage on AMD GPUs and also lets the user write the power cap (e.g., to reduce power consumption during non-critical operations).
  • Added io component to expose I/O statistics exported by the Linux kernel (/proc/self/io).
  • Added SDE component, which enables HPC software layers to expose internal performance-critical behavior as SDEs in the PAPI interface.
  • Added “SDE API” to register performance-critical events that originate from HPC software layers, which are recognized as “PAPI counters” and can therefore be monitored with the standard PAPI interface.
  • Added powercap_ppc component to support monitoring and capping power usage on IBM PowerPC architectures (Power9 and later) using the powercap interface exposed through the Linux kernel.
  • Added sensors_ppc component to support monitoring of system metrics on IBM PowerPC architectures (Power9 and later) using the opal/exports sysfs interface.
  • Retired the infiniband_umad component; it is superseded by the infiniband component.
  • Revised PAPI’s “high-level API” to make it more intuitive and effective for novice users and quick event reporting.
  • Added “counter_analysis_toolkit” (CAT) sub-directory to assist with native performance counter disambiguation through micro benchmarks.
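As an illustrative sketch of the revised high-level API (assuming PAPI 6.0 is installed; the events to count are chosen at run time via the PAPI_EVENTS environment variable), instrumentation reduces to a pair of named-region calls:

```c
// Sketch only: assumes PAPI 6.0 is installed and linked.
// Select events at run time, e.g.:
//   export PAPI_EVENTS="PAPI_TOT_INS,PAPI_TOT_CYC"
#include <stdio.h>
#include <papi.h>

int main(void)
{
    double sum = 0.0;

    // Start counting for a named region...
    if (PAPI_hl_region_begin("computation") != PAPI_OK)
        return 1;

    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;

    // ...and stop; measurements are written out under papi_hl_output/.
    if (PAPI_hl_region_end("computation") != PAPI_OK)
        return 1;

    printf("sum = %f\n", sum);
    return 0;
}
```

Compared with the old high-level API, no event-set bookkeeping appears in the source: the region name is the only handle the user manages.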

Other changes include:

  • Standardized environment variables and implemented a simplified, unified approach for specifying libraries necessary for components, with overrides possible for special circumstances.
  • Eliminated component-level “configure” requirements.
  • Corrected Thread Local Storage (TLS) issues and race conditions.
  • Incorporated several bug fixes, documentation fixes and enhancements, and improvements to README files for user instruction and code comments.

For more detailed information on changes made for this release, see ChangeLogP600.txt for filenames or keywords of interest and change summaries, or go directly to the PAPI git repository.

Acknowledgments

This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Phil Mucci, Kevin Huck, Yunqiang Su, Carl Love, Andreas Beckmann, Al Grant, and Evgeny Shcherbakov.

Click here to download the tarball.

Interview

Xi Luo

Where are you from, originally?
I was born in Nanjing, China.

Can you summarize your educational background?
I earned my Bachelor’s degree in Computer Science from Sichuan University in 2011. In 2012, I entered Stevens Institute of Technology and received a Master’s degree in Computer Science in 2014. After that, I joined ICL.

Where did you work before joining ICL?
Before joining ICL, I was a graduate student at Stevens Institute of Technology in Hoboken, New Jersey. I worked on a project that tried to map unexpected errors to existing error handlers in the code—at runtime—to make software “heal” itself.

How did you first hear about the lab, and what made you want to work here?
I first heard about ICL when I was applying for PhD programs. I was interested in distributed computing, specifically, and when I searched with the keyword “distributed computing,” ICL was at the top.

Looking more at ICL’s work, I found the projects that George’s DisCo group is working on very interesting. Since joining ICL, I have really enjoyed the friendly and supportive environment.

What is your focus here at ICL? What are you working on?
As mentioned above, I worked in the DisCo group at ICL, where I focused on improving the performance of MPI collective operations. I just defended my PhD in March!

What are your interests/hobbies outside of work?
I enjoy playing video games, watching movies, and watching basketball games. I am a Kings fan.

Tell us something about yourself that might surprise people.
I just became a dad! I am excited to share that my baby boy arrived on Feb 23rd. Mom and baby are healthy.

If you weren’t working at ICL, where would you like to be working and why?
I honestly do not know.

Recent Papers

  1. Lu, Y., I. Yamazaki, F. Ino, Y. Matsushita, S. Tomov, and J. Dongarra, “Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD,” Concurrency and Computation: Practice and Experience, April 2020. DOI: 10.1002/cpe.5754  (1.43 MB)
  2. Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.  (188.51 KB)
  3. Kolev, T., P. Fischer, A. Abdelfattah, S. Ananthan, V. Barra, N. Beams, R. Bleile, J. Brown, R. Carson, J-S. Camier, et al., “CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,” ECP Milestone Reports: Zenodo, May 2020. DOI: 10.5281/zenodo.3860804  (28.12 MB)
  4. Pei, Y., Q. Cao, G. Bosilca, P. Luszczek, V. Eijkhout, and J. Dongarra, “Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,” 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020. DOI: 10.1109/IPDPSW50202.2020.00127  (1.33 MB)
  5. Nicolae, B., J. Li, J. M. Wozniak, G. Bosilca, M. Dorier, and F. Cappello, “DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, IEEE, May 2020. DOI: 10.1109/CCGrid49817.2020.00-76  (424.19 KB)
  6. Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, “Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs,” 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.  (696.21 KB)
  7. Losada, N., P. González, M. J. Martín, G. Bosilca, A. Bouteiller, and K. Teranishi, “Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,” Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020. DOI: 10.1016/j.future.2020.01.026  (2.06 MB)
  8. Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, “Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.  (1.03 MB)
  9. Gainaru, A., B. Goglin, V. Honoré, G. Pallez, P. Raghavan, Y. Robert, and H. Sun, “Reservation and Checkpointing Strategies for Stochastic Jobs,” 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.  (692.4 KB)
  10. Bathie, G., L. Marchal, Y. Robert, and S. Thibault, “Revisiting Dynamic DAG Scheduling under Memory Constraints for Shared-Memory Platforms,” 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.  (317.93 KB)
  11. Zhong, D., P. Shamis, Q. Cao, G. Bosilca, and J. Dongarra, “Using Arm Scalable Vector Extension to Optimize Open MPI,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020. DOI: 10.1109/CCGrid49817.2020.00-71  (359.95 KB)

Recent Conferences

  1. MAY: attended by Hartwig Anzt, Jack Dongarra, Neil Lindquist, Piotr Luszczek, and Yaohung “Mike” Tsai
  2. MAY: attended by Yu Pei

Upcoming Conferences

  1. JUN: ISC High Performance 2020 (via Zoom), attended by Hartwig Anzt, Heike Jagode, and Jack Dongarra

Recent Lunch Talks

  1. APR 3: Yu Pei, “Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime” (PDF)
  2. APR 17: Mark Gates, “Test Matrix Generation” (PDF)
  3. APR 24: Dong Zhong, “Using Arm Scalable Vector Extension to Optimize Open MPI” (PDF)
  4. MAY 1: Stanimire Tomov, “HeFFTe: FFT-ECP API and High-Performance Library Prototype for Multidimensional FFTs on Large-Scale Heterogeneous Systems” (PDF)
  5. MAY 8: Paula Olaya (Global Computing Laboratory), “Building Containerized Environments for Reproducibility and Traceability of Scientific Workflows” (PDF)
  6. MAY 15: Jiali Li, “Optimizing all-to-all Operation with Awareness of Network Topology” (PDF)
  7. MAY 22: Aurelien Bouteiller, “Here's What's Fresh with MPI-4” (PDF)
  8. MAY 29: Azzam Haidar (NVIDIA), “How CUDA Math Libraries Can Help You Unleash the Power of the New NVIDIA A100 GPU”

Upcoming Lunch Talks

  1. JUN 5: Frank Winkler, “PIKA: Center-Wide and Job-Aware Cluster Monitoring” (PDF)
  2. JUN 12: Hartwig Anzt, “Porting Linear Algebra Libraries to the AMD Ecosystem” (PDF)
  3. JUN 19: Michael Wyatt (Global Computing Laboratory), “AI4IO: A Suite of AI-Based Tools for IO-Aware HPC Resource Management”
  4. JUN 26: Tony Castaldo, “Controlling Power on NVIDIA and AMD GPUs”

People

  1. Joseph Schuchart
    Joseph Schuchart joined ICL's DisCo group at the beginning of May. Welcome back, Joseph!
  2. Jamie Finney
Jamie Finney is leaving ICL to join the Oak Ridge Leadership Computing Facility as an HPC Software Support Engineer. Congratulations and good luck, Jamie!
  3. Xi Luo
    Xi Luo has had an exciting couple of months: first he welcomed his new baby boy on February 23rd and then he defended his PhD on March 4th. Congratulations, Xi!

Dates to Remember

Coffee Chats @ 2:00 p.m.

Don’t forget to attend ICL’s daily 2:00 p.m. coffee chats on Zoom!

ICL Friday Talks

Don’t forget: ICL Friday Talks are up and running on Zoom!