ICL Newsletter

News and Announcements

Employment Opportunities at ICL

ICL is seeking full-time Research Scientists (MS or PhD) to participate in the design, development, and maintenance of numerical software libraries for solving linear algebra problems on large, distributed-memory machines with multi-core processors and hardware accelerators, as well as performance monitoring capabilities for new and advanced hardware and software technologies.

The prospective researcher will coauthor papers to document research findings, present the team’s work at conferences and workshops, and help lead students and other team members in their research endeavors in ongoing and future projects. Given the nature of the work, there will be opportunities for publication, travel, and high-profile professional networking and collaboration across academia, labs, and industry.

An MS or PhD in computer science, computational sciences, or math is preferred. Background in at least one of the following areas is also preferred: numerical linear algebra, HPC, performance monitoring, machine learning, or data analytics.

For more information, check out ICL’s jobs page: http://www.icl.utk.edu/jobs.

Happy Birthday, Terry!

Terry Moore, ICL’s long-time Associate Director, celebrated his 70th birthday at the NSF Workshop on Smart Cyberinfrastructure in Crystal City, VA. Fortunately for Terry, Michela Taufer was also in attendance and not only remembered his birthday but also arranged a cake for the occasion.

Following his return, ICL admin arranged for Terry to celebrate his birthday again during Friday Lunch. Happy Birthday, Terry!

The Editor would like to thank Hidehiko Hasegawa and Jack Dongarra for their photo contributions.

Conference Reports

2020 ECP Annual Meeting

The 2020 Exascale Computing Project (ECP) Annual Meeting, held on February 3–7, brought us a second time to the hot and humid city of Houston, Texas. Like last year, the Royal Sonesta, located in Uptown Houston just a few feet away from Texas’s largest shopping mall, became home for more than a dozen ICLers.

After a day of presentations by industry partners on their plans for exascale, Dr. Chris Fall, Director of the DOE Office of Science, gave a keynote in which he expressed his excitement about the project but also indicated that people are already thinking about what will follow the completion of ECP.

Other highlights included a session on “Understanding Performance with Exa-PAPI” organized by Heike Jagode and her team, a session on “Integrating PaRSEC-Enabled Libraries in Scientific Applications,” and a “Math Library Speed Dating Session.”

The sessions were complemented by well-attended poster presentations that offered a nice environment for further discussion. The meeting was also a great place to catch up with ICL alumni who still contribute to the success of ECP from other institutions. Familiar faces included Ichitaro Yamazaki (now working at Sandia National Labs) and Jakub Kurzak (now working for AMD).

With several years still to go in this ambitious endeavor, we anticipate seeing everyone again next year!

The Editor would like to thank Hartwig Anzt for generously providing the content above. The Editor also thanks Heike Jagode, George Bosilca, Earl Carr, and Piotr Luszczek for their photo contributions.

SIAM PP20

The 2020 SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20) was held in Seattle, Washington on February 12–15, 2020. SIAM is typically one of ICL’s most heavily attended conferences—second only to SC—and SIAM PP20 was in keeping with this tradition. There is a lot of ground to cover here, so let’s get rolling.

New workloads driven by machine learning and artificial intelligence applications, and their accelerated performance on new and emerging hardware, are pumping a lot of excitement into mixed and low-precision methods in linear algebra. As you might expect, then, the “Advances in Algorithms Exploiting Low Precision Floating-Point Arithmetic” minisymposium was a fan favorite. There, Ahmad Abdelfattah presented “Recent Half Precision Developments in the MAGMA Library” to discuss new mixed-precision solvers in the MAGMA library, which are aimed at GPUs with native low-precision capabilities (e.g., NVIDIA’s Tensor Cores).

ICL alum Azzam Haidar was also in this session to discuss NVIDIA’s latest work in mixed precision. According to Azzam’s presentation, “Mixed Precision Numerical Techniques Accelerated with Tensor Cores and its Impact on Today’s Scientific Computing and Implications for Tomorrow’s Hardware Design,” NVIDIA is actively working on this problem by using its Tensor Cores to accelerate common linear algebra routines at lower precisions while maintaining the accuracy of the solution.
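For readers who want a feel for how a solver can work in low precision and still deliver a high-precision answer, here is a minimal sketch of one well-known approach, iterative refinement. It is an illustration only, not the MAGMA or NVIDIA implementation: single precision is emulated in pure Python, and the helper names (f32, lu_factor_f32, lu_solve_f32, refine) and the small test matrix are assumptions made for this example.

```python
import struct

def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(A):
    """LU factorization with partial pivoting; every result is rounded to single precision."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))  # pivot row
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve_f32(LU, perm, b):
    """Solve with the single-precision factors (forward, then back substitution)."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def refine(A, b, iters=5):
    """Return (single-precision solution, solution after double-precision refinement)."""
    n = len(A)
    LU, perm = lu_factor_f32(A)          # the expensive step, done once in low precision
    x0 = lu_solve_f32(LU, perm, b)
    x = list(x0)
    for _ in range(iters):
        # Residual computed in double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_f32(LU, perm, r)    # cheap correction from the low-precision factors
        x = [xi + di for xi, di in zip(x, d)]
    return x0, x

# Hypothetical, well-conditioned test problem with a known solution.
n = 4
A = [[1.0 / (i + j + 1) + (4.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
x_true = [0.1, 0.2, 0.3, 0.4]
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]

x_single, x_refined = refine(A, b)
err_single = max(abs(u - v) for u, v in zip(x_single, x_true))
err_refined = max(abs(u - v) for u, v in zip(x_refined, x_true))
print(err_single, err_refined)
```

The factorization, which dominates the cost for large matrices, runs entirely in the lower precision. Only the residual and the solution update use double precision, which is why this pattern maps well onto hardware with fast low-precision units.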

Taking another angle on mixed precision, ICL Graduate Research Assistant Neil Lindquist presented “Improving the Performance of GMRES with Mixed Precision.” Given the memory-bound nature of GMRES, Neil looked at storing vectors in single precision to reduce memory traffic and alleviate some of this bottleneck, improving overall performance while maintaining the accuracy of the solution.
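The storage trade-off behind this idea can be sketched in a few lines. The snippet below is not Neil's GMRES code. It only illustrates, under simplifying assumptions, that a vector kept in single precision takes half the memory of its double-precision copy, while a dot product that reads it and accumulates in double loses only a small amount of accuracy. The vectors and the names to_single and dot_mixed are made up for this example.

```python
from array import array
import math

def to_single(vec):
    """Store a sequence of Python floats (doubles) in single precision."""
    return array("f", vec)

def dot_mixed(x_single, y_double):
    """Dot product that reads single-precision storage but accumulates in double."""
    return math.fsum(float(a) * b for a, b in zip(x_single, y_double))

n = 1000
basis_vec = [math.sin(i + 1) for i in range(n)]   # stand-in for a stored Krylov vector
other = [math.cos(i + 1) for i in range(n)]       # stand-in for a working vector

stored_double = array("d", basis_vec)
stored_single = to_single(basis_vec)

# Bytes that have to move through memory each time the vector is read.
bytes_double = stored_double.itemsize * len(stored_double)
bytes_single = stored_single.itemsize * len(stored_single)

exact = math.fsum(a * b for a, b in zip(stored_double, other))
approx = dot_mixed(stored_single, other)
max_round_err = max(abs(s - d) for s, d in zip(stored_single, stored_double))

print(bytes_double, bytes_single, abs(exact - approx), max_round_err)
```

For a memory-bound kernel, halving the bytes read per vector is where the speedup would come from. Keeping the accumulation in double is one simple way to limit the accuracy cost.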

Moving on from the mixed-precision action, let’s talk about SLATE. Mark Gates was on hand to discuss “The Sustainability Lessons of the SLATE Project,” where he described the software engineering practices leveraged by the SLATE team to minimize the impact of rapidly changing technology and hardware and maximize SLATE’s sustainability in the long term.

Hartwig Anzt presented his work on “Sparse Matrix Vector Product on High-End GPU Clusters,” where he compared his hybrid algorithm against vendor kernels for both AMD and NVIDIA (i.e., Ginkgo vs. HIP vs. CUDA). Hartwig also presented other developments in the Ginkgo project during the poster session.

Stan “the Man” Tomov brought out a new heFFTe poster to show off the latest developments in ICL’s ECP FFT effort. The heFFTe library (heFFTe 0.1 was released in October 2019) is the first software release from the ECP FFT project.

Mike Tsai also brought a poster to illustrate his work in “Using Quantized Integer in LU Factorization with Partial Pivoting,” where he showed ICL’s preliminary results for using quantized integers in LU factorization with partial pivoting.
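As a rough illustration of what "quantized integers" means, the sketch below maps floating-point values to 8-bit signed integers with a single shared scale factor, does the arithmetic in integers, and rescales once at the end. This is a generic symmetric quantization scheme chosen for the example, not the algorithm from the poster, and the data and function names are hypothetical.

```python
def quantize(values, bits=8):
    """Map floats to signed integers using one shared scale (symmetric quantization)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Map quantized integers back to approximate floating-point values."""
    return [qi * scale for qi in q]

row = [0.82, -0.17, 0.05, -0.99, 0.41]         # hypothetical matrix row
col = [0.30, 0.66, -0.58, 0.12, -0.75]         # hypothetical matrix column

q_row, s_row = quantize(row)
q_col, s_col = quantize(col)

int_dot = sum(a * b for a, b in zip(q_row, q_col))   # pure integer arithmetic
approx_dot = int_dot * s_row * s_col                 # rescale once at the end
exact_dot = sum(a * b for a, b in zip(row, col))

# Each value is off by at most half a quantization step.
max_round_err = max(abs(v - w) for v, w in zip(row, dequantize(q_row, s_row)))
print(q_row, approx_dot, exact_dot, max_round_err)
```

The inner products that dominate a factorization's update steps are the natural place for this kind of integer arithmetic. The price is a rounding error of up to half a quantization step per entry.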

Piotr Luszczek presented “Evaluation of Large Scale Systems with Focus on Application Performance: the Benchmarking Perspective,” where he described using results from four widely known benchmarks to inform both hardware and software development—earlier in the engineering phases—to ensure better performance and compatibility.

Moving on from all the mixed-precision and linear algebra hullabaloo, it’s time to DisCo. Yes, the Distributed Computing group was also on hand at SIAM PP20.

George Bosilca presented “New Non-Blocking Extensions to the ULFM Proposal,” with the key principle being that no MPI call (e.g., point-to-point, collective, RMA, I/O) can block indefinitely after a failure but must either succeed or raise an MPI error. ULFM and its extensions are currently under review by the MPI Forum’s Fault Tolerance Working Group.

Yu Pei presented his work on the “Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization.” Yu and his coauthors used Block Low-Rank LU factorization as a test case to study the programmability and performance of five different programming approaches: (1) flat MPI, (2) Adaptive MPI, (3) MPI + OpenMP, (4) parameterized task graph, and (5) dynamic task discovery (DTD). These approaches were then analyzed on an Intel Haswell-based system to see the effectiveness of each implementation in addressing load imbalance.

Finally, Thomas Herault talked about “Novel Approaches to Optimize and Execute Task-Based, Irregular Applications on Extreme-Scale, Heterogeneous Systems using PaRSEC” as part of his group’s effort in the National Science Foundation–funded Epexa project.

Next year’s meeting will move out of the Pacific Northwest to Munich, Germany. Take pictures.

The Editor would like to thank Hidehiko Hasegawa, Mike Tsai, and Piotr Luszczek for their contributions to this article.

Interview

Where are you from, originally?
I am originally from Guangzhou, a city in southern China.

Can you summarize your educational background?
I went to Sun Yat-sen University in our province for biotechnology, at first, but then I switched and earned my BS in statistics instead. This was before the craze of machine learning and deep learning. Then I came to the United States to earn my MS in statistics at UC Davis. Through a sequence of events, I am now doing my PhD here at ICL!

Where did you work before joining ICL?
After my MS and before my PhD, I stayed at the UC Davis energy center as a junior researcher briefly, then I worked at ORNL for six months.

How did you first hear about the lab, and what made you want to work here?
I heard about the lab via my supervisor’s friend at ORNL. I have used LAPACK before, so when I saw that ICL is one of its main developers, I knew it would be a great place to pursue my PhD.

What is your focus here at ICL? What are you working on?
I am working in the DisCo group, specifically on the PaRSEC project. I am exploring the optimization of several numerical algorithms on top of the runtime system. George also mentioned that I will have a focus on the programming language aspect, and I am excited for that!

What are your interests/hobbies outside of work?
I play some pick-up basketball at the gym, and I read all kinds of novels. I have a PS4, but it has been a while since I have had time to play.

Tell us something about yourself that might surprise people.
I was on my high school’s volleyball team, and when it was time for college, I had thought about going into athletics. Life is like a box of chocolates.

If you weren’t working at ICL, where would you like to be working and why?
Doing research is fun, so ideally some place research focused. Working in distributed computing and machine learning would be cool.

Recent Papers

Gates, M., S. Tomov, H. Anzt, P. Luszczek, and J. Dongarra, Clover: Computational Libraries Optimized via Exascale Research, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (872 KB)
Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Enabled Libraries and Applications (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (979.27 KB)
Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Systems and Interfaces (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (840.54 KB)
Jagode, H., A. Danalis, and J. Dongarra, Exa-PAPI: The Exascale Performance API with Modern C++, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (556.78 KB)
Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, Y-H. Tsai, and J. Dongarra, Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (699 KB)
Ayala, A., S. Tomov, A. Haidar, and J. Dongarra, heFFTe: Highly Efficient FFT for Exascale (Poster), Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020. (1.54 MB)
Ayala, A., S. Tomov, J. Dongarra, and A. Haidar, heFFTe: Highly Efficient FFT for Exascale (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (6.2 MB)
Han, L., L-C. Canon, J. Liu, Y. Robert, and F. Vivien, “Improved Energy-Aware Strategies for Periodic Real-Time Tasks under Reliability Constraints,” 40th IEEE Real-Time Systems Symposium (RTSS 2019), York, UK, IEEE Press, February 2020.
Tomov, S., MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020. (2.28 MB)
Hori, A., K. Yoshinaga, T. Herault, A. Bouteiller, G. Bosilca, and Y. Ishikawa, “Overhead of Using Spare Nodes,” The International Journal of High Performance Computing Applications, February 2020. DOI: 10.1177/1094342020901885 (2.15 MB)
Jagode, H., and A. Danalis, PULSE: PAPI Unifying Layer for Software-Defined Events (Poster), Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020. (1.86 MB)
Winkler, F., “Redesigning PAPI's High-Level API,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-03: University of Tennessee, February 2020. (356.41 KB)
Gates, M., J. Kurzak, A. YarKhan, A. Charara, J. Finney, D. Sukkari, M. Al Farhan, I. Yamazaki, P. Wu, and J. Dongarra, SLATE Tutorial, Houston, TX, 2020 ECP Annual Meeting, February 2020. (12.14 MB)
Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al Farhan, D. Sukkari, and J. Dongarra, SLATE: Software for Linear Algebra Targeting Exascale (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (546.56 KB)
Luszczek, P., and J. Dongarra, The PLASMA Library on CORAL Systems and Beyond (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (550.86 KB)
Tsai, Y., P. Luszczek, and J. Dongarra, Using Quantized Integer in LU Factorization with Partial Pivoting (Poster), Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020. (6.65 MB)
Bartlett, R., xSDK4ECP: Extreme-scale Scientific Software Development Kit for ECP (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020. (1.54 MB)
Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020. (188.51 KB)
Zaitsev, D., and P. Luszczek, “Docker Container based PaaS Cloud Computing Comprehensive Benchmarks using LAPACK,” Computer Modeling and Intelligent Systems CMIS-2020, Zaporizhzhia, March 2020. (451.33 KB)
Brown, C., A. Abdelfattah, S. Tomov, and J. Dongarra, hipMAGMA v1.0: Zenodo, March 2020. DOI: 10.5281/zenodo.3908549
Anzt, H., T. Cojean, C. Yen-Chen, J. Dongarra, G. Flegar, P. Nayak, S. Tomov, Y. M. Tsai, and W. Wang, “Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,” ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020. DOI: 10.1145/3380930 (5.67 MB)
Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,” Lecture Notes in Computer Science, 1, no. 12043: Springer International Publishing, pp. 581, March 2020. DOI: 10.1007/978-3-030-43229-4
Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,” Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020. DOI: 10.1007/978-3-030-43222-5
Ribizel, T., and H. Anzt, “Parallel Selection on GPUs,” Parallel Computing, vol. 91, March 2020. DOI: 10.1016/j.parco.2019.102588 (1.43 MB)

Recent Conferences

FEB
2-7

2020 ECP Annual Meeting Houston, Texas

Alan Ayala, Anthony Danalis, Asim YarKhan, Aurelien Bouteiller, Damien Genet, Earl Carr, George Bosilca, Hartwig Anzt, Heike Jagode, Jack Dongarra, Jamie Finney, Mark Gates, Piotr Luszczek, Stanimire Tomov, Thomas Herault, Tony Castaldo
FEB
12-15

SIAM Conference on Parallel Processing for Scientific Computing (PP20) Seattle, Washington

Ahmad Abdelfattah, Florent Lopez, George Bosilca, Jack Dongarra, Mark Gates, Neil Lindquist, Piotr Luszczek, Qinglei Cao, Sebastien Cayrols, Stanimire Tomov, Thomas Herault, Yaohung Tsai, Yu Pei
FEB
17-21

MPI Forum Portland, Oregon

George Bosilca
FEB
25-27

NSF Workshop on Smart Cyberinfrastructure Crystal City, Virginia

Terry Moore
MAR
1-6

Dagstuhl Seminar Dagstuhl, Germany

George Bosilca
MAR
10-11

ECP Industry Council Argonne, Illinois

Mark Gates
MAR
21-27

16th Copper Mountain Conference On Iterative Methods Copper Mountain, Colorado

Florent Lopez
MAR
23-26

GPU Technology Conference (GTC 2020) San Jose, California

Ahmad Abdelfattah, Alan Ayala, Stanimire Tomov
MAR
24-26

BDEC Porto Porto, Portugal

Jack Dongarra, Joan Snoderly, Terry Moore

Recent Lunch Talks

FEB
21
Daniel Schultz
The Effects of MPI Oversubscription & Multi-Process Service on Distributed-memory FFT Libraries
FEB
28
Mohammed Al Farhan
Optimizing the Cholesky factorization in SLATE
MAR
6
Chris Gropp
Global Computing Laboratory
Understanding Machine Learning Systems using Adversarial Approaches
MAR
13
Ahmad Abdelfattah
Recent Developments for Mixed Precision Solvers in MAGMA

Upcoming Lunch Talks

APR
3
Yu Pei
Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime
APR
17
Mark Gates
Test Matrix Generation
APR
24
Dong Zhong
Using Arm Scalable Vector Extension to Optimize Open MPI

People

Ali Charara recently took a position with NVIDIA in Santa Clara, CA. Congratulations and good luck, Ali!

March 2020