ICL Newsletter

News and Announcements

Winter Reception 2017

In celebration of another outstanding year, the ICL Winter Reception took place at Calhoun’s on the River, once again bringing together a sizable number of ICLers and their guests for an enjoyable evening. Here’s to further success in 2017!

ECP First Annual Meeting — Photo Gallery

The US Department of Energy’s Exascale Computing Project (ECP) has published a brief article and image gallery to convey the essence of its first annual meeting, held January 29–February 3.

ICL representatives from the seven ECP projects in which we are involved were among the 450 attendees, and are featured in some of the pictures.

See the content here.

Conference Reports

Open MPI Developers Meeting

On January 24–26, ICL’s George Bosilca attended the Open MPI Developers Meeting in San Jose, CA, in which Open MPI’s core developers addressed various subjects ranging from technical topics to software releases.

“One of the most interesting outcomes was that we decided to switch the software releases to a time-based approach—similar to the Linux kernel—in which we will issue a stable release every four to six months, with a two-month stabilization period,” George said.

In addition to discussing the current status of different software release versions, the attendees selected the release managers for version 3.0, pending confirmations by their respective institutions; and they conducted a number of technical discussions concerning the future of the code base, vendor participation, user feedback, and other major roadmaps for the next six months, the period preceding the next meeting.

PPoPP 2017

ICL’s Hartwig Anzt and Ahmad Ahmad were speakers at workshops held in conjunction with the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2017, which took place February 4–8 in Austin, TX.

Hartwig gave a talk on batched Gauss-Jordan elimination for block-Jacobi preconditioning during the 8th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM). Also during the symposium, Hartwig attended a tutorial on heterogeneous computing.
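
For readers unfamiliar with the idea, the following is a minimal sketch, not the code from the talk, of how a block-Jacobi preconditioner might be built by inverting each small diagonal block with Gauss-Jordan elimination. The function names, the fixed block size, and the use of exact fractions are assumptions made for illustration. A GPU implementation would presumably process all blocks in parallel rather than in a Python loop.

```python
from fractions import Fraction


def gauss_jordan_inverse(block):
    """Invert a small square matrix by Gauss-Jordan elimination with row pivoting."""
    n = len(block)
    # Augment the block with the identity matrix: [A | I].
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(block)]
    for col in range(n):
        # Choose the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular diagonal block")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def block_jacobi_setup(matrix, block_size):
    """Invert every diagonal block (the 'batch') of a dense square matrix."""
    n = len(matrix)
    inverses = []
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        block = [row[start:end] for row in matrix[start:end]]
        inverses.append(gauss_jordan_inverse(block))
    return inverses


def block_jacobi_apply(inverses, vector):
    """Apply the preconditioner: multiply each vector segment by its inverted block."""
    result = []
    offset = 0
    for inv in inverses:
        size = len(inv)
        segment = vector[offset:offset + size]
        result.extend(sum(a * x for a, x in zip(row, segment)) for row in inv)
        offset += size
    return result
```

Because every diagonal block is inverted independently of the others, the setup step is a natural fit for batched execution, which is the property the talk's title points to.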

Ahmad was a speaker at the 10th Workshop on General Purpose Processing Using GPUs (GPGPU-10), where he presented a technical paper on performing one-sided factorizations using only GPUs. In contrast with the MAGMA hybrid CPU+GPU routines, this is a new category of algorithms in which the CPU is not involved in performing any computation.

Ahmad demonstrated the ability of the new algorithms to be within 90 percent of the performance of the hybrid techniques while using fewer resources and consuming less power.

SRA International Winter Intensive Training Programs

ICL’s Tracy Rafferty and Cindy Knisley were in Atlanta February 16–17 for the Society of Research Administrators International 2017 Winter Intensive Training Programs, where they were able to sharpen their knowledge and skills in the development and business sides of proposal work, respectively.

The program drew approximately fifty attendees. Its sessions ran concurrently over two days, allowing each participant to focus on a specific area of interest that would promote their professional growth.

Tracy attended the Grant Writing and Proposal Development session. Her goal was to learn more about how to cultivate successful proposals, including positioning projects for funding, planning submissions, and satisfying reviewers.

The workshop leaders were Tolise Dailey, training development specialist in the Office of Contracts and Grants at the University of Colorado, and Johnny Toma, deputy director of finance and administration at Georgetown University Medical Center.

“The two presenters were very good and accomplished in their work,” Tracy said. “The session included actual examples of what works and what doesn’t, and plenty of Q&A. As it turns out, there were three attendees from the UT Knoxville Faculty Development team, as well as two from UT Chattanooga.”

Cindy went to the Departmental Administration workshop, which was aimed at departmental administrators of research-sponsored awards. Both pre- and post-award components were discussed, beginning with the basic elements of proposal development and continuing through managing the day-to-day activities of sponsored awards.

The workshop emphasized minimizing audit risks; budgets and budget justifications relative to allowable direct costs; direct versus indirect costs; travel on sponsored awards; sponsored projects versus gifts; grants versus cooperative agreements versus contracts; and methods of determining whether a cost is appropriate to charge to a sponsored award.

The workshop also covered how to communicate with principal investigators concerning proposal development and daily management needs.

NSF 2017 SI2 PI Meeting

About 150 principal investigators who are currently funded by the National Science Foundation’s Software Infrastructure for Sustained Innovation (SI2) program met in Arlington, VA, February 21–22, for a mix of presentations and breakout sessions focused on fostering collaborations and discussion about topics of interest to the attendees. Jakub Kurzak and Anthony Danalis represented ICL.

Jakub presented on the BONSAI project during the poster session. BONSAI, which stands for BEAST on OpeN Software Attuning Infrastructure, will develop a software infrastructure for using parallel hybrid systems at any scale to execute large, concurrent autotuning sweeps that will make the optimization of computational kernels for GPU accelerators and manycore coprocessors faster.

Workshop on Batched, Reproducible, and Reduced-precision BLAS

ICL was integrally involved in the second Workshop on Batched, Reproducible, and Reduced-precision Basic Linear Algebra Subprograms (BLAS) in Atlanta, February 24–25. Participants sought to investigate the possibility of extending the currently accepted standards to provide greater parallelism for small-size operations, reproducibility, and reduced-precision support.

Presentations and discussions centered around the separate issues related to defining a standard interface for the batched BLAS, reproducible BLAS, and reduced-precision BLAS.
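
The motivation for a batched interface can be illustrated with a small sketch. Instead of calling a matrix-multiply routine once per small problem, the caller hands over the whole batch in a single call, which leaves an implementation free to process the problems in parallel. The function name `batched_gemm` and its argument layout are hypothetical, chosen only for illustration. They are not the interface proposed at the workshop.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for small dense matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
             for j in range(cols)]
            for i in range(rows)]


def batched_gemm(alpha, a_batch, b_batch, beta, c_batch):
    """Hypothetical batched call: one invocation covers many independent small GEMMs.

    This reference version simply loops over the batch. Since the problems do not
    depend on one another, a tuned implementation could run them concurrently.
    """
    if not (len(a_batch) == len(b_batch) == len(c_batch)):
        raise ValueError("all batches must contain the same number of matrices")
    return [gemm(alpha, a, b, beta, c)
            for a, b, c in zip(a_batch, b_batch, c_batch)]
```

Note that the matrices in one batch need not share a size in this sketch. Whether a standard should allow variable sizes within a batch is the kind of interface question such a proposal has to settle.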

Hardware and software vendors and developers reported on their needs relative to numerical linear algebra software for current and future systems. The authors of the batched, reproducible, and reduced-precision BLAS presented the current proposal and various implementations (reference and more specialized ones). The participants discussed various aspects of the plans.

Jack Dongarra delivered the welcome and introduced the participants, whose affiliations were ICL/UT Knoxville; the University of Manchester; the University of California, Berkeley; Sandia National Laboratories; Umea University; Aachen University; King Abdullah University of Science & Technology; Texas A&M; ARM; Numerical Algorithms Group (NAG); MathWorks; Cray; and Adobe.

ICLers presented on the following topics:

Autotuning—Jakub Kurzak and Piotr Luszczek
A Proposed Modification to the Batch BLAS Interface—Ahmad Ahmad
MAGMA Batched Computations: Approaches and Applications—Stan Tomov
High Performance Design of Batched Tensor Computations: Performance Analysis, Modeling, Tuning, and Optimization—Azzam Haidar
Half-precision Benchmarks—Piotr

The workshop was sponsored in part by Intel, ICL, and Georgia Tech.

SIAM Conference

ICL gave numerous talks at the SIAM Conference on Computational Science and Engineering, February 27–March 3, in Atlanta. SIAM is the Society for Industrial and Applied Mathematics.

ICL Talks at SIAM 2017

Accelerating the Singular Value Decomposition with a Hybrid Two-Stage Algorithm—Mark Gates
Batch Linear Algebra for GPU-Accelerated High Performance Computing Environments—Ahmad Ahmad
Dense LA at Exascale—Jakub Kurzak
Implementation Techniques for Point Set Registration on Multicore and Heterogeneous Hardware—Piotr Luszczek
Multilevel Parallelism for SU/PG Discretization Within HPCMP Create™-AV COFFE for Compressible RANS Equations on Fully Tetrahedral Meshes—Stephen Wood
Preconditioning on Parallel and Hybrid Architectures—Hartwig Anzt
Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi—Azzam Haidar
Randomized Algorithm for Computing or Updating SVD—Ichitaro Yamazaki
Tensor Contractions with MAGMA Batched—Stan Tomov
The Design and Implementation of a Dense Linear Algebra Library for Extreme Parallel Computers—Jack Dongarra
Toward Distributed Eigenvalue and Singular Value Solver Using Data Flow System—Azzam Haidar

SIAM exists to ensure the strongest interactions between mathematics and other scientific and technological communities through membership activities, publication of journals and books, and conferences.

The SIAM Conference was sponsored by the SIAM Activity Group on Computational Science and Engineering.

ICLer Gives Dissertation Talk at Sandia

Stephen Wood, an ICL postdoctoral research associate, gave a talk based on his dissertation titled “Lattice Boltzmann Methods for Wind Energy Analysis” to seventeen members of the Sandia National Laboratories Wind Energy Technologies (WET) department, led by Dave Minster.

The talk covered steps Stephen followed to build, verify, and validate a lattice Boltzmann implementation for wind turbine analysis. Emphasis was placed on the assessment of turbulence modeling options and the analysis of simulation results.

Stephen also met with Srini Arunajatesan, leader of the High Fidelity Modeling group, and Paul Crosier, leader of the software development efforts for the Multiscale Computational Materials Methods group.

David Maniacci, the rotor blade and wind plant aerodynamics lead within WET, invited Stephen to give his talk at Sandia following discussions regarding the details of wind turbine simulation at the American Institute of Aeronautics and Astronautics (AIAA) SciTech Forum in January.

“The opportunities for collaboration through, and surrounding, the [US Department of Energy’s] Exascale Computing Project were recognized by all,” Stephen said. “We will continue our discussions and begin planning collaborations.”

Interview

Matthew Bachstein

Where are you from originally?

I mostly grew up in Lebanon, TN, though I moved around a lot when I was younger.

Can you summarize your educational background?

I have my BS in mathematical sciences from Clemson, and my MS in mathematics from UT Knoxville, both focused in applied mathematics.

What are your research interests?

Currently, optimization of GPU kernels.

What made you want to work for ICL?

I realized that I really liked the implementation and optimization aspects of numerical algorithms, which led to the switch from mathematics to computer science.

What are you working on while at ICL?

I am currently working on the BONSAI project with Piotr and Jakub.

If you weren’t working at ICL, where would you like to be working and why?

Not sure.

What are your interests/hobbies outside work?

Video games, of course. But I also enjoy fly-fishing and small-scale woodworking.

Tell us something unique about yourself that might surprise some people.

I very much enjoy cooking.

Recent Papers

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,” Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017. DOI: 10.1145/3026937.3026940 (552.62 KB)
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “High-performance Cholesky Factorization for GPU-only Execution,” Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017. DOI: 10.1145/3038228.3038237 (872.18 KB)
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched,” Atlanta, GA, SIAM Conference on Computational Science and Engineering (SIAM CSE17), Presentation, March 2017. (9.29 MB)
Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Solving Dense Symmetric Indefinite Systems using GPUs,” Concurrency and Computation: Practice and Experience, vol. 29, issue 9, March 2017. DOI: 10.1002/cpe.4055 (1.94 MB)

Recent Conferences

FEB 4
Principles and Practice of Parallel Programming (PPoPP), Austin, Texas
Hartwig Anzt

FEB 4-5
GPGPU-10 Workshop, Austin, Texas
Ahmad Abdelfattah Ahmad

FEB 23-3
SIAM CSE + Batched BLAS, Atlanta, Georgia
Ahmad Abdelfattah Ahmad, Anthony Danalis, Aurelien Bouteiller, Azzam Haidar, George Bosilca, Hartwig Anzt, Ichitaro Yamazaki, Jack Dongarra, Jakub Kurzak, Mark Gates, Piotr Luszczek, Stanimire Tomov, Stephen Wood, Thomas Herault, Yaohung Tsai, Yves Robert

FEB 27-3
SIAM Conference on Computational Science and Engineering, Atlanta, Georgia
Damien Genet, George Bosilca

FEB 27-2
MPI Forum, Portland, Oregon
Aurelien Bouteiller

MAR 8
BDEC, Wuxi, China
Jack Dongarra, Piotr Luszczek, Terry Moore, Tracy Rafferty

MAR 13-15
TESSE Workgroup Meeting, Roanoke, Virginia
Damien Genet, George Bosilca, Jeffrey Steill, Thomas Herault

Upcoming Conferences

APR 4-5
Future Online Analysis Platform, Arlington, Virginia
Piotr Luszczek, Terry Moore

Recent Lunch Talks

FEB 3
George Ostrouchov, ORNL
Bringing Modern Statistical Science to HPC

FEB 10
Scott Klasky, ORNL
Creating ADIOS-2 for scientific exascale data

FEB 17
Arvind Ramanathan, ORNL
Computer to Cure Cancer: Building Exascale Deep Learning Tools for Effective Cancer Surveillance

MAR 10
Nuria Losada
Stop&Restart vs Resilient MPI applications

MAR 17
Stephen Wood
Multilevel Parallelism for CFD: Preconditioning via Parallel ILU Factorization

MAR 31
Piotr Luszczek
Implementation Techniques for Point Set Registration on Multicore and Heterogeneous Hardware

Upcoming Lunch Talks

APR 7
Benjamin Hernandez, ORNL
GPU Refinement Learning for Crowd Simulations

APR 21
Sangamesh Ragate
Sequence Labeling Using RNN

APR 28
Azzam Haidar
Toward Distributed Eigenvalue and Singular Value Solver Using Data Flow System

Visitors

Sticks Mabakane from Council for Scientific and Industrial Research (South Africa) will be visiting from February 25 through March 12. Sticks is working with the PAPI group on performance visualizations.

Congratulations

New Arrival

ICL’s Terry Moore is now a granddad! Rowan McMillan Aultman was 21.5 inches long and weighed 8.9 pounds when he was born on February 20 in Tallahassee, FL. Mother and baby are doing fine. Congratulations, Terry!

13-year-old Daughter of ICLer to Attend UT

You may recall having read previously in this newsletter about Sophia Tomov, the daughter of ICL’s Stanimire Tomov, and her winning a spot as a finalist in the 2016 Discovery Education 3M Young Scientist Challenge, a prestigious science competition for middle school students.

Now a new development to report: Sophia, who was accepted into UT as a visiting high school student during the fall of 2016 at the age of 12, will take a computer science course in coding this fall at UT.

Sophia, who hopes to pursue a career in sustainable energy, is the first Tennessee recipient of the Caroline D. Bradley Scholarship, a scholarship for gifted high school students. She is currently a home-schooled student in the eighth grade.

Read more about Sophia’s story in UT’s Daily Beacon publication.

ICL Student Worked on Innovative TV Switcher

Before coming to UT and ICL to pursue higher studies, graduate research assistant Hanumantharayappa had the opportunity to contribute his talents to the team developing a technology product that is receiving high-profile attention.

The product, called Caavo, is a TV switcher, or hub, with eight HDMI ports that unifies streaming television boxes and services such as cable, satellite, apps, and subscriptions.

You can read more about Caavo in the magazine Fast Company and on the website TechCrunch.

March 2017
