ICL Newsletter

News and Announcements

2013 ICL Retreat

This year’s ICL retreat moved to the RT Lodge in Maryville, Tennessee. A little closer to home, but secluded in its own way, the new venue provided an excellent platform for two days of talks, which covered student projects and summer internships, the lab’s progress in the areas of linear algebra, distributed computing, benchmarking, and performance analysis, along with recaps of administrative procedures. There was also some fun to be had at the RT Lodge, and the friendly staff provided excellent food and service for the duration of our visit. Here’s to another great year at ICL!

#1 Downloaded Paper in Parallel Computing

“From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming,” authored by Peng Du, Rick Weber, Piotr Luszczek, Stan Tomov, Greg Peterson, and Jack Dongarra, was published in Volume 38 of Parallel Computing in August 2012. In the first half of 2013, it was the most frequently downloaded article from Parallel Computing. The abstract follows.

In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide a single library with decent performance on a variety of platforms. We choose triangular solver (TRSM) and matrix multiplication (GEMM) as representative level 3 BLAS routines to implement in OpenCL. We profile TRSM to get the time distribution of the OpenCL runtime system. We then provide tuned GEMM kernels for both the NVIDIA Tesla C2050 and ATI Radeon 5870, the latest GPUs offered by both companies. We explore the benefits of using the texture cache, the performance ramifications of copying data into images, discrepancies in the OpenCL and CUDA compilers’ optimizations, and other issues that affect the performance. Experimental results show that nearly 50% of peak performance can be obtained in GEMM on both GPUs in OpenCL. We also show that the performance of these kernels is not highly portable. Finally, we propose the use of auto-tuning to better explore these kernels’ parameter space using search harness.
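The auto-tuning idea at the end of the abstract can be illustrated with a minimal sketch. This is not the paper's search harness: it runs on the CPU in plain Python, searches exhaustively over a single parameter (the tile size of a blocked matrix multiply), and the names `blocked_gemm` and `autotune` are hypothetical. It only shows the general pattern of timing each candidate configuration and keeping the fastest.

```python
import time


def blocked_gemm(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile of C at a time.
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def autotune(n=48, candidates=(4, 8, 16, 48), repeats=2):
    """Time every candidate tile size; return (best_tile, timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_gemm(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the fastest of the repeats
    best_tile = min(timings, key=timings.get)
    return best_tile, timings


if __name__ == "__main__":
    tile, timings = autotune()
    print("fastest tile size on this machine:", tile)
```

The winning tile size depends on the machine, which mirrors the abstract's point that tuned kernels are not highly portable. A real harness would search many more parameters and would time the actual GPU kernels.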

Recent Releases

HPCC 1.4.3 Released

HPCC 1.4.3 has been released! The HPC Challenge (HPCC) benchmark suite is designed to assess the bounds of performance on many real-world applications for computational science at extreme scale. Included in the benchmark suite are tests for sustained floating point operations, memory bandwidth, rate of random memory updates, interconnect latency, and interconnect bandwidth. The main factors that differentiate the various components of the suite are the memory access patterns that, in a meaningful way, span the memory utilization space of temporal and spatial locality.
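To make the locality distinction concrete, here is a minimal sketch of two contrasting access patterns: a unit-stride sweep in the style of a STREAM triad, and pseudo-random XOR updates to a table. It is an illustration only, not HPCC code. The function names are hypothetical, and Python's `random` module stands in for the benchmark's own random number stream.

```python
import random


def stream_triad(b, c, q):
    """Unit-stride sweep: high spatial locality, little temporal reuse."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def random_updates(table, n_updates, seed=1):
    """XOR pseudo-random values into pseudo-random slots: low locality.

    len(table) must be a power of two so the bit mask yields a valid index.
    """
    size = len(table)
    assert size > 0 and size & (size - 1) == 0, "table size must be a power of two"
    rng = random.Random(seed)  # assumption: stand-in generator, not HPCC's
    for _ in range(n_updates):
        ran = rng.getrandbits(64)
        table[ran & (size - 1)] ^= ran
    return table


if __name__ == "__main__":
    print(stream_triad([1.0, 2.0], [3.0, 4.0], 2.0))
    print(random_updates(list(range(8)), 20))
```

Because XOR is its own inverse, replaying the same update stream a second time restores the original table, which gives a simple way to check the sketch.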

HPCC 1.4.3 includes the following updates:

Increased the size of scratch vector for local FFT tests that were missed in the previous version (reported by SGI).
Added Makefile for Blue Gene/P contributed by Vasil Tsanov.

Visit the HPCC software page to download the tarball.

MAGMA 1.4 Released

MAGMA 1.4 is now available. This release provides performance improvements and support for the new NVIDIA Kepler GPUs. More information is given in the MAGMA: a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures presentation. The MAGMA 1.4 release adds the following new functionalities:

Merge libmagmablas into libmagma to eliminate circular dependencies; link with just -lmagma now;
Add multi-GPU Hessenberg and non-symmetric eigenvalue routines: geev_m, gehrd_m, unghr_m, ungqr_m;
Fix required workspace size in gels_gpu, gels3_gpu, geqrs_gpu, geqrs3_gpu;
Fix required workspace size in [zcsd]geqrf;
Add macro USE_INT64 to compile with int being 64-bit. See make.inc.int64;
Add panel factorizations for LU, QR, and Cholesky entirely on the GPU, correspondingly in [zcsd]getf2_gpu, [zcsd]geqr2_gpu, and [zcsd]potf2_gpu;
Add QR with pivoting in GPU interface (functions [zcsd]geqp3_gpu), and improve the performance for both CPU and GPU interface QRs with pivoting;
Add multi-GPU symmetric eigenvalue routines (one-stage): [zhe|che|ssy|dsy]trd_mgpu, [zhe|che|ssy|dsy]evd_m, [zhe|che|ssy|dsy]evdx_m, [zhe|che|ssy|dsy]gvd_m, [zhe|che|ssy|dsy]gvdx_m;
Add single and multi-GPU symmetric eigenvalue routines (two-stage): [zhe|che|ssy|dsy]evdx_2stage, [zhe|che|ssy|dsy]gvdx_2stage, [zhe|che|ssy|dsy]evdx_2stage_m, [zhe|che|ssy|dsy]gvdx_2stage_m.

Visit the MAGMA software page to download the tarball.

PULSAR 1.0.0 Released

PULSAR 1.0.0 has been released! The first release of PULSAR provides a complete implementation of the PULSAR Runtime (PRT) for building and executing a Virtual Systolic Array (VSA). The implementation is based on Pthreads and MPI. This release includes two examples: tile QR (“domino” reduction) and tile LU (no pivoting).

Visit the PULSAR software page to download the tarball.

Interview

Where are you from, originally?

I am originally from Hoopeston, Illinois, a small rural town in east central Illinois.

Can you summarize your educational background?

I started college as a music performance major at Indiana University Music School, with my instrument being the oboe. After a couple of years of that, and seeing people graduate with music performance degrees and not get jobs, I switched to a math and chemistry double major. I earned a Master’s degree in math, and after teaching math and computer science as an instructor at Wichita State University for a few years, I decided to go for a PhD in Computer Science. I earned my PhD in Computer Sciences from Purdue University in 1990 in the area of distributed databases.

How did you first meet Jack?

I applied for a postdoc position that Jack had advertised in Communications of the ACM. The position was to work on the XNetlib project, which was an X-windows interface to the Netlib software repository. I first met Jack when I interviewed for the position. Although I had two other offers for postdoc positions, after meeting Jack and the XNetlib crew, I knew this was where I wanted to be.

Your time at ICL goes way back. How did your responsibilities at ICL evolve over the years?

I started out working on just the XNetlib project. When the National HPCC Software Exchange (NHSE) project started, I also got involved with that. As part of that project, I started working with some of the software, in particular with performance analysis tools. I became involved with proposal writing and became Associate Director of Research of ICL for a few years. So basically my responsibilities evolved from working on a single project under someone else’s direction to managing and working on a number of different projects.

What are some of your favorite memories from your time at ICL?

I have so many good memories that it’s hard to pick favorites. I guess my best memories are working with other ICLers on projects and being able to knock on anyone’s door at any time and have a stimulating technical discussion. I really miss that now since I am pretty much on my own.

Tell us where you are and what you’re doing now.

I am starting my second year at the University of Texas at El Paso (UTEP) as an Associate Professor of Computer Science. My position is split between the Computer Science Department and the graduate Computational Science Program. Both programs offer Master’s and PhD degrees. I have seven graduate students and one undergraduate working in my research lab.

In what ways did working at ICL prepare you for what you do now, if at all?

Working at ICL was great preparation for what I do now. While at ICL, I did research, published papers, wrote grant proposals, and worked with graduate students, and these are my main duties now with the addition of teaching courses.

Tell us something about yourself that might surprise some people.

I took a leave of absence from ICL from 2007 to 2010 to teach high school computer science on the south side of Chicago. Although a lot of people at ICL know that, they may not know that my main motivation in going to Chicago was to live in and volunteer with a community called the Good News Partners that helps homeless people get back on their feet. I have continued a relationship with that organization since I left and I hope that some of the kids I worked with on a one-to-one basis will achieve their dream of going to college and escaping the cycle of poverty their families are trapped in.

Recent Papers

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” ICL Technical Report, no. ICL-UT-13-02, August 2013. (437.37 KB)
Kurzak, J., P. Luszczek, and J. Dongarra, “LU Factorization with Partial Pivoting for a Multicore System with Accelerators,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013. DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.242 (1.08 MB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013. (431.84 KB)
Turchenko, V., G. Bosilca, A. Bouteiller, and J. Dongarra, “Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures,” 7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013. (102.51 KB)
Dongarra, J., M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, “Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013. (284.97 KB)
Mattson, T., D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, et al., “Standards for Graph Algorithm Primitives,” 17th IEEE High Performance Extreme Computing Conference (HPEC '13), Waltham, MA, IEEE, September 2013. DOI: 10.1109/HPEC.2013.6670338 (108.86 KB)

Recent Lunch Talks

AUG 23: Michela Taufer, University of Delaware, “On the effectiveness of application-aware self-management for scientific discovery in volunteer computing systems”
AUG 30: Jeff Larkin, NVIDIA, “OpenACC 2.0 Highlights”
SEP 6: Yves Robert, “On the Combination of Silent Error Detection and Checkpointing”
SEP 13: Jakub Kurzak, “Parallel Ultra Light Systolic Array Runtime”
SEP 20: Piotr Luszczek, “Energy and Power Consumption Trends”
SEP 27: Dan Terpstra, “Small Scale Water Treatment in Developing Countries”

Upcoming Lunch Talks

OCT 4: Aurelien Bouteiller, “Multi-criteria Checkpointing Strategies: Response-time vs Resource Utilization”
OCT 11: Alisa Meador, UTK Center for International Education, “International SOS”
OCT 18: Rainer Keller, University of Stuttgart, “Tools for High-Performance Computing”
OCT 25: Mark Dean, EECS, “Technology Trends and Innovation Opportunities”

Congratulations

Fengguang Song

ICL alum Fengguang Song recently took a position as an Assistant Professor at Indiana University-Purdue University Indianapolis’s Department of Computer and Information Science.

Fengguang will be working with IUPUI’s scientific computing group at the university’s recently established School of Science Institute for Mathematical Modeling and Computational Science (IMMCS). Congratulations, Fengguang!

September 2013