ICL Newsletter

News and Announcements

TOP500 – November 2014

The 44th TOP500 list was released at this year’s Supercomputing Conference in New Orleans, LA. For the fourth consecutive time, China’s Tianhe-2 remained at the top of the ranking, with 33.86 petaflop/s (33,862.7 TFlop/s) on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores.
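As a quick sanity check on the figures above, the following sketch recomputes the combined core count from the node configuration and converts the Rmax value listed in the table (which is in TFlop/s) to petaflop/s. The node and processor counts come from the text; the per-chip core counts (12 per Xeon, 57 per Xeon Phi) are assumptions taken from the publicly listed system configuration, not from this newsletter.

```python
# Sanity check of the Tianhe-2 figures quoted in this article.
# Node and processor counts come from the text; the per-chip core counts
# below are assumptions, not values stated in the newsletter.

NODES = 16_000
XEONS_PER_NODE = 2
PHIS_PER_NODE = 3
CORES_PER_XEON = 12  # assumed: 12-core Ivy Bridge Xeon
CORES_PER_PHI = 57   # assumed: 57-core Xeon Phi coprocessor


def total_cores() -> int:
    """Return the combined CPU + coprocessor core count of the machine."""
    cores_per_node = (XEONS_PER_NODE * CORES_PER_XEON
                      + PHIS_PER_NODE * CORES_PER_PHI)
    return NODES * cores_per_node


def tflops_to_pflops(tflops: float) -> float:
    """Convert TFlop/s (the unit of the Rmax column) to PFlop/s."""
    return tflops / 1000.0


print(f"total cores: {total_cores():,}")
print(f"Rmax: {tflops_to_pflops(33_862.7):.2f} PFlop/s")
```

Under these assumptions each node contributes 195 cores, which multiplied by 16,000 nodes matches the 3,120,000 cores reported for the system.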

As for the rest of the list, the top 5 machines remain unchanged, but there is a new entry at #10: a 3.57 petaflop/s Cray CS-Storm system installed at an undisclosed U.S. government site. More details on the 44th edition of the TOP500 are available in the official press release.

Rank	Site	System	Rmax (TFlop/s)
1	National Super Computer Center in Guangzhou China	Tianhe-2 (MilkyWay-2) – TH-IVB-FEP Cluster NUDT	33,862.7
2	DOE/SC/Oak Ridge National Laboratory United States	Titan – Cray XK7 Cray Inc.	17,590.0
3	DOE/NNSA/LLNL United States	Sequoia – BlueGene/Q IBM	17,173.2
4	RIKEN Advanced Institute for Computational Science (AICS) Japan	K computer, SPARC64 VIIIfx Fujitsu	10,510.0
5	DOE/SC/Argonne National Laboratory United States	Mira – BlueGene/Q IBM	8,586.6
See the full list at TOP500.org.

2014 HPCC Awards

Each year, the HPCC Awards competition features contestants who submit performance numbers from the world’s largest supercomputer installations, as well as alternative implementations that use a vast array of parallel programming environments.

This year’s HPCC winners were unveiled by Piotr Luszczek and Jeremy Kepner during a BoF session at SC14. For the Class 1 awards, the Japanese K Computer, currently #4 on the TOP500, took 1st place in the Global HPL and EP-STREAM-Triad (system) benchmarks, while IBM’s Power 775 took 1st place in the Global RandomAccess benchmark. Argonne’s Mira, an IBM BlueGene/Q system, took first place in FFT. Visit the HPCC website to see the full list of the winners.

Conference Reports

SC14


This year’s International Conference for High Performance Computing, Networking, Storage and Analysis (SC14) returned to New Orleans, LA on November 16 – 21. ICL had a significant presence at SC14, with faculty, research staff, and students giving talks, presenting papers, and leading BoF sessions.

For the third consecutive year, ICL was active in the University of Tennessee’s SC booth. The booth, which was organized and led by the National Institute for Computational Sciences (NICS), was visually designed with the help of ICL/CITR staff, manned with support from ICL researchers attending SC, and featured the lab’s research projects in the booth’s kiosks.

Several ICL research personnel gave “booth talks” at the UT/NICS booth in addition to their usual conference activities: Jack Dongarra gave a talk on the TOP500, Asim YarKhan gave a talk on recent PAPI developments, George Bosilca presented the latest on distributed computing at extreme scale, and Piotr Luszczek discussed the modern software stack for numerical linear algebra.

As is tradition, the ICLers both past and present who attended SC14 were invited to the Alumni Dinner. This year, the dinner was held at Calcasieu, and there were plenty of conversations shared between old friends and colleagues, as the ideas and drinks flowed freely. In the end, everyone had a good time as they capped off the last major conference of the year.

Recent Releases

SC14 Handouts

The new project handouts from SC14 are available for download in PDF format.

clMAGMA 1.3 Released

clMAGMA 1.3 is now available. clMAGMA is an OpenCL port of the MAGMA library. This release adds the following new functionalities:

clMAGMA is now on Bitbucket;
Performance improvements;
Add mixed-precision iterative refinement solver for SPD matrices. This includes the {zc|ds}posv_gpu.cpp routines and their dependencies;
Add clmagmablas routines using CUDA-to-OpenCL auto-converter: {z|c|d|s}lan{he|sy}, {zc|ds}axpycp, {z|d}lat2{c|s}, {z|d}lag2{c|s}, {c|s}lag2{z|d}, {z|c|d|s}laswp, {z|c|d|s}swap, {z|c|d|s}lacpy, and {z|c|d|s}transpose;
Add Bunch-Kaufman factorization for symmetric indefinite matrices: {z|c|d|s}{he|sy}trf;
Remodel the clMAGMA runtime system;
Support added for Windows and Mac OS.

Visit the MAGMA software page to download the tarball.
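For readers unfamiliar with the mixed-precision iterative refinement idea behind the {zc|ds}posv routines mentioned above, here is a minimal sketch in plain Python. It is not clMAGMA code: it assumes the usual scheme of factoring the SPD matrix with Cholesky in single precision and then correcting the solution using residuals computed in double precision. Single precision is emulated by rounding through `struct`, and all function names (`f32`, `cholesky_f32`, `solve_f32`, `refine_spd_solve`) are hypothetical helpers made up for this illustration.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def cholesky_f32(a):
    """Lower-triangular Cholesky factor of SPD matrix a, in emulated single precision."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = f32(a[j][j])
        for k in range(j):
            s = f32(s - f32(l[j][k] * l[j][k]))
        l[j][j] = f32(math.sqrt(s))
        for i in range(j + 1, n):
            s = f32(a[i][j])
            for k in range(j):
                s = f32(s - f32(l[i][k] * l[j][k]))
            l[i][j] = f32(s / l[j][j])
    return l


def solve_f32(l, b):
    """Solve (L L^T) x = b by forward and back substitution in single precision."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        s = f32(b[i])
        for k in range(i):
            s = f32(s - f32(l[i][k] * y[k]))
        y[i] = f32(s / l[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        s = y[i]
        for k in range(i + 1, n):
            s = f32(s - f32(l[k][i] * x[k]))
        x[i] = f32(s / l[i][i])
    return x


def refine_spd_solve(a, b, tol=1e-12, max_iter=20):
    """Solve a x = b: single-precision factorization, double-precision refinement."""
    n = len(b)
    l = cholesky_f32(a)   # the expensive O(n^3) step, done in low precision
    x = solve_f32(l, b)   # initial low-precision solution
    b_norm = max(abs(v) for v in b)
    for iteration in range(max_iter):
        # Residual r = b - a x, computed in full double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, iteration
        d = solve_f32(l, r)  # cheap correction reusing the factor
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter


# Small SPD test system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
b = [12.0, 7.0, 17.0]
x, iterations = refine_spd_solve(A, b)
print("solution:", x, "refinement steps:", iterations)
```

The appeal of this approach is that the cubic-cost factorization runs in the faster, lower precision, while each refinement step costs only a matrix-vector product and two triangular solves. For well-conditioned matrices a few steps are typically enough to recover double-precision accuracy.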

MAGMA 1.6 Released

MAGMA 1.6 is now available. This release provides performance improvements and increased functionality. More information is given in the MAGMA 1.6 Quick Reference handout.

Visit the MAGMA software page to download the tarball.

MAGMA MIC 1.3 Released

MAGMA MIC 1.3 is now available. This release provides implementations of MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations, as well as linear and eigenproblem solvers, for Intel Xeon Phi Coprocessors. More information on the approach is given in this presentation.

Visit the MAGMA software page to download the tarball.

PAPI 5.4 Released

PAPI 5.4 is now available. This release provides a new component for the high speed power measurement API for IBM BlueGene/Q (BG/Q), called EMON, to provide access to power and energy data on BG/Q in a transparent fashion. This additional support complements the earlier BGPM components for BG/Q, and enables PAPI users and tool developers to use their PAPI instrumented code, as is, without having to learn a new set of library and instrumentation primitives.

PAPI 5.4 also includes initial support for Applied Micro X-Gene architecture, RAPL (energy measurement) support for Intel Haswell, and support for the IBM POWER8 system when run as a non-virtualized platform ‘PowerNV’. Furthermore, we have extended the RAPL energy measurements via msr-safe, which is a Linux kernel module that allows user access to a whitelisted set of MSRs.

This release also includes several enhancements for the perf_event (core/uncore) components, including support for extended event masks, which adds a number of new masks that enable counting in the user domain, kernel domain, or on a specific CPU.

The papi_component_avail utility now provides a list of PMU names supported by active components. The papi_native_avail utility now supports a more robust “--validate” check on systems where an event is valid only when multiple masks are provided (e.g., on Intel SandyBridge EP).

There have been several other bug fixes and enhancements, including:

Updated IBM POWER7, POWER8 presets;
Hardware counter and event count added/fixed for BGPM components;
Reduced overhead of API call PAPI_name_to_code();
Growing list of native events in core/uncore components fixed; and
Cleaned up Intel IvyBridge presets

Visit the PAPI software page to download the tarball.

PULSAR 2.0 Released

PULSAR 2.0 is now available. PULSAR is a complete programming platform for large-scale distributed memory systems with multicore processors and hardware accelerators. PULSAR provides a simple abstraction layer over multithreading, message-passing, and multi-GPU, multi-stream programming. PULSAR offers a general-purpose programming model, suitable for a wide range of scientific and engineering applications.

PULSAR version 2.0 introduces GPU support for NVIDIA GPUs using the CUDA programming system. This 2.0 release also adds multi-GPU, multi-stream execution to PULSAR’s multithreading and message-passing capabilities.

Visit the PULSAR software page to download the tarball.

Recent Papers

Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014. (465.52 KB)
Cao, C., T. Herault, G. Bosilca, and J. Dongarra, “Design for a Soft Error Resilient Dynamic Task-based Runtime,” ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014. (2.61 MB)
Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, “Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8 (407.5 KB)
Danalis, A., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014. (480.05 KB)
Dongarra, J., J. Kurzak, P. Luszczek, and I. Yamazaki, “PULSAR Users’ Guide, Parallel Ultra-Light Systolic Array Runtime,” University of Tennessee EECS Technical Report, no. UT-EECS-14-733: University of Tennessee, November 2014. (561.56 KB)

Recent Conferences

NOV 6
HPC China, Guangzhou, China
Jack Dongarra

NOV 16-21
SC14, New Orleans, Louisiana
Anthony Danalis, Asim YarKhan, Aurelien Bouteiller, George Bosilca, Ichitaro Yamazaki, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, Terry Moore, Thomas Herault, Tracy Rafferty, Wei Wu, Yulu Jia

DEC 1
CHPC National Meeting 2014, Kruger National Park, South Africa
Jack Dongarra

DEC 2
ISP2S2, Kobe, Japan
George Bosilca

DEC 8
Neural Information Processing Systems Foundations 2014, Montreal, Quebec
Piotr Luszczek

DEC 8
MPI Forum, San Jose, California
Aurelien Bouteiller

Upcoming Conferences

JAN 27
Open MPI Developers Meeting, Dallas, Texas
George Bosilca

JAN 28
BDEC, Barcelona, Spain
Jack Dongarra, Sam Crawford, Terry Moore, Tracy Rafferty

Recent Lunch Talks

NOV 7
Adrien Remy (LRI)
Using Random Butterfly Transformation to Solve Dense Linear Systems Using Accelerators

NOV 14
Chongxiao Cao
Design for a Soft Error Resilient Dynamic Task-based Runtime

DEC 5
Asim YarKhan
Latest Developments in the PAPI Performance Monitoring Library

DEC 12
Ichitaro Yamazaki
Mixed-precision orthogonalization scheme and its case-studies with GPUs

Upcoming Lunch Talks

JAN 8
Tony Hey
The Fourth Paradigm: Data-Intensive Scientific Discovery, Open Science and the Cloud

JAN 16
Emmanuel Jeannot (INRIA)
Topology Aware Data Management

JAN 23
George Bosilca
Building Blocks for Resilient Applications

Congratulations

Mia-Lynne Haidar

Mia-Lynne Haidar was born to Azzam and Dana Haidar on November 14, 2014 at 8:38 p.m. Mia-Lynne is 20 inches in length and weighs 7 pounds, 7 ounces. Congratulations to the Haidar family!

Dates to Remember

ICL’s 25th Anniversary Gathering

We are pleased to announce that ICL will be hosting the “25 Years of Innovative Computing Conference” on March 31 – April 2, 2015 in honor of the lab’s 25th year. Mark your calendars!

December 2014
