News and Announcements

TOP500 – November 2014

The 44th TOP500 list was released at this year’s Supercomputing Conference in New Orleans, LA. For the fourth consecutive time, China’s Tianhe-2 remained at the top of the ranking with 33.86 petaflop/s (33,862.7 teraflop/s) on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores.

As for the rest of the list, the top 5 machines remain unchanged, but there is a new entry at #10: a 3.57 petaflop/s Cray CS-Storm system installed at an undisclosed U.S. government site. More details on the 44th edition of the TOP500 are available in the official press release.

Rank | Site | System | Rmax (TFlop/s)
1 | National Super Computer Center in Guangzhou, China | Tianhe-2 (MilkyWay-2) – TH-IVB-FEP Cluster, NUDT | 33,862.7
2 | DOE/SC/Oak Ridge National Laboratory, United States | Titan – Cray XK7, Cray Inc. | 17,590.0
3 | DOE/NNSA/LLNL, United States | Sequoia – BlueGene/Q, IBM | 17,173.2
4 | RIKEN Advanced Institute for Computational Science (AICS), Japan | K computer, SPARC64 VIIIfx, Fujitsu | 10,510.0
5 | DOE/SC/Argonne National Laboratory, United States | Mira – BlueGene/Q, IBM | 8,586.6

See the full list at TOP500.org.
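
The Tianhe-2 figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a sanity check: the node count and per-node chip counts come from the text, while the per-chip core counts (12 for the Ivy Bridge Xeon, 57 for the Xeon Phi) are assumptions taken from the publicly listed system configuration and are not stated in this newsletter. It also converts the Rmax value from the TFlop/s used in the table to PFlop/s.

```python
# Node layout as stated in the newsletter.
NODES = 16_000
XEONS_PER_NODE = 2
PHIS_PER_NODE = 3

# Assumptions (not stated in the newsletter): cores per chip.
CORES_PER_XEON = 12  # Ivy Bridge Xeon E5-2692 v2
CORES_PER_PHI = 57   # Xeon Phi 31S1P

cores_per_node = XEONS_PER_NODE * CORES_PER_XEON + PHIS_PER_NODE * CORES_PER_PHI
total_cores = NODES * cores_per_node

# Rmax is listed in TFlop/s; 1 PFlop/s = 1,000 TFlop/s.
rmax_tflops = 33_862.7
rmax_pflops = rmax_tflops / 1_000

print(f"cores per node: {cores_per_node}")
print(f"total cores:    {total_cores:,}")
print(f"Rmax:           {rmax_pflops:.2f} PFlop/s")
```

Under these assumptions each node contributes 195 cores, which gives the 3,120,000 total quoted for the system.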

2014 HPCC Awards


Each year, the HPCC Awards competition features contestants who submit performance numbers from the world’s largest supercomputer installations, as well as alternative implementations that use a vast array of parallel programming environments.

This year’s HPCC winners were unveiled by Piotr Luszczek and Jeremy Kepner during a BoF session at SC14. For the Class 1 awards, the Japanese K Computer, currently #4 on the TOP500, took 1st place in the Global HPL and EP-STREAM-Triad (system) benchmarks, while IBM’s Power 775 took 1st place in the Global RandomAccess benchmark. Argonne’s Mira, an IBM BlueGene/Q system, took 1st place in FFT. Visit the HPCC website to see the full list of winners.

Conference Reports

SC14

This year’s International Conference for High Performance Computing, Networking, Storage and Analysis (SC14) returned to New Orleans, LA on November 16 – 21. ICL had a significant presence at SC14, with faculty, research staff, and students giving talks, presenting papers, and leading BoF sessions.

For the third consecutive year, ICL was active in the University of Tennessee’s SC booth. The booth, which was organized and led by the National Institute for Computational Sciences (NICS), was visually designed with the help of ICL/CITR staff, manned with support from ICL researchers attending SC, and featured the lab’s research projects in the booth’s kiosks.

Several ICL research personnel gave “booth talks” at the UT/NICS booth in addition to their usual conference activities: Jack Dongarra gave a talk on the TOP500, Asim YarKhan gave a talk on recent PAPI developments, George Bosilca presented the latest on distributed computing at extreme scale, and Piotr Luszczek discussed the modern software stack for numerical linear algebra.

As is tradition, ICLers past and present who attended SC14 were invited to the Alumni Dinner. This year, the dinner was held at Calcasieu, where old friends and colleagues caught up as the ideas and drinks flowed freely. Everyone had a good time capping off the last major conference of the year.

Recent Releases

SC14 Handouts

The new project handouts from SC14 are available for download in PDF format.

  • BEAST
  • SC14 DPLASMA (draft)
  • SC14 FT-LA
  • SC14 HPCC
  • SC14 HPCG
  • SC14 HPL (draft)
  • SC14 ICL (draft)
  • SC14 MAGMA
  • SC14 PAPI
  • SC14 PaRSEC
  • SC14 PLASMA
  • SC14 PULSAR
  • SC14 QUARK
  • SC14 ScaLAPACK
  • TOP500-November-2014
  • SC14 ULFM

clMAGMA 1.3 Released

clMAGMA 1.3 is now available. clMAGMA is an OpenCL port of the MAGMA library. This release includes the following changes:

  • clMAGMA is now hosted on Bitbucket;
  • Performance improvements;
  • Added a mixed-precision iterative refinement solver for symmetric positive definite (SPD) matrices, including the {zc|ds}posv_gpu.cpp routines and their dependencies;
  • Added clmagmablas routines generated with the CUDA-to-OpenCL auto-converter:
    {z|c|d|s}lan{he|sy}, {zc|ds}axpycp, {z|d}lat2{c|s}, {z|d}lag2{c|s}, {c|s}lag2{z|d}, {z|c|d|s}laswp, {z|c|d|s}swap, {z|c|d|s}lacpy, and {z|c|d|s}transpose;
  • Added Bunch-Kaufman factorization for symmetric indefinite matrices:
    {z|c|d|s}{he|sy}trf;
  • Remodeled the clMAGMA runtime system; and
  • Added support for Windows and Mac OS.

Visit the MAGMA software page to download the tarball.
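
For readers unfamiliar with the mixed-precision iterative refinement solver listed among the clMAGMA 1.3 additions, the general idea can be sketched as follows: factor the SPD matrix once in low precision (the expensive step), then repeatedly compute the residual in high precision and correct the solution using the cheap low-precision factor. The code below is a minimal pure-Python illustration of that technique, not clMAGMA's implementation. All function names (`f32`, `cholesky_f32`, `solve_f32`, `residual`, `refine_spd`) are hypothetical, and single precision is only emulated by rounding stored values to float32.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cholesky_f32(a):
    """Lower Cholesky factor of an SPD matrix, stored entries rounded to float32."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = f32(a[j][j]) - sum(l[j][k] * l[j][k] for k in range(j))
        l[j][j] = f32(math.sqrt(s))
        for i in range(j + 1, n):
            s = f32(a[i][j]) - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = f32(s / l[j][j])
    return l


def solve_f32(l, b):
    """Solve L L^T x = b by forward and back substitution, rounding to float32."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = f32((f32(b[i]) - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = f32((y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i])
    return x


def residual(a, x, b):
    """Compute r = b - A x in full double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def refine_spd(a, b, tol=1e-12, max_iter=30):
    """Solve A x = b: low-precision factorization plus double-precision refinement."""
    l = cholesky_f32(a)       # O(n^3) work, done once in (emulated) single precision
    x = solve_f32(l, b)       # initial low-precision solution
    for it in range(max_iter):
        r = residual(a, x, b)  # O(n^2) work in double precision
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            return x, it
        d = solve_f32(l, r)    # correction from the low-precision factor
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter


a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]          # chosen so that the exact solution is [1, 2, 3]
x, iters = refine_spd(a, b)
print(x, iters)
```

The appeal of the approach is that the cubic-cost factorization runs in the faster precision, while the refinement loop, which costs only quadratic work per step, recovers double-precision accuracy for reasonably well-conditioned matrices.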

MAGMA 1.6 Released

MAGMA 1.6 is now available. This release provides performance improvements and increased functionality. More information is given in the MAGMA 1.6 Quick Reference handout.

Visit the MAGMA software page to download the tarball.

MAGMA MIC 1.3 Released

MAGMA MIC 1.3 is now available. This release provides implementations of MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations, as well as linear and eigenproblem solvers, for Intel Xeon Phi coprocessors. More information on the approach is given in this presentation.

Visit the MAGMA software page to download the tarball.

PAPI 5.4 Released

PAPI 5.4 is now available. This release provides a new component, called EMON, for the high-speed power measurement API on IBM BlueGene/Q (BG/Q), giving transparent access to power and energy data on BG/Q. This additional support complements the earlier BGPM components for BG/Q and enables PAPI users and tool developers to use their PAPI-instrumented code as is, without having to learn a new set of library and instrumentation primitives.

PAPI 5.4 also includes initial support for the Applied Micro X-Gene architecture, RAPL (energy measurement) support for Intel Haswell, and support for the IBM POWER8 system when run as a non-virtualized platform (“PowerNV”). Furthermore, we have extended RAPL energy measurement to work through msr-safe, a Linux kernel module that allows user access to a whitelisted set of MSRs.

This release also includes several enhancements for the perf_event (core/uncore) components, including support for extended event masks, which adds a number of new masks that enable counting in the user domain, kernel domain, or on a specific CPU.

Additionally, the papi_component_avail utility now provides a list of PMU names supported by active components. The papi_native_avail utility now supports a more robust “--validate” check on systems where an event requires multiple masks in order to be valid (e.g., on Intel SandyBridge EP).

There have been several other bug fixes and enhancements, including:

  • Updated IBM POWER7, POWER8 presets;
  • Hardware counter and event count added/fixed for BGPM components;
  • Reduced overhead of API call PAPI_name_to_code();
  • Growing list of native events in core/uncore components fixed; and
  • Cleaned up Intel IvyBridge presets.

Visit the PAPI software page to download the tarball.

PULSAR 2.0 Released

PULSAR 2.0 is now available. PULSAR is a complete programming platform for large-scale distributed memory systems with multicore processors and hardware accelerators. PULSAR provides a simple abstraction layer over multithreading, message-passing, and multi-GPU, multi-stream programming. PULSAR offers a general-purpose programming model, suitable for a wide range of scientific and engineering applications.

PULSAR version 2.0 introduces GPU support for NVIDIA GPUs using the CUDA programming system. This 2.0 release also adds multi-GPU, multi-stream execution to PULSAR’s multithreading and message-passing capabilities.

Visit the PULSAR software page to download the tarball.

Recent Papers

  1. Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.  (465.52 KB)
  2. Cao, C., T. Herault, G. Bosilca, and J. Dongarra, “Design for a Soft Error Resilient Dynamic Task-based Runtime,” ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.  (2.61 MB)
  3. Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, “Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
  4. Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014.  (407.5 KB)
  5. Danalis, A., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.  (480.05 KB)
  6. Dongarra, J., J. Kurzak, P. Luszczek, and I. Yamazaki, “PULSAR Users’ Guide, Parallel Ultra-Light Systolic Array Runtime,” University of Tennessee EECS Technical Report, no. UT-EECS-14-733: University of Tennessee, November 2014.  (561.56 KB)

Recent Conferences

  1. NOV
    HPC China Guangzhou, China
    Jack Dongarra
  2. NOV
    SC14 New Orleans, LA
    Anthony Danalis, Asim YarKhan, Aurelien Bouteiller, George Bosilca, Ichitaro Yamazaki, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, Terry Moore, Thomas Herault, Tracy Rafferty, Wei Wu, Yulu Jia

Upcoming Conferences

  1. DEC
    CHPC National Meeting 2014 Kruger National Park, South Africa
    Jack Dongarra
  2. DEC
    ISP2S2 Kobe, Japan
    George Bosilca
  3. DEC
    Piotr Luszczek
  4. DEC
    MPI Forum San Jose, CA
    Aurelien Bouteiller

Recent Lunch Talks

  1. NOV 7
    Adrien Remy from LRI
    Using Random Butterfly Transformation to Solve Dense Linear Systems Using Accelerators
  2. NOV 14
    Chongxiao Cao
    Design for a Soft Error Resilient Dynamic Task-based Runtime

Upcoming Lunch Talks

  1. DEC 5
    Asim YarKhan
    Latest Developments in the PAPI Performance Monitoring Library
  2. DEC 12
    Ichitaro Yamazaki
    Mixed-precision orthogonalization scheme and its case-studies with GPUs

Congratulations

Mia-Lynne Haidar

Mia-Lynne Haidar was born to Azzam and Dana Haidar on November 14, 2014 at 8:38 pm. Mia-Lynne is 20 inches in length and weighs 7 pounds, 7 ounces. Congratulations to the Haidar family!

Dates to Remember

ICL’s 25th Anniversary Gathering

We are pleased to announce that ICL will be hosting the “25 Years of Innovative Computing Conference” on March 31 – April 2, 2015 in honor of the lab’s 25th year. Mark your calendars!