News and Announcements
TOP500 – November 2014
The 44th TOP500 list was released at this year’s Supercomputing Conference in New Orleans, LA. For the fourth consecutive time, China’s Tianhe-2 remained at the top of the ranking, with 33,862.7 teraflop/s (about 33.86 petaflop/s) on the High Performance LINPACK benchmark. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores.
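As a quick sanity check on these figures, the short Python sketch below reproduces the core count and converts the benchmark result between units. The per-chip core counts (12 cores per Ivy Bridge processor, 57 cores per Xeon Phi coprocessor) are assumptions that are not stated in this article; the node count and Rmax value come from the text and the table.

```python
# Assumptions (not given in the article): 12 cores per Xeon Ivy Bridge
# processor and 57 cores per Xeon Phi coprocessor.
NODES = 16_000
CPUS_PER_NODE, CORES_PER_CPU = 2, 12
PHIS_PER_NODE, CORES_PER_PHI = 3, 57

cores_per_node = CPUS_PER_NODE * CORES_PER_CPU + PHIS_PER_NODE * CORES_PER_PHI
total_cores = NODES * cores_per_node

# The TOP500 table reports Rmax in TFlop/s; 1 PFlop/s = 1,000 TFlop/s.
RMAX_TFLOPS = 33_862.7
rmax_pflops = RMAX_TFLOPS / 1_000

print(f"{cores_per_node} cores per node, {total_cores:,} cores in total")
print(f"Rmax = {rmax_pflops:.2f} PFlop/s")
```

Under those assumptions, the per-node total multiplied by 16,000 nodes matches the 3,120,000 cores quoted above.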
As for the rest of the list, the top 5 machines remain unchanged, but there is a new entry at #10: a 3.57 petaflop/s Cray CS-Storm system installed at an undisclosed U.S. government site. More details on the 44th edition of the TOP500 are available in the official press release.
Rank | Site | System | Rmax (TFlop/s) |
---|---|---|---|
1 | | Tianhe-2 | 33,862.7 |
2 | DOE/SC/Oak Ridge National Laboratory | Titan – Cray XK7 | 17,590.0 |
3 | DOE/NNSA/LLNL | | 17,173.2 |
4 | RIKEN Advanced Institute for Computational Science (AICS) | K computer, SPARC64 VIIIfx | 10,510.0 |
5 | DOE/SC/Argonne National Laboratory | Mira – IBM BlueGene/Q | 8,586.6 |

See the full list at TOP500.org.
2014 HPCC Awards
Each year, the HPCC Awards competition features contestants who submit performance numbers from the world’s largest supercomputer installations, as well as alternative implementations that use a vast array of parallel programming environments.
This year’s HPCC winners were unveiled by Piotr Luszczek and Jeremy Kepner during a BoF session at SC14. For the Class 1 awards, the Japanese K computer, currently #4 on the TOP500, took first place in the Global HPL and EP-STREAM-Triad (system) benchmarks, while IBM’s Power 775 took first place in the Global RandomAccess benchmark. Argonne’s Mira, an IBM BlueGene/Q system, took first place in FFT. Visit the HPCC website to see the full list of winners.
Conference Reports
SC14
This year’s International Conference for High Performance Computing, Networking, Storage and Analysis (SC14) returned to New Orleans, LA on November 16 – 21. ICL had a significant presence at SC14, with faculty, research staff, and students giving talks, presenting papers, and leading BoF sessions.
For the third consecutive year, ICL was active in the University of Tennessee’s SC booth. The booth, organized and led by the National Institute for Computational Sciences (NICS), was visually designed with the help of ICL/CITR staff, staffed with support from ICL researchers attending SC, and featured the lab’s research projects in its kiosks.
Several ICL research personnel gave “booth talks” at the UT/NICS booth in addition to their usual conference activities: Jack Dongarra gave a talk on the TOP500, Asim YarKhan gave a talk on recent PAPI developments, George Bosilca presented the latest on distributed computing at extreme scale, and Piotr Luszczek discussed the modern software stack for numerical linear algebra.
As is tradition, the ICLers both past and present who attended SC14 were invited to the Alumni Dinner. This year, the dinner was held at Calcasieu, and there were plenty of conversations shared between old friends and colleagues, as the ideas and drinks flowed freely. In the end, everyone had a good time as they capped off the last major conference of the year.
Recent Releases
clMAGMA 1.3 Released
clMAGMA 1.3 is now available. clMAGMA is an OpenCL port of the MAGMA library. This release includes the following changes:
- clMAGMA is now on Bitbucket;
- Performance improvements;
- Add mixed-precision iterative refinement solver for SPD matrices. This includes the {zc|ds}posv_gpu.cpp routines and their dependencies;
- Add clmagmablas routines using CUDA-to-OpenCL auto-converter: {z|c|d|s}lan{he|sy}, {zc|ds}axpycp, {z|d}lat2{c|s}, {z|d}lag2{c|s}, {c|s}lag2{z|d}, {z|c|d|s}laswp, {z|c|d|s}swap, {z|c|d|s}lacpy, and {z|c|d|s}transpose;
- Add Bunch-Kaufman factorization for symmetric indefinite matrices: {z|c|d|s}{he|sy}trf;
- Remodel the clMAGMA runtime system;
- Support added for Windows and Mac OS.
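For readers unfamiliar with the mixed-precision iterative refinement mentioned in the list above, the following is a minimal pure-Python sketch of the general technique for SPD systems. It is not clMAGMA code: the function names (`f32`, `cholesky_f32`, `solve_f32`, `refine_solve`) are hypothetical, and single precision is only emulated by rounding through `struct`. The idea is to do the expensive Cholesky factorization in low precision and then recover double-precision accuracy with cheap residual corrections.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def cholesky_f32(a):
    """Lower Cholesky factor of an SPD matrix, in emulated single precision."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = f32(a[j][j])
        for k in range(j):
            s = f32(s - f32(l[j][k] * l[j][k]))
        l[j][j] = f32(s ** 0.5)
        for i in range(j + 1, n):
            s = f32(a[i][j])
            for k in range(j):
                s = f32(s - f32(l[i][k] * l[j][k]))
            l[i][j] = f32(s / l[j][j])
    return l

def solve_f32(l, b):
    """Solve (L L^T) x = b by forward/back substitution in emulated single."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = f32(b[i])
        for k in range(i):
            s = f32(s - f32(l[i][k] * y[k]))
        y[i] = f32(s / l[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for k in range(i + 1, n):
            s = f32(s - f32(l[k][i] * x[k]))
        x[i] = f32(s / l[i][i])
    return x

def refine_solve(a, b, tol=1e-12, max_iter=20):
    """Solve a x = b: single-precision factorization, double-precision refinement."""
    n = len(b)
    l = cholesky_f32(a)   # the O(n^3) work happens in low precision
    x = solve_f32(l, b)   # initial low-precision solution
    for _ in range(max_iter):
        # Residual r = b - a x, computed in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        d = solve_f32(l, r)  # cheap correction reusing the factor
        x = [x[i] + d[i] for i in range(n)]
    return x

# Small SPD example whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
b = [12.0, 7.0, 17.0]
print(refine_solve(A, b))
```

The refinement loop only converges when the matrix is reasonably well conditioned relative to single precision; production solvers typically fall back to a full double-precision solve otherwise.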
Visit the MAGMA software page to download the tarball.
MAGMA 1.6 Released
MAGMA 1.6 is now available. This release provides performance improvements and increased functionality. More information is given in the MAGMA 1.6 Quick Reference handout.
Visit the MAGMA software page to download the tarball.
MAGMA MIC 1.3 Released
MAGMA MIC 1.3 is now available. This release provides implementations of MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations, as well as linear and eigenproblem solvers, for Intel Xeon Phi coprocessors. More information on the approach is given in this presentation.
Visit the MAGMA software page to download the tarball.
PAPI 5.4 Released
PAPI 5.4 is now available. This release provides a new component, called EMON, for the high-speed power measurement API on IBM BlueGene/Q (BG/Q); it gives transparent access to power and energy data on BG/Q. This additional support complements the earlier BGPM components for BG/Q and enables PAPI users and tool developers to use their PAPI-instrumented code as is, without having to learn a new set of library and instrumentation primitives.
PAPI 5.4 also includes initial support for the Applied Micro X-Gene architecture, RAPL (energy measurement) support for Intel Haswell, and support for the IBM POWER8 system when run as a non-virtualized platform (“PowerNV”). Furthermore, we have extended the RAPL energy measurements via msr-safe, a Linux kernel module that allows user access to a whitelisted set of MSRs.
This release also includes several enhancements for the perf_event (core/uncore) components, including support for extended event masks, which add a number of new masks that enable counting in the user domain, in the kernel domain, or on a specific CPU.
Additionally, the papi_component_avail utility now provides a list of PMU names supported by active components. The papi_native_avail utility now supports a more robust “--validate” check on systems with events that require multiple masks in order to be valid (e.g., on Intel SandyBridge EP).
There have been several other bug fixes and enhancements, including:
- Updated IBM POWER7, POWER8 presets;
- Hardware counter and event count added/fixed for BGPM components;
- Reduced overhead of API call PAPI_name_to_code();
- Growing list of native events in core/uncore components fixed; and
- Cleaned up Intel IvyBridge presets.
Visit the PAPI software page to download the tarball.
PULSAR 2.0 Released
PULSAR 2.0 is now available. PULSAR is a complete programming platform for large-scale distributed memory systems with multicore processors and hardware accelerators. PULSAR provides a simple abstraction layer over multithreading, message-passing, and multi-GPU, multi-stream programming. PULSAR offers a general-purpose programming model, suitable for a wide range of scientific and engineering applications.
PULSAR version 2.0 introduces GPU support for NVIDIA GPUs using the CUDA programming system. This 2.0 release also adds multi-GPU, multi-stream execution to PULSAR’s multithreading and message-passing capabilities.
Visit the PULSAR software page to download the tarball.