News and Announcements
TOP500 – Tianhe-2 is Number 1

Presented at the 2013 International Supercomputing Conference (ISC13) on June 17 in Leipzig, Germany, the 41st edition of the TOP500 list confirmed that China’s Tianhe-2 had muscled its way to the number 1 spot with an astounding 33.86 petaflop/s on the LINPACK benchmark.
Tianhe-2, or Milky Way-2, will be deployed at the National Supercomputer Center in Guangzhou, China, by the end of the year. The surprise appearance of Tianhe-2, two years ahead of its expected deployment, marks China’s first return to the number 1 position since November 2010, when Tianhe-1A was the top system. Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores.
Check out the podcast from InsideHPC to hear Jack Dongarra discuss the new TOP500 list and give his thoughts on China’s latest efforts in HPC: http://icl.utk.edu/newsletter/files/2013-07/audio/top500_podcast.mp3
| Rank | Site | System | Rmax (TFlop/s) |
|---|---|---|---|
| 1 | National Supercomputer Center in Guangzhou | Tianhe-2 (Milky Way-2) | 33,862.7 |
| 2 | DOE/SC/Oak Ridge National Laboratory | Titan – Cray XK7 | 17,590.0 |
| 3 | DOE/NNSA/LLNL | Sequoia – BlueGene/Q | 17,173.2 |
| 4 | RIKEN Advanced Institute for Computational Science (AICS) | K computer, SPARC64 VIIIfx | 10,510.0 |
| 5 | DOE/SC/Argonne National Laboratory | Mira – BlueGene/Q | 8,586.6 |

See the full list at TOP500.org.
PaRSEC and BEAST Funded
ICL recently received major funding for two projects: DOE awarded funding for PaRSEC and the NSF funded BEAST. Congratulations to all those involved!
PaRSEC
The Parallel Runtime Scheduling and Execution Controller (PaRSEC) is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a totally distributed fashion.
PaRSEC assigns computation threads to the cores, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse. The framework includes libraries, a runtime system, and development tools to help application developers tackle the difficult task of porting their applications to highly heterogeneous and diverse environments.
BEAST
The Bench-testing Environment for Automated Software Tuning (BEAST) creates a framework for exploring and optimizing the performance of computational kernels on hybrid processors that 1) applies to a diverse range of computational kernels, 2) (semi)automatically generates better performing implementations on various hybrid processor architectures, and 3) increases developer insight into why given kernel/processor combinations have the performance profiles they do.
To achieve this three-fold goal, it applies the model used for traditional application benchmarking in a completely novel way: it combines an abstract kernel specification and corresponding verification test, similar to standard benchmarking, with an automated testing engine and data analysis and machine learning tools, called the BEAST workbench.
Conference Reports
International Supercomputing Conference
The 2013 International Supercomputing Conference (ISC13) held its 28th meeting on June 16 – 20 in Leipzig, Germany. This year’s conference drew 2,423 attendees from 47 nations as well as 153 leading high performance computing vendors and research organizations from around the world.
ICL was well represented with Jack Dongarra, Jakub Kurzak, and Heike McCraw, along with ICL alumnus Hatem Ltaief. Jack chaired the Application of Supercomputing session, gave a talk about the new rankings at the TOP500 session, and delivered two presentations: Toward a New (Another) Metric for Ranking High Performance Computing Systems and Critical Issues at Exascale for Algorithm & Software Design.
Jakub, Hatem, and Jack presented a tutorial on dense linear algebra software, including PLASMA/DPLASMA, QUARK, PaRSEC, and MAGMA, and Heike presented her paper, Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q.
International Conference on Supercomputing
This year’s International Conference on Supercomputing was held on June 10 – 14 in Eugene, Oregon, and marks the 27th meeting in the series. Quite a few ICLers attended the conference, including Piotr Luszczek, Aurelien Bouteiller, Thomas Herault, Mark Gates, and Gabriel Marin, as well as ICL alumnus and frequent collaborator Yves Robert.
Piotr and Aurelien hosted a tutorial called DLA on Multicore with Accelerators where the main objective was to show specific methods and implementations that deal with portability and scalability of high performance codes. Thomas and Yves gave a tutorial called An overview of fault-tolerant techniques for HPC where, as the name suggests, the objective was to present a comprehensive survey of techniques to deal with failures in high performance systems.
Mark Gates presented a paper, Toward a Scalable Multi-GPU Eigensolver via Compute-intensive Kernels and Efficient Communication, and Gabriel Marin presented his paper, Diagnosis and Optimization of Application Prefetching Performance.
Recent Releases
MAGMA 1.4 Beta 2 Released
MAGMA 1.4 Beta 2 is now available. This release provides performance improvements and support for the new NVIDIA Kepler GPUs. More information is given in the MAGMA: a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures SC12 presentation. The MAGMA 1.4 Beta release adds the following new functionalities:
- Merge libmagmablas into libmagma to eliminate circular dependencies. Link with just -lmagma now;
- Add multi-GPU Hessenberg and non-symmetric eigenvalue routines: geev_m, gehrd_m, unghr_m, ungqr_m;
- Fix required workspace size in gels_gpu, gels3_gpu, geqrs_gpu, geqrs3_gpu;
- Fix required workspace size in [zcsd]geqrf;
- Add macro USE_INT64 to compile with int being 64-bit. See make.inc.int64;
- Add panel factorizations for LU, QR, and Cholesky entirely on the GPU, correspondingly in [zcsd]getf2_gpu, [zcsd]geqr2_gpu, and [zcsd]potf2_gpu;
- Add QR with pivoting in GPU interface (functions [zcsd]geqp3_gpu), and improve the performance for both CPU and GPU interface QRs with pivoting;
- Add multi-GPU symmetric eigenvalue routines (one-stage):
- [zhe|che|ssy|dsy]trd_mgpu, [zhe|che|ssy|dsy]evd_m, [zhe|che|ssy|dsy]evdx_m, [zhe|che|ssy|dsy]gvd_m, [zhe|che|ssy|dsy]gvdx_m;
- Add single and multi-GPU symmetric eigenvalue routines (two-stage):
- [zhe|che|ssy|dsy]evdx_2stage, [zhe|che|ssy|dsy]gvdx_2stage, [zhe|che|ssy|dsy]evdx_2stage_m, [zhe|che|ssy|dsy]gvdx_2stage_m.
Visit the MAGMA software page to download the tarball.
Interview

Khairul Kabir
Where are you from, originally?
I am from Bangladesh, a beautiful small country in South Asia.
Can you summarize your educational background?
I earned my bachelor’s degree in Computer Science and Engineering (CSE) from Bangladesh University of Engineering and Technology (BUET), and earned my master’s in Computer Science from the University of Tennessee, Knoxville (UTK). I am currently pursuing a PhD at UTK with Dr. Dongarra as my advisor.
Tell us how you first learned about ICL.
When I applied for admission to UTK, I learned about ICL from the university website. Later, I heard more about ICL from a friend from Bangladesh who had worked there for a couple of years. He gave me all the details: the research projects at ICL, its working environment, the help available from fellow researchers and students in the lab, his and other ICL people’s work, the publications, the staff, and so on.
What made you want to work for ICL?
Dr. Jack Dongarra and ICL are two famous names in the HPC arena. ICL not only provides leading-edge tools for high performance computing problems, but also plays a major role in the development of standards for scientific computing. As I am interested in HPC, working with Jack Dongarra and ICL seemed like the obvious choice for anyone given the opportunity.
What are you working on while at ICL?
Currently I am working on the MAGMA project, which aims to develop a dense linear algebra library for heterogeneous/hybrid architectures. We focus on developing a dense linear algebra library for “Multicore + Xeon Phi Coprocessor” systems, putting our effort into designing task-based algorithms that extract the best performance from this heterogeneous hardware.
We have already released a linear algebra library covering one-sided factorizations for both single and multiple Xeon Phi coprocessors, and we are now moving toward two-sided factorizations. Since MAGMA already has a multicore + GPU implementation, that helped us develop the library for multicore + Xeon Phi.
If you weren’t working at ICL, where would you like to be working and why?
If I weren’t working at ICL, I would like to be working at Intel with the MKL team, as they are building a world-class linear algebra library that many people in the HPC community use as a point of reference. Recently, they included a heterogeneous (multicore + Xeon Phi) implementation of dense linear algebra, called Automatic Offload (AO), in their package, which is similar to our work.
What are your interests/hobbies outside of work?
Outside of my work I like to watch movies, hang out with my friends & family, and travel to beautiful places.
Tell us something about yourself that might surprise people.
I am an ordinary guy, maybe that will surprise some people. Haha!