News and Announcements

China Strikes Back: Tianhe-2

During a recent visit to Changsha, China on May 28-29, ICL’s Jack Dongarra was granted access to China’s latest multi-petaflop supercomputer, the Tianhe-2. The machine, which has a theoretical peak of 54.9 petaflop/s, returned a preliminary LINPACK score of 30.6 petaflop/s using only 90% of the machine’s total resources. For comparison, the current TOP500 #1 is DOE’s Titan, which achieved only 17.6 petaflop/s for the November 2012 TOP500 submission.

Developed by China’s National University of Defense Technology (NUDT) and Chinese IT vendor Inspur, the 16,000-node Tianhe-2 is built largely on Intel hardware, with 2 Ivy Bridge sockets and 3 Xeon Phi boards per node, for a total of 32,000 CPU sockets, 48,000 Xeon Phi co-processors, and 3,120,000 cores. As tested, peak power consumption comes in at 17.6 MW, but when configured with the planned liquid cooling system, the machine is estimated to consume 24 MW at peak.
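The node-level figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the per-chip core counts (12 per Ivy Bridge socket, 57 per Xeon Phi) are assumptions not stated in this article, chosen because they reproduce the quoted 3,120,000-core total. The ratio of the preliminary LINPACK score to theoretical peak is computed against the full machine.

```python
# Figures quoted in the article.
NODES = 16_000
CPU_SOCKETS_PER_NODE = 2   # Ivy Bridge sockets per node
PHI_BOARDS_PER_NODE = 3    # Xeon Phi boards per node
PEAK_PFLOPS = 54.9         # theoretical peak, full machine
LINPACK_PFLOPS = 30.6      # preliminary LINPACK score

# Assumptions NOT stated in the article: cores per chip.
CORES_PER_CPU = 12
CORES_PER_PHI = 57

cpu_sockets = NODES * CPU_SOCKETS_PER_NODE
phi_boards = NODES * PHI_BOARDS_PER_NODE
total_cores = cpu_sockets * CORES_PER_CPU + phi_boards * CORES_PER_PHI

# Preliminary score as a fraction of the full-machine theoretical peak.
linpack_fraction = LINPACK_PFLOPS / PEAK_PFLOPS

print(f"CPU sockets: {cpu_sockets:,}")
print(f"Xeon Phi boards: {phi_boards:,}")
print(f"Total cores: {total_cores:,}")
print(f"LINPACK / peak: {linpack_fraction:.1%}")
```

Because the preliminary run used only about 90% of the machine, the ratio printed here likely understates the efficiency achieved on the nodes that actually took part.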

For an extensive summary of this machine, you can download Jack’s entire report here.

Conference Reports

30 Years of Parallel Computing at Argonne

Thirty years ago, Argonne National Laboratory (ANL) established the Advanced Computing Research Facility to experiment with parallel computers of various architectures, and launched research projects on parallel algorithms and programming models. On May 14-15, Argonne hosted a symposium to assess progress in scientific computing during the last thirty years, discuss lessons learned, and speculate about future challenges and solutions.

Together with Paul Messina (ANL ALCF) and Rusty Lusk (ANL MCS), Jack gave the symposium’s opening talk, The History of the Argonne Advanced Computing Research Facility. Jack was also a panelist for “The Impact of Parallel Computing on the World.”

Some of the original Advanced Computing Research Facility members attended the symposium and are pictured above, thirty years later.

IPDPS 2013

On May 20-24, ICL’s Jack Dongarra, Jakub Kurzak, and Ichi Yamazaki made their way to the 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS) in Boston, MA. IPDPS is an international forum for engineers and scientists from around the world to present their latest research findings in all aspects of parallel computation. In addition to technical sessions of submitted paper presentations, the meeting offers workshops, tutorials, and commercial presentations and exhibits.

Jack gave the keynote for the Heterogeneity in Computing workshop, called Emerging Heterogeneous Technologies for High Performance Computing. Jakub gave a talk on a Virtual Systolic Array for QR Decomposition during the Numerical Analysis session. Ichi gave several talks and also took home a best paper award for Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures. It was a productive meeting overall with around 200 attendees.

Recent Releases

MAGMA MIC 1.0 Released

MAGMA MIC 1.0 is now available. This release provides implementations for MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations for Intel Xeon Phi Co-processors. More information on the approach is given in this presentation.

The MAGMA MIC 1.0 release adds the following new functionalities:

  • Added multiple MIC LU factorization (routines {z|c|d|s}getrf_mmic)
  • Added multiple MIC QR factorization (routines {z|c|d|s}geqrf_mmic)
  • Added multiple MIC Cholesky factorization (routines {z|c|d|s}potrf_mmic)
  • Performance improvements for the single MIC LU, QR, and Cholesky factorizations
  • Added LU factorization in CPU interface
  • Added mixed-precision iterative refinement LU solver (with CPU and MIC interfaces)
  • Added reduction to band diagonal for Hermitian/symmetric matrices (routines {z|c|d|s}hetrd_he2hb)
  • Added Hessenberg reduction algorithm ({z|c|d|s}gehrd)
  • Added reduction to tridiagonal for Hermitian/symmetric matrices (routines {zhe|che|dsy|ssy}trd)
  • Added reduction to bidiagonal (routines {z|c|d|s}gebrd)
  • Added {zun|cun|dor|sor}gqr
  • Added {zun|cun|dor|sor}ghr
  • Added {zun|cun|dor|sor}mqr_mic
  • Added GEMV benchmark to test MIC’s bandwidth
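
The brace notation in the list above is shorthand for the precision variants of each routine, following the usual LAPACK naming convention (s = single real, d = double real, c = single complex, z = double complex). A minimal sketch of how the shorthand expands is shown below. The `expand` helper is hypothetical, written for illustration, and is not part of MAGMA; it only manipulates text and does not check that a routine of that name exists in the release.

```python
def expand(pattern):
    """Expand one '{a|b|c}rest' shorthand into ['arest', 'brest', 'crest'].

    Hypothetical helper for illustration; assumes exactly one brace group
    at the start of the pattern, as in the release notes.
    """
    head, _, tail = pattern.partition("}")
    prefixes = head.lstrip("{").split("|")
    return [prefix + tail for prefix in prefixes]


# Shorthand names copied from the release notes.
for shorthand in ("{z|c|d|s}getrf_mmic",
                  "{zhe|che|dsy|ssy}trd",
                  "{zun|cun|dor|sor}gqr"):
    print(shorthand, "->", ", ".join(expand(shorthand)))
```

Note that the Hermitian/symmetric and unitary/orthogonal routines change more than one letter between the complex and real variants, which is why those entries spell out multi-letter prefixes.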

Visit the MAGMA software page to download the tarball.

PAPI 5.1.1 Released

PAPI 5.1.1 is now available. This incremental release adds support for Intel Ivy Bridge EP, along with the fixes listed below. For a full list of detailed updates, consult the change log (included in the tarball).

  • Build fixes for SPARC and IA64
  • Assorted perf_event fixes, including support for perf_event_paranoid = 2
  • CUDA component fixes to eventset ordering
  • ARM support for pthread_mutexes
  • Better overflow support for BG/Q components
  • Tighter logic in the execution of the run_tests.sh script

Visit the PAPI software page to download the PAPI 5.1.1 tarball, and as always, feel free to contact the PAPI team through the mailing list or User Forum if you have any questions or comments about this release.

Interview

Paul Peltz

You just recently left ICL. Tell us where you are and what you’re doing now.

I’m now working in the Systems and Operations group at the National Institute for Computational Sciences. There are about 12 of us in total, each specializing in a different area. There are people dedicated to infrastructure (Puppet, LDAP, etc.), large parallel file systems, security, networking, and so on. I’ve been assigned to work on Beacon, which is the #1 machine on the Green500. My first assignment was to get the latest Intel Xeon Phi drivers deployed in a stateless NFSRoot environment. This makes it easier to deploy nodes quickly in a cluster. They have the original Beacon cluster, which is only 16 nodes, that I’ve been testing on until it is solid. At that point we will deploy it to the production Beacon cluster.

In what ways did working at ICL prepare you for what you do now, if at all?

One of the reasons they put me on Beacon is because of my experience working on the Intel Xeon Phis at ICL. More than that though, ICL has taught me the fundamentals of HPC. When I first started with ICL I did mostly general IT work. I would not have been nearly as prepared to do HPC work if it wasn’t for the research that ICL does. Most organizations would not have resources such as NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis to work with and learn with. It was great to work in such a diverse and interesting research group.

You’re one of the earlier ICLers. Tell us about your background and how you got started with the group.

Before working at ICL I was an intern at Philips/Magnavox working on WebTV. They closed their office in West Knoxville, so I was looking for another job. My then fiancée went to high school with Brett Ellis, who was the junior admin at the time. The senior admin, Judi Talley, was about to retire and Brett was taking over the senior position, so they were looking for a replacement. I stopped by one day to meet Brett and it just happened to be the day that everyone was giving their summer talks in Ayres Hall. I came by later to meet with Brett and was hired on as a student through the summer, and then took over the junior admin position when Brett took over the senior position in the fall of 1999. In 2005 Brett left to take a position at Myricom and I took over for him as the senior admin.

Enjoying such a long tenure with the group, any interesting stories you would like to share about your time here?

One of my favorite stories from ICL is a prank that was pulled. It doesn’t involve me, but it is still a story I tell every once in a while because it is a classic. Back in Ayres Hall everyone shared an office as there weren’t any single suites there. In one of them were Kevin London, Scott Wells, Nathan Garner, and Paul McMahan. Just about every time Scott Wells would leave the office on a vacation they would play a prank on him. Usually it was pretty simple stuff like sealing up his cubicle walls so he couldn’t get into his cubicle or hooking his monitor up to someone else’s computer. One time, though, Paul McMahan decided to do something a little more subtle. He routed all of Scott’s web traffic through his machine and then replaced certain phrases that would appear on web pages. Scott would always read up on the UT Volunteers’ athletics programs, so to get Scott mad he would replace phrases such as “Philip Fulmer” with “Philip Fatty Fulmer” or “Tennessee Vols” with “Overrated Tennessee Vols”. It took a while for Scott to catch on, and he was mad enough that he was either writing emails to the editors and writers of these articles or at least threatening to do so. He eventually figured it out because he spotted some things like “Lady Overrated Vols”.

You haven’t been gone from ICL long enough to really miss it much, so what do you think you’re going to miss the most?

I have definitely missed the people so far. There are people I’ve been working with for years now that I miss seeing and talking to.

What do you see yourself doing in 10 years?

Hopefully still working in HPC Systems Administration somewhere if not at NICS or ORNL. My wife and I dream to live near London, England at some point so maybe after 10 years we’d look to try and move there. I don’t believe there is much in the way of HPC in London now, but maybe that will change in 10 years.

Tell us something about yourself that might surprise some people.

I have no idea what you people are talking about in your Friday lunch talks.  Well, that may not be a surprise, but it sure is true! Oh, and people that I recruit to go play soccer with me inevitably get hurt…bad.

Recent Papers

  1. Dongarra, J., M. Faverge, T. Herault, M. Jacquelin, J. Langou, and Y. Robert, “Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,” Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013. (1.43 MB)
  2. Aupy, G., M. Faverge, Y. Robert, J. Kurzak, P. Luszczek, and J. Dongarra, “Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC,” LAWN 277, no. UT-CS-13-709, May 2013. (298.63 KB)
  3. Dongarra, J., T. Herault, and Y. Robert, “Revisiting the Double Checkpointing Algorithm,” 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium, Boston, MA, May 2013. (591.1 KB)
  4. Yamazaki, I., T. Dong, S. Tomov, and J. Dongarra, “Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,” The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.
  5. Kurzak, J., P. Luszczek, M. Gates, I. Yamazaki, and J. Dongarra, “Virtual Systolic Array for QR Decomposition,” 15th Workshop on Advances in Parallel and Distributed Computational Models, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), Boston, MA, IEEE, May 2013. DOI: 10.1109/IPDPS.2013.119 (749.84 KB)
  6. Wang, Y., M. Baboulin, J. Falcou, Y. Fraigneau, and O. Le Maître, “A Parallel Solver for Incompressible Fluid Flows,” International Conference on Computational Science (ICCS 2013), Barcelona, Spain, Elsevier B.V., June 2013. DOI: 10.1016/j.procs.2013.05.207 (588.79 KB)
  7. McCraw, H., D. Terpstra, J. Dongarra, K. Davis, and R. Musselman, “Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q,” International Supercomputing Conference 2013 (ISC'13), Leipzig, Germany, Springer, June 2013. (624.58 KB)
  8. Marin, G., C. McCurdy, and J. Vetter, “Diagnosis and Optimization of Application Prefetching Performance,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465014 (827.31 KB)
  9. Li, Y., A. YarKhan, J. Dongarra, K. Seymour, and A. Hurault, “Enabling Workflows in GridSolve: Request Sequencing and Service Trading,” Journal of Supercomputing, vol. 64, issue 3, pp. 1133-1152, June 2013. DOI: 10.1007/s11227-010-0549-1 (821.29 KB)
  10. Haidar, A., S. Tomov, J. Dongarra, R. Solcà, and T. C. Schulthess, “Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013. DOI: 10.1007/978-3-642-38750-0_6 (2.14 MB)
  11. Aupy, G., A. Benoit, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, “On the Combination of Silent Error Detection and Checkpointing,” UT-CS-13-710: University of Tennessee Computer Science Technical Report, June 2013. (1.29 MB)
  12. Heroux, M. A., and J. Dongarra, “Toward a New Metric for Ranking High Performance Computing Systems,” SAND2013 - 4744, June 2013. (225.32 KB)
  13. Haidar, A., M. Gates, S. Tomov, and J. Dongarra, “Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465438 (1.27 MB)
  14. Jia, Y., P. Luszczek, and J. Dongarra, “Transient Error Resilient Hessenberg Reduction on GPU-based Hybrid Architectures,” UT-CS-13-712: University of Tennessee Computer Science Technical Report, June 2013. (206.42 KB)

Recent Lunch Talks

  1. May 3: Aurelien Bouteiller, “Making DPLASMA with PaRSEC, the Cookbook” (PDF)
  2. May 10: Yves Robert, “Energy-efficient scheduling” (PDF)
  3. May 17: Julien Herrmann (ENS), “Tree traversals with task-memory affinities on hybrid platforms” (PDF)
  4. May 24: Piotr Luszczek, “Competitive Proposal Writing” (PDF)
  5. May 31: Volodymyr Turchenko, “Batch Pattern Parallelization Scheme of NNs on Many-core Architectures” (PDF)
  6. June 7: Vincent C. Betro (NICS), “Performance of the fusion code GYRO on four generations of Cray Computers” (PDF)
  7. June 14: Dong Li (ORNL), “Toward Reliable and Power Efficient Exascale Systems” (PDF)
  8. June 21: Hartwig Anzt, “Energy Efficiency on Emerging Hardware” (PDF)
  9. June 28: Haihang You (NICS), “Optimizing utilization across XSEDE resources”

Upcoming Lunch Talks

  1. July 19: Anthony Danalis, “Creating a new operation with DPLASMA: a step by step guide” (PDF)

People

  1. Hartwig Anzt
    Hartwig Anzt will be joining ICL on June 3rd to work with the Linear Algebra group. Welcome aboard, Hartwig!
  2. Volodymyr Turchenko
    Volodymyr Turchenko, a visiting Fulbright Scholar from Ternopil, Ukraine, has been in Knoxville collaborating with ICL researchers for nine months, but will return to his home institution at the beginning of June. We enjoyed having Vlad at the lab and look forward to future collaborations!

Congratulations ICL Grads!

This spring, three ICLers received advanced degrees from the University of Tennessee, Knoxville. Vijay Joshi earned his MS, and Asim YarKhan and Wesley Bland (not pictured) earned their PhDs. Congratulations!