News and Announcements

LANL’s Roadrunner Decommissioned

Roadrunner, the first supercomputer to sustain a performance of over 1 Petaflop/s on the LINPACK benchmark (in 2008), was decommissioned on March 31, 2013. Built by IBM and housed at Los Alamos National Laboratory in Los Alamos, New Mexico, Roadrunner was configured from commercially available parts and held the number one spot on the TOP500 from 2008 into 2009 before being bested by ORNL’s Jaguar.

Breaking the Petaflop/s barrier was a significant achievement for DOE and IBM, but aside from the raw speed, Roadrunner was also one of the most energy efficient machines of the era, achieving 437.43 Megaflop/s per Watt and attaining the number 3 position on the Green500 in June 2008.

Roadrunner is also known for being the first major supercomputer to use a hybrid hardware configuration. The machine had 6,563 dual-core AMD Opteron CPUs, with each core linked to a specialized coprocessor (the PowerXCell 8i) called a “Cell.” The Cell was an enhanced version of the processor originally designed for the Sony PlayStation 3, adapted specifically to support scientific computing.

Over the next couple of months, scientists at LANL will run experiments on operating system memory compression techniques and optimized data routing before dismantling the machine completely and disposing of the parts (which may still contain sensitive data) in industrial shredders.

Clint Whaley at UTK

ICL alum and frequent collaborator Clint Whaley recently interviewed for a position with UTK’s Electrical Engineering and Computer Science Department. As part of the interview process, Clint gave a talk called Automated Empirical Optimization of Software (abstract below). We wish him good luck in obtaining the position, and look forward to the possibility of closer collaboration in the near future!

In AEOS (Automated Empirical Optimization of Software), an automated suite of searches is combined with context-sensitive timers and various methods of performing code transformations to auto-adapt high performance kernels to hardware evolving at the frantic pace dictated by Moore’s Law. The author’s widely used ATLAS (Automatically Tuned Linear Algebra Software) was one of the pioneering packages that made AEOS the state-of-the-art way to produce and maintain HPC kernels. This talk outlines our approach to this critical area of investigation, the types of research that are required to advance the field, and future plans.
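As a toy illustration of the empirical-search idea at the heart of AEOS (and not ATLAS’s actual machinery, which generates and transforms far more kernel variants), a tuner can simply time several functionally equivalent implementations on the target machine and install the fastest:

```python
import timeit

# Two candidate "kernels" for the same operation (summing a list).
# A real AEOS search would generate many variants via code transformations.
def sum_loop(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    return sum(xs)

def autotune(variants, data, repeats=5):
    """Empirically time each variant on this machine and return the fastest."""
    best, best_time = None, float("inf")
    for f in variants:
        # Take the minimum over several timing runs to reduce noise.
        t = min(timeit.repeat(lambda: f(data), number=100, repeat=repeats))
        if t < best_time:
            best, best_time = f, t
    return best

data = [1.0] * 1000
fastest = autotune([sum_loop, sum_builtin], data)
```

The point of the empirical approach is that the winner is decided by measurement on the installed hardware, not by a static model, so the tuned library tracks hardware evolution automatically.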

Big Data and Extreme-scale Computing

ICL’s Jack Dongarra is teaming up with Pete Beckman, Jean-Yves Berthou, Yutaka Ishikawa, Satoshi Matsuoka, and Philippe Ricoux to host a series of two-day workshops designed to enlist the international community in planning and building a partnership that can provide the next generation of HPC software needed to support big data and extreme-scale computing, both of which are essential to future scientific discovery.

The NSF workshop on Big Data and Extreme-scale Computing (BDEC) is premised on the idea that we must begin to systematically map out and account for the ways in which the major issues associated with Big Data intersect with, impinge upon, and potentially change the national (and international) plans that are now being laid for achieving exascale computing.

The first workshop in the series, which is invitation only and sponsored by the NSF, will be held in Charleston, South Carolina on April 30 – May 1 at the Renaissance Charleston Historic District Hotel, with a reception on April 29.

SC13 Due Dates

It’s that time of year again! The due dates for abstracts and full submissions for this year’s Supercomputing Conference are right around the corner, so plan your work accordingly.

  • April 1, 2013 – Full Submissions due for:
    • Tutorial Proposals
  • April 12, 2013 – Full Submissions due for:
    • Student Cluster Competition
  • April 19, 2013 – Abstracts due for:
    • Technical Papers
  • April 26, 2013 – Full submissions due for:
    • Technical Papers
    • ACM Gordon Bell Prize
    • Panel Proposals

A full list of due dates can be found here.

Conference Reports

GPU Technology Conference

On March 18th – 21st, ICL’s Stan Tomov traveled to San Diego, California for NVIDIA’s GPU Technology Conference (GTC). GTC comprises an international portfolio of workshops and events that support the educational and networking needs of the worldwide parallel computing community, including scientists, graphic artists, designers, researchers, engineers, and IT managers who rely on GPUs to tackle enormous computational challenges.

NVIDIA’s Jen-Hsun Huang delivered the keynote address for GTC 2013, where he discussed, among many other things, NVIDIA’s GPU roadmap and gave some insight into the upcoming Maxwell (2014) and Volta architectures. Maxwell will feature unified virtual memory between the CPU and GPU, simplifying programming. Following Maxwell, Volta will introduce a new technology called stacked DRAM, projected to deliver one terabyte per second of memory bandwidth in an effort to overcome GPU memory bandwidth bottlenecks.

For ICL’s contribution, Stan Tomov and ICL alum Hatem Ltaief gave a talk about the latest developments in the MAGMA library on Tuesday, March 19th. Stan gave another talk on Thursday, March 21st, on behalf of the PAPI team called “Using the CUDA Profiling API and Related Third Party Tools.” In all, there were 350 sessions at the conference, which drew ~3,000 attendees from 50 countries. For a complete rundown of the keynote, visit the NVIDIA GTC 2013 blog.

Recent Releases

MAGMA MIC 1.0 Beta for Intel Xeon Phi

MAGMA MIC 1.0 Beta is now available. This release provides implementations of MAGMA’s one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations for the Intel Xeon Phi coprocessor. More information on the approach is given in this presentation.

The MAGMA MIC 1.0 Beta release adds the following new functionalities:

  • Added multiple MIC LU factorization (routines {z|c|d|s}getrf_mmic)
  • Added multiple MIC QR factorization (routines {z|c|d|s}geqrf_mmic)
  • Added multiple MIC Cholesky factorization (routines {z|c|d|s}potrf_mmic)
  • Performance improvements for the single MIC LU, QR, and Cholesky factorizations
  • Added LU factorization in CPU interface
  • Added mixed-precision iterative refinement LU solver (with CPU and MIC interfaces)
  • Added reduction to band diagonal for Hermitian/symmetric matrices (routines {z|c|d|s}hetrd_he2hb)
  • Added Hessenberg reduction algorithm ({z|c|d|s}gehrd)
  • Added reduction to tridiagonal for Hermitian/symmetric matrices (routines {zhe|che|dsy|ssy}trd)
  • Added reduction to bidiagonal (routines {z|c|d|s}gebrd)
  • Added {zun|cun|dor|sor}gqr
  • Added {zun|cun|dor|sor}ghr
  • Added {zun|cun|dor|sor}mqr_mic
  • Added GEMV benchmark to test MIC’s bandwidth
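As context for the mixed-precision iterative refinement item above: the basic idea is to factor the matrix in fast, low precision and then recover full accuracy through cheap double-precision residual corrections. The NumPy function below is an illustrative sketch of the algorithm only, not MAGMA’s actual interface, and for brevity it re-solves against the single-precision matrix rather than reusing stored LU factors as a real implementation would:

```python
import numpy as np

def refine_solve(A, b, iters=3):
    """Mixed-precision iterative refinement for Ax = b: solve in single
    precision, then refine the solution with double-precision residuals."""
    A32 = A.astype(np.float32)                 # low-precision copy (the cheap factorization)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                          # residual computed in double precision
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d                                 # apply the correction
    return x
```

For reasonably conditioned systems, a few such refinement sweeps recover double-precision accuracy while the dominant O(n³) factorization cost is paid at single-precision speed, which is what makes the technique attractive on accelerators like the Xeon Phi.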

Visit the MAGMA software page to download the tarball.

Interview

Gabriel Marin

Where are you from, originally?

I was born and grew up in Romania. I moved to the US after finishing my undergraduate studies in Bucharest.

Can you summarize your educational background?

I received a Bachelor of Science degree in Computer Science from the “Politehnica” University of Bucharest in 1998. During the following year I took my GRE and TOEFL exams and applied to several graduate programs in the US. In the fall of 1999 I moved to Houston to start my graduate studies at Rice University. At Rice, I initially joined the Systems group in the Computer Science Department, where I worked briefly on scheduling policies for web servers while also taking classes. In 2001, I joined the Parallel Compilers group, where I worked with Prof. John Mellor-Crummey on performance analysis tools and on cross-architecture performance modeling techniques. I received an MSc degree in May 2003, and a PhD degree in January 2008, both from Rice.

Where did you work before joining ICL?

In the year after completing my undergraduate degree and before starting my graduate studies in the US, I worked as a software engineer at the Mediafax news agency in Bucharest, and as a laboratory instructor at my alma mater. After finishing my PhD, I continued my work on performance modeling as a post-doctoral research associate at Rice University until April 2009. At that time I moved to Knoxville and joined Oak Ridge National Laboratory as a Research Staff member. In February 2013, I accepted a position at ICL.

Tell us how you first learned about ICL.

I met Jack for the first time in 2004, during an all-hands meeting for the GrADS project, a meeting that took place in Knoxville. At that time I did not really know anything about ICL, or that I would eventually accept a position here. Later, in 2007 and 2008, I interacted with Shirley and Heike as part of application engagement activities for the PERI project. I became more familiar with ICL in 2009, after I started working at ORNL, when we collaborated on the Blackjack project.

What made you want to work for ICL?

My work has revolved around high performance computing since I was a graduate student. After working for three and a half years at ORNL, I felt that a research position in an academic environment would be better suited for me. ICL’s work on linear algebra and software packages, such as PAPI, is well known in the HPC community. Therefore, when Dan informed me of an open position with the performance group at ICL, I was happy to take advantage of this opportunity.

What are your interests/hobbies outside work?

I like East Tennessee particularly for its beautiful landscapes. I like to go hiking in the Great Smoky Mountains National Park when I have free time, and I would like to eventually hike all 800 or so miles of maintained trails in the park. In addition to enjoying nature, one can still find traces of old settler homesites, cemeteries, and fire towers in many places in the park, sometimes off the beaten path.

Tell us something about yourself that might surprise people.

I used to do competitive figure skating in my youth, and I won several national-level competitions at my age group. Hopefully, it surprises at least some people.

What will you be working on while at ICL?

While at ICL, I will be working with the performance group. My work is focused primarily on developing techniques and tools for performance and power modeling, and on performance diagnosis.

If you weren’t working at ICL, where would you like to be working and why?

I do not have a favorite place where I would like to work. To answer this question from a different perspective, I think that any future major technical breakthroughs will come from new advances in physics, chemistry, or material sciences. So, I would like for my work, be it at ICL or somewhere else, to contribute even a little to future advances in physical sciences.

Recent Papers

  1. Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, “BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013. DOI: 10.1093/comjnl/bxt057
  2. Cao, C., J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, “clMAGMA: High Performance Dense Linear Algebra with OpenCL,” University of Tennessee Technical Report (LAWN 275), no. UT-CS-13-706, University of Tennessee, March 2013.
  3. Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, “Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, March 2013. DOI: 10.1002/cpe.2859
  4. Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Luszczek, and J. Dongarra, “Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” Scalable Computing and Communications: Theory and Practice, John Wiley & Sons, pp. 699-735, March 2013.
  5. Weaver, V., D. Terpstra, and S. Moore, “Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations,” 2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, IEEE, April 2013.
  6. Weaver, V., D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula, J. Ralph, J. Nelson, P. Mucci, T. Mohan, and S. Moore, “PAPI 5: Measuring Power, Energy, and the Cloud,” 2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, April 2013.

Recent Lunch Talks

  1. MAR 1 – Tracy Rafferty, Effort Certifications (PDF)
  2. MAR 8 – Ichitaro Yamazaki, Low-rank approximation on a GPU and its integration into an HSS solver (PDF)
  3. MAR 15 – Simplice Donfack, Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs (PDF)
  4. MAR 22 – Patrick Worley (ORNL), Capturing Computer Performance Variability in Production Jobs (PDF)
  5. MAR 28 – Julien Langou (University of Colorado at Denver), Greedy Trees for MPI Reductions (PDF)
  6. APR 5 – Yves Robert, Revisiting the double checkpointing algorithm (a.k.a. the buddy algorithm) (PDF)
  7. APR 12 – Erlin Yao, Algorithm-Based Fault Tolerance (ABFT) for Dense Linear Algebra
  8. APR 19 – Blake Haugen, Simulation Using Dynamic Schedulers (PDF)
  9. APR 26 – Bhanu Rekepalli (NICS), Solving Life Sciences Data Deluge Problems using JICS Resources

Upcoming Lunch Talks

  1. MAY 3 – Aurelien Bouteiller, Making DPLASMA with PaRSEC, the Cookbook (PDF)
  2. MAY 10 – Yves Robert, Energy-efficient scheduling (PDF)
  3. MAY 17 – Julien Herrmann (ENS), Tree traversals with task-memory affinities on hybrid platforms (PDF)
  4. MAY 24 – Piotr Luszczek, Competitive Proposal Writing (PDF)
  5. MAY 31 – Volodymyr Turchenko, Batch Pattern Parallelization Scheme of NNs on Many-core Architectures (PDF)

Visitors

  1. Erlin Yao
    Erlin Yao from the Institute of Computing Technology, Chinese Academy of Sciences, will be visiting on April 1.
  2. Joe Thomas
    Joe Thomas from NSHE will be visiting on Friday, April 12. Joe is an ICL alum who will be stopping by for a brief visit.

People

  1. Wesley Bland
    After successfully defending his PhD dissertation, Wesley Bland will be taking a position at Argonne National Laboratory, where he will work with the MPICH group. Congratulations and good luck, Wesley!
