Challenges of Energy-aware Scientific Computing
Abstract
Power provisioning and energy consumption have become major challenges in the field
of high performance computing. Energy costs over the lifetime of an HPC
installation are now comparable to its acquisition costs. The quest for Exascale
computing has made it clear that addressing the power challenge will require
the synergy of several major advances, ranging from algorithmic design and
performance modeling all the way to HPC hardware and data center design. We
assembled a list of speakers who are experts and pioneers in energy-aware HPC
in an attempt to cover this wide range of needed solutions.
Organizers
- Piotr Luszczek, University of Tennessee, Knoxville, USA
- Costas Bekas, IBM Research-Zurich, Switzerland
Part I
Friday, March 1
MS75
9:30 AM - 11:30 AM
Room: Hancock - Lobby Level
- 9:30-9:55 Energy Aware Performance Metrics,
Costas Bekas and Alessandro Curioni, IBM Research-Zurich, Switzerland
Abstract. Energy-aware performance metrics are absolutely necessary in order
to properly assess the performance of algorithms on modern architectures.
Although recent advances, such as the FLOPS/WATT metric, gave an important
push in the right direction, we will show that deeper investigations are
needed if we are to overcome the power barrier to reaching Exaflop
performance. We will showcase tools that allow accurate, on-chip power
measurements that shed new light on the energy requirements of important kernels.
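The distinction between a rate metric like FLOPS/WATT and total energy-to-solution can be made concrete with a small sketch. The numbers below are invented for illustration and are not from the talk:

```python
# Illustrative comparison of two hypothetical kernel runs: a rate metric
# (FLOPS per watt) and energy-to-solution (average power x run time) can
# rank runs differently from raw speed alone.

def flops_per_watt(flops, joules):
    """Sustained FLOPS per watt: total work divided by total energy."""
    return flops / joules

def energy_to_solution(watts, seconds):
    """Total energy (joules) consumed by a run at a given average power."""
    return watts * seconds

# Run A: faster but power-hungry; Run B: slower but frugal.
work = 1.0e12                            # floating-point operations
e_a = energy_to_solution(200.0, 10.0)    # 2000 J over 10 s
e_b = energy_to_solution(120.0, 15.0)    # 1800 J over 15 s

# Run B finishes later yet consumes less energy and achieves a higher
# sustained FLOPS/W, which is why time alone is an incomplete metric.
print(e_a, e_b, flops_per_watt(work, e_a), flops_per_watt(work, e_b))
```

The same arithmetic underlies why "deeper investigation" is needed: a machine can win on FLOPS/W while still exceeding a facility's power provisioning cap, since the metric normalizes power away.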
- 10:00-10:25 Application-aware Energy Efficient High Performance Computing
Laura Carrington, San Diego Supercomputer Center, USA
Abstract. The energy cost of running an HPC system can exceed the cost of
the original hardware purchase. This has driven the community to attempt to
understand and minimize energy costs wherever possible. We present an
automated framework, Green Queue, for customized application-aware Dynamic
Voltage-Frequency Scaling (DVFS) settings to reduce the energy consumption
of large scale scientific applications. Green Queue supports making CPU
clock frequency changes in response to intra-node and internode
observations about application behavior. Our intra-node approach reduces
CPU clock frequencies, and therefore power consumption, while CPUs lack
computational work due to inefficient data movement. Our inter-node
approach reduces clock frequencies for MPI ranks that lack computational
work. We investigated these techniques on a set of large scientific
applications on 1024 cores of Gordon, an Intel Sandy Bridge based
supercomputer at the San Diego Supercomputer Center. Our optimal intra-node
technique showed average measured energy savings of 10.6% and a maximum
of 21.0% over regular application runs. Our optimal inter-node technique
showed average energy savings of 17.4% and a maximum of 31.7%.
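The intra-node idea, clocking down when cores are stalled on data movement, can be sketched as a frequency-selection rule. Green Queue's actual models and thresholds are not described in the abstract; everything below (the frequency table, the stall-fraction thresholds) is invented for illustration:

```python
# Hypothetical intra-node DVFS policy sketch: when an application phase is
# dominated by data movement (high fraction of cycles stalled on memory),
# a lower CPU clock frequency saves power with little slowdown, since the
# CPU is waiting on memory either way. All values here are illustrative.

AVAILABLE_FREQS_MHZ = [1200, 1600, 2100, 2600]  # lowest to highest

def pick_frequency(stall_fraction):
    """Map the fraction of cycles stalled on memory to a clock frequency."""
    if stall_fraction >= 0.6:          # heavily memory-bound: clock down hard
        return AVAILABLE_FREQS_MHZ[0]
    if stall_fraction >= 0.4:
        return AVAILABLE_FREQS_MHZ[1]
    if stall_fraction >= 0.2:
        return AVAILABLE_FREQS_MHZ[2]
    return AVAILABLE_FREQS_MHZ[3]      # compute-bound: full speed

# A compute-heavy phase keeps the top frequency; a memory-bound one drops.
print(pick_frequency(0.05), pick_frequency(0.7))
```

On a Linux system, the chosen frequency would typically be applied through the cpufreq sysfs interface (for example, via the userspace governor), though the mechanism used on Gordon is not stated in the abstract.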
- 10:30-10:55 A 'Roofline' Model of Energy and What it Implies for Algorithm Design
Jee Whan Choi and Richard Vuduc, Georgia Institute of Technology, USA
Abstract. We describe an energy-based analogue of the time-based roofline model
of Williams, Waterman, and Patterson (Comm. ACM, 2009). Our goal is to
explain---in simple, analytic terms accessible to algorithm designers and
performance tuners---how the time, energy, and power to execute an algorithm
relate. We confirm the basic form of the model experimentally, and explain what
it may imply for algorithm design with respect to power and energy compared to
time.
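The spirit of an energy roofline can be conveyed with a simple linear model: charge each flop and each byte moved a fixed energy cost, plus a constant-power term over the run time. The coefficients below are invented for illustration and the talk's actual model may differ in form and detail:

```python
# A minimal energy-roofline sketch: model total energy as
#   E = W*eps_flop + Q*eps_mem + P0*T
# where W is flops performed, Q is bytes moved, and P0*T charges constant
# (idle/leakage) power over the run time T. Coefficients are invented.

EPS_FLOP = 0.5e-9   # joules per flop (assumed)
EPS_MEM  = 20e-9    # joules per byte moved (assumed)
P0       = 10.0     # constant power in watts (assumed)

def energy(flops, bytes_moved, seconds):
    """Total modeled energy (joules) for one run."""
    return flops * EPS_FLOP + bytes_moved * EPS_MEM + P0 * seconds

def energy_balance():
    """Arithmetic intensity (flops per byte) at which flop energy equals
    memory energy -- the 'ridge point' of the energy roofline."""
    return EPS_MEM / EPS_FLOP

# Below the balance point, memory traffic dominates the energy bill, so an
# algorithm designer gains more by reducing data movement than flops.
print(energy_balance(), energy(1e9, 1e9, 1.0))
```

The useful analogy to the time-based roofline is the balance point: just as a time roofline has a ridge separating bandwidth-bound from compute-bound regimes, this model has an intensity below which energy is dominated by data movement.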
- 11:00-11:25 Power Bounds and Large Scale Computing,
Bronis R. de Supinski, Lawrence Livermore National Laboratory, USA
Abstract. Energy and power are widely recognized as significant challenges
for future large scale systems. Current processors, in particular the Intel
Sandy Bridge family, already include mechanisms to limit power levels
dynamically. However, these mechanisms apply without consideration of the
power levels of other nodes, which may be lower and allow a "hot" node to
consume more power. This talk will discuss options and techniques for
limiting power and energy consumption that better suit large-scale systems.
It will also detail initial experiences with Intel's Running Average Power
Limit (RAPL) on a large Linux system and discuss possible extensions.
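On Linux, RAPL counters are commonly exposed through the powercap interface as a cumulative energy counter in microjoules (e.g. `/sys/class/powercap/intel-rapl:0/energy_uj`), which wraps around at a documented maximum (`max_energy_range_uj`). The talk's own tooling is not described in the abstract; this is a generic sketch of how such a counter is turned into an average power reading:

```python
# Hedged sketch: average package power from two cumulative RAPL samples.
# The counter is monotonically increasing microjoules and wraps around at
# max_range_uj, so a negative delta means exactly one wrap occurred
# between samples (valid only if the sampling interval is short enough).

def average_power_watts(e0_uj, e1_uj, seconds, max_range_uj):
    """Average power over an interval from two energy_uj readings."""
    delta = e1_uj - e0_uj
    if delta < 0:                    # counter wrapped between samples
        delta += max_range_uj
    return (delta / 1e6) / seconds   # microjoules -> joules, then / time

# 50,000,000 uJ consumed over one second is 50 W, wrap or no wrap.
print(average_power_watts(0, 50_000_000, 1.0, 262_143_328_850))
```

In practice one would read `energy_uj` before and after a region of interest; note that RAPL reports per-package (or per-domain) energy, so attributing it to a single application on a shared node requires care.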
Part II
Friday, March 1
MS253
1:00 PM - 3:00 PM
Room: Hancock - Lobby Level
- 1:00-1:25 Cancelled: Energy-Aware Dense and Sparse Linear Algebra
Enrique S. Quintana-Ortí, Universidad Jaume I, Spain
- 1:30-1:55 Locality Aware Scheduling of Sparse Computations for Energy and Performance Efficiencies
Padma Raghavan, Pennsylvania State University, USA; Michael Frasca, Microsoft Research, USA
Abstract. We consider the problem of increasing the performance and energy
efficiencies of sparse matrix and graph computations on multicore processors.
Such systems have complex non-uniform memory access (NUMA) cache and memory
hierarchies that exhibit significant variations in data access latencies. We describe a
scheme for fine-grain task scheduling to cores that takes into account the
probabilities of hits in cache. We present results indicating that our scheme
leads to near ideal speed-ups and large improvements in performance and energy
efficiencies compared to traditional methods.
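The scheduling idea, placing fine-grain tasks on cores according to estimated cache-hit probability, can be illustrated with a toy greedy scheduler. The authors' actual scheme is not specified in the abstract; the data layout, tie-breaking rule, and probabilities below are all invented:

```python
# Invented illustration of cache-affinity-driven task scheduling: each
# fine-grain task carries an estimated probability of hitting in each
# core's cache (e.g. derived from which data blocks that core touched
# recently), and a greedy scheduler places the task on the core with the
# highest estimate, breaking ties toward the least-loaded core.

def schedule(tasks, num_cores):
    """tasks: list of (task_id, {core: hit_probability}) pairs.
    Returns {task_id: core}, greedily maximizing per-task hit probability
    and balancing load when estimates tie."""
    load = [0] * num_cores
    placement = {}
    for task_id, hit_prob in tasks:
        # Rank cores by (expected cache hits, lighter current load).
        best = max(range(num_cores),
                   key=lambda c: (hit_prob.get(c, 0.0), -load[c]))
        placement[task_id] = best
        load[best] += 1
    return placement

tasks = [("t0", {0: 0.9, 1: 0.1}),   # strong affinity for core 0
         ("t1", {0: 0.8, 1: 0.7}),   # mild affinity for core 0
         ("t2", {0: 0.5, 1: 0.5})]   # no preference: load decides
print(schedule(tasks, 2))
```

The point of such a policy is that on NUMA multicores, a scheduler that ignores where a task's data already resides pays repeatedly in remote-access latency, while a purely affinity-driven one can overload a single core; the tuple key above trades the two off crudely.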
- 2:00-2:25 Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions
Andrew Gearhart and James W. Demmel, University of California, Berkeley, USA
Abstract. By extending communication lower bounds on algorithms via linear
models of machine energy consumption, we have derived theoretical bounds on the
minimal amount of energy required to run an algorithm on a given machine. To
use these bounds for HW/SW cotuning, efficient parameters for energy models
must be calculated. We discuss initial approaches to parameter calculation and
further development of energy bounds into broader classes of codes.
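The approach can be made concrete for a familiar kernel. Under a linear energy model, combining the classical communication lower bound for dense n-by-n matrix multiply (words moved is at least on the order of n^3 / sqrt(M) for a fast memory of M words, with constants that vary by derivation) with per-flop and per-word energy costs yields a machine-specific energy floor. The parameter values below are invented, and the authors' actual model includes more terms than this sketch:

```python
# Worked sketch of an energy lower bound via a communication lower bound.
# Model: E >= F*eps_flop + W*eps_word, where F is flops and W is the
# minimum words moved between fast and slow memory. For classical n x n
# matmul, W >= F / sqrt(8*M) words (Hong & Kung style; exact constants
# differ across derivations). eps_flop and eps_word are invented here.

import math

def matmul_energy_lower_bound(n, fast_mem_words, eps_flop, eps_word):
    """Modeled minimum joules for classical n x n matrix multiply."""
    flops = 2.0 * n**3                               # classical matmul work
    words = flops / math.sqrt(8.0 * fast_mem_words)  # communication bound
    return flops * eps_flop + words * eps_word

# A larger fast memory lowers the bound, because less data must cross the
# memory hierarchy -- the same lever that blocking/tiling exploits.
small_cache = matmul_energy_lower_bound(1024, 2**15, 1e-10, 1e-8)
big_cache   = matmul_energy_lower_bound(1024, 2**20, 1e-10, 1e-8)
print(small_cache, big_cache)
```

This also hints at the co-tuning question raised in the abstract: the bound depends jointly on algorithmic quantities (F, W) and hardware parameters (M, per-flop and per-word energies), so hardware and software choices trade off inside one expression.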
- 2:30-2:55 The Powers that be in HPC
Kirk Cameron, Virginia Tech, USA
Abstract. The power consumption of supercomputers ultimately limits their
performance. The current challenge is not whether we can build an exaflop
system by 2018, but whether we can do it in less than 20 megawatts. The SCAPE
Laboratory at Virginia Tech has been studying the tradeoffs between performance
and power for over a decade. We've developed an extensive tool chain for
monitoring and managing power and performance in supercomputers. We will
discuss the implications of our findings for exascale systems and some research
directions ripe for innovation.