Challenges of Energy-aware Scientific Computing
Abstract
Power provisioning and energy consumption have become major challenges in the field
of high performance computing. Energy costs over the lifetime of an HPC
installation are now comparable to its acquisition costs. The quest for Exascale
computing has made it clear that addressing the power challenge will require
the synergy of several major advances, ranging from algorithmic design and
performance modeling all the way to HPC hardware and data center design. We
assembled a list of speakers who are experts and pioneers in energy-aware HPC
in an attempt to cover this wide range of needed solutions.
Organizers
- Piotr Luszczek, University of Tennessee, Knoxville, USA
- Costas Bekas, IBM Research-Zurich, Switzerland
Part I
Friday, March 1
MS75
9:30 AM - 11:30 AM
Room: Hancock - Lobby Level
- 9:30-9:55 Energy Aware Performance Metrics,
Costas Bekas and Alessandro Curioni, IBM Research-Zurich, Switzerland
Abstract. Energy-aware performance metrics are absolutely necessary in order
to properly assess the performance of algorithms on modern architectures.
Although recent advances, such as the FLOPS/WATT metric, gave an important
push in the right direction, we will show that deeper investigations are
needed if we are to overcome the power barrier to reaching Exaflop
performance. We will showcase tools that allow accurate, on-chip power
measurements that shed new light on the energy requirements of important kernels.
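The distinction between a rate metric like FLOPS/WATT and total energy-to-solution can be made concrete with a small sketch. The numbers below are invented for illustration and are not from the talk:

```python
# Illustrative comparison of two hypothetical kernel runs: a rate metric
# (FLOPS per watt) and energy-to-solution (average power x run time) can
# rank runs differently from raw speed alone.

def flops_per_watt(flops, joules):
    """Sustained FLOPS per watt: total work divided by total energy."""
    return flops / joules

def energy_to_solution(watts, seconds):
    """Total energy (joules) consumed by a run at a given average power."""
    return watts * seconds

# Run A: faster but power-hungry; Run B: slower but frugal.
work = 1.0e12                            # floating-point operations
e_a = energy_to_solution(200.0, 10.0)    # 2000 J over 10 s
e_b = energy_to_solution(120.0, 15.0)    # 1800 J over 15 s

# Run B finishes later yet consumes less energy and achieves a higher
# sustained FLOPS/W, which is why time alone is an incomplete metric.
print(e_a, e_b, flops_per_watt(work, e_a), flops_per_watt(work, e_b))
```

The same arithmetic underlies why "deeper investigation" is needed: a machine can win on FLOPS/W while still exceeding a facility's power provisioning cap, since the metric normalizes power away.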
- 10:00-10:25 Application-aware Energy Efficient High Performance Computing
Laura Carrington, San Diego Supercomputer Center, USA
Abstract. The energy cost of running an HPC system can exceed the cost of
the original hardware purchase. This has driven the community to attempt to
understand and minimize energy costs wherever possible. We present an
automated framework, Green Queue, for customized application-aware Dynamic
Voltage-Frequency Scaling (DVFS) settings to reduce the energy consumption
of large scale scientific applications. Green Queue supports making CPU
clock frequency changes in response to intra-node and internode
observations about application behavior. Our intra-node approach reduces
CPU clock frequencies, and therefore power consumption, while CPUs lack
computational work due to inefficient data movement. Our inter-node
approach reduces clock frequencies for MPI ranks that lack computational
work. We investigated these techniques on a set of large scientific
applications on 1024 cores of Gordon, an Intel Sandy Bridge based
supercomputer at the San Diego Supercomputer Center. Our optimal intra-node
technique showed average measured energy savings of 10.6% and a maximum
of 21.0% over regular application runs. Our optimal inter-node technique
showed average energy savings of 17.4% and a maximum of 31.7%.
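The intra-node idea, clocking down when cores are stalled on data movement, can be sketched as a frequency-selection rule. Green Queue's actual models and thresholds are not described in the abstract; everything below (the frequency table, the stall-fraction thresholds) is invented for illustration:

```python
# Hypothetical intra-node DVFS policy sketch: when an application phase is
# dominated by data movement (high fraction of cycles stalled on memory),
# a lower CPU clock frequency saves power with little slowdown, since the
# CPU is waiting on memory either way. All values here are illustrative.

AVAILABLE_FREQS_MHZ = [1200, 1600, 2100, 2600]  # lowest to highest

def pick_frequency(stall_fraction):
    """Map the fraction of cycles stalled on memory to a clock frequency."""
    if stall_fraction >= 0.6:          # heavily memory-bound: clock down hard
        return AVAILABLE_FREQS_MHZ[0]
    if stall_fraction >= 0.4:
        return AVAILABLE_FREQS_MHZ[1]
    if stall_fraction >= 0.2:
        return AVAILABLE_FREQS_MHZ[2]
    return AVAILABLE_FREQS_MHZ[3]      # compute-bound: full speed

# A compute-heavy phase keeps the top frequency; a memory-bound one drops.
print(pick_frequency(0.05), pick_frequency(0.7))
```

On a Linux system, the chosen frequency would typically be applied through the cpufreq sysfs interface (for example, via the userspace governor), though the mechanism used on Gordon is not stated in the abstract.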
- 10:30-10:55 A 'Roofline' Model of Energy and What it Implies for Algorithm Design
Jee Whan Choi and Richard Vuduc, Georgia Institute of Technology, USA
Abstract. We describe an energy-based analogue of the time-based roofline model
of Williams, Waterman, and Patterson (Comm. ACM, 2009). Our goal is to
explain---in simple, analytic terms accessible to algorithm designers and
performance tuners---how the time, energy, and power to execute an algorithm
relate. We confirm the basic form of the model experimentally, and explain what
it may imply for algorithm design with respect to power and energy compared to
time.
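The spirit of an energy roofline can be conveyed with a simple linear model: charge each flop and each byte moved a fixed energy cost, plus a constant-power term over the run time. The coefficients below are invented for illustration and the talk's actual model may differ in form and detail:

```python
# A minimal energy-roofline sketch: model total energy as
#   E = W*eps_flop + Q*eps_mem + P0*T
# where W is flops performed, Q is bytes moved, and P0*T charges constant
# (idle/leakage) power over the run time T. Coefficients are invented.

EPS_FLOP = 0.5e-9   # joules per flop (assumed)
EPS_MEM  = 20e-9    # joules per byte moved (assumed)
P0       = 10.0     # constant power in watts (assumed)

def energy(flops, bytes_moved, seconds):
    """Total modeled energy (joules) for one run."""
    return flops * EPS_FLOP + bytes_moved * EPS_MEM + P0 * seconds

def energy_balance():
    """Arithmetic intensity (flops per byte) at which flop energy equals
    memory energy -- the 'ridge point' of the energy roofline."""
    return EPS_MEM / EPS_FLOP

# Below the balance point, memory traffic dominates the energy bill, so an
# algorithm designer gains more by reducing data movement than flops.
print(energy_balance(), energy(1e9, 1e9, 1.0))
```

The useful analogy to the time-based roofline is the balance point: just as a time roofline has a ridge separating bandwidth-bound from compute-bound regimes, this model has an intensity below which energy is dominated by data movement.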
- 11:00-11:25 Power Bounds and Large Scale Computing,
Bronis R. de Supinski, Lawrence Livermore National Laboratory, USA
Abstract. Energy and power are widely recognized as significant challenges
for future large scale systems. Current processors, in particular the Intel
Sandy Bridge family, already include mechanisms to limit power levels
dynamically. However, these mechanisms apply without consideration of the
power levels of other nodes, which may be lower and allow a "hot" node to
consume more power. This talk will discuss options and techniques for
limiting power and energy consumption that better suit large-scale systems.
It will also detail initial experiences with Intel's Running Average Power
Limit (RAPL) on a large Linux system and discuss possible extensions.
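On Linux, RAPL counters are commonly exposed through the powercap interface as a cumulative energy counter in microjoules (e.g. `/sys/class/powercap/intel-rapl:0/energy_uj`), which wraps around at a documented maximum (`max_energy_range_uj`). The talk's own tooling is not described in the abstract; this is a generic sketch of how such a counter is turned into an average power reading:

```python
# Hedged sketch: average package power from two cumulative RAPL samples.
# The counter is monotonically increasing microjoules and wraps around at
# max_range_uj, so a negative delta means exactly one wrap occurred
# between samples (valid only if the sampling interval is short enough).

def average_power_watts(e0_uj, e1_uj, seconds, max_range_uj):
    """Average power over an interval from two energy_uj readings."""
    delta = e1_uj - e0_uj
    if delta < 0:                    # counter wrapped between samples
        delta += max_range_uj
    return (delta / 1e6) / seconds   # microjoules -> joules, then / time

# 50,000,000 uJ consumed over one second is 50 W, wrap or no wrap.
print(average_power_watts(0, 50_000_000, 1.0, 262_143_328_850))
```

In practice one would read `energy_uj` before and after a region of interest; note that RAPL reports per-package (or per-domain) energy, so attributing it to a single application on a shared node requires care.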
Part II
Friday, March 1
MS253
1:00 PM - 3:00 PM
Room: Hancock - Lobby Level
- 1:00-1:25 Cancelled: Energy-Aware Dense and Sparse Linear Algebra
Enrique S. Quintana-Ortí, Universidad Jaume I, Spain
- 1:30-1:55 Locality Aware Scheduling of Sparse Computations for Energy and Performance Efficiencies
Padma Raghavan, Pennsylvania State University, USA; Michael Frasca, Microsoft Research, USA
Abstract. We consider the problem of increasing the performance and energy
efficiencies of sparse matrix and graph computations on multicore processors.
Such systems have complex non-uniform memory access (NUMA) cache and memory
hierarchies that exhibit significant variations in data access latencies. We describe a
scheme for fine-grain task scheduling to cores that takes into account the
probabilities of hits in cache. We present results indicating that our scheme
leads to near ideal speed-ups and large improvements in performance and energy
efficiencies compared to traditional methods.
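The scheduling idea, placing fine-grain tasks on cores according to estimated cache-hit probability, can be illustrated with a toy greedy scheduler. The authors' actual scheme is not specified in the abstract; the data layout, tie-breaking rule, and probabilities below are all invented:

```python
# Invented illustration of cache-affinity-driven task scheduling: each
# fine-grain task carries an estimated probability of hitting in each
# core's cache (e.g. derived from which data blocks that core touched
# recently), and a greedy scheduler places the task on the core with the
# highest estimate, breaking ties toward the least-loaded core.

def schedule(tasks, num_cores):
    """tasks: list of (task_id, {core: hit_probability}) pairs.
    Returns {task_id: core}, greedily maximizing per-task hit probability
    and balancing load when estimates tie."""
    load = [0] * num_cores
    placement = {}
    for task_id, hit_prob in tasks:
        # Rank cores by (expected cache hits, lighter current load).
        best = max(range(num_cores),
                   key=lambda c: (hit_prob.get(c, 0.0), -load[c]))
        placement[task_id] = best
        load[best] += 1
    return placement

tasks = [("t0", {0: 0.9, 1: 0.1}),   # strong affinity for core 0
         ("t1", {0: 0.8, 1: 0.7}),   # mild affinity for core 0
         ("t2", {0: 0.5, 1: 0.5})]   # no preference: load decides
print(schedule(tasks, 2))
```

The point of such a policy is that on NUMA multicores, a scheduler that ignores where a task's data already resides pays repeatedly in remote-access latency, while a purely affinity-driven one can overload a single core; the tuple key above trades the two off crudely.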
- 2:00-2:25 Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions
Andrew Gearhart and James W. Demmel, University of California, Berkeley, USA
Abstract. By extending communication lower bounds on algorithms via linear
models of machine energy consumption, we have derived theoretical bounds on the
minimal amount of energy required to run an algorithm on a given machine. To
use these bounds for HW/SW cotuning, efficient parameters for energy models
must be calculated. We discuss initial approaches to parameter calculation and
further development of energy bounds into broader classes of codes.
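The approach can be made concrete for a familiar kernel. Under a linear energy model, combining the classical communication lower bound for dense n-by-n matrix multiply (words moved is at least on the order of n^3 / sqrt(M) for a fast memory of M words, with constants that vary by derivation) with per-flop and per-word energy costs yields a machine-specific energy floor. The parameter values below are invented, and the authors' actual model includes more terms than this sketch:

```python
# Worked sketch of an energy lower bound via a communication lower bound.
# Model: E >= F*eps_flop + W*eps_word, where F is flops and W is the
# minimum words moved between fast and slow memory. For classical n x n
# matmul, W >= F / sqrt(8*M) words (Hong & Kung style; exact constants
# differ across derivations). eps_flop and eps_word are invented here.

import math

def matmul_energy_lower_bound(n, fast_mem_words, eps_flop, eps_word):
    """Modeled minimum joules for classical n x n matrix multiply."""
    flops = 2.0 * n**3                               # classical matmul work
    words = flops / math.sqrt(8.0 * fast_mem_words)  # communication bound
    return flops * eps_flop + words * eps_word

# A larger fast memory lowers the bound, because less data must cross the
# memory hierarchy -- the same lever that blocking/tiling exploits.
small_cache = matmul_energy_lower_bound(1024, 2**15, 1e-10, 1e-8)
big_cache   = matmul_energy_lower_bound(1024, 2**20, 1e-10, 1e-8)
print(small_cache, big_cache)
```

This also hints at the co-tuning question raised in the abstract: the bound depends jointly on algorithmic quantities (F, W) and hardware parameters (M, per-flop and per-word energies), so hardware and software choices trade off inside one expression.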
- 2:30-2:55 The Powers that be in HPC
Kirk Cameron, Virginia Tech, USA
Abstract. The power consumption of supercomputers ultimately limits their
performance. The current challenge is not whether we can build an exaflop
system by 2018, but whether we can do it in less than 20 megawatts. The SCAPE
Laboratory at Virginia Tech has been studying the tradeoffs between performance
and power for over a decade. We've developed an extensive tool chain for
monitoring and managing power and performance in supercomputers. We will
discuss the implications of our findings for exascale systems and some research
directions ripe for innovation.