Performance Optimization
Outline
Performance
Performance Examples
Issues in Performance
Issues in Performance: Problem Size and Precision
Performance Metrics
Asymptotic Analysis
Amdahl’s Law
Efficiency
What is Optimization?
Types of Optimization
Steps of Optimization
Performance Strategies
Considerations when Optimizing
Locality
Memory Hierarchy
SP2 Access Times
Cache Performance
Cache Architecture
Cache Mapping
2-Way Set Associative Cache
Memory Access
Memory Access Example
Serial Optimizations
Array Optimization
Array Allocation
Array Initialization
Array Padding
Array Padding: a = a + b * c
Stride Minimization
Loop Fusion
Loop Interchange
Floating IF’s
Loop Defactorization
Loop Peeling
Loop Collapse
Loop Unrolling
Loop Unrolling and Sum Reductions
Outer Loop Unrolling
Loop Structure
Strength Reduction
Strength Reduction: Horner’s Rule
Strength Reduction: Integer Division by a Power of 2
Strength Reduction: Integer Division by a Power of 2
Strength Reduction: Integer Division by a Power of 2
Strength Reduction: Factorization
Subexpression Elimination: Parenthesis
Subexpression Elimination: Type Considerations
Subroutine Inlining
Optimized Arithmetic Libraries
SGI Origin 2000
SP2
T3E
T3E Streams
T3E E-registers
T3E Cache Bypass
O2K Flags and Libraries
SP2 Flags and Libraries
T3E Flags and Libraries
Timers
prof
SpeedShop on the SGI
SpeedShop Usage
Ideal Experiment
Pcsamp Experiment
Example of UserTime
tprof for the SP2
PAT for the T3E
Apprentice for the T3E
Additional Material
References
Email: mucci@cs.utk.edu, london@cs.utk.edu
Home Page: http://www.cs.utk.edu/~london