Table of Contents
Performance Analysis and Compiler Optimizations
Credits
Overview
Not Getting the Performance You Want
Using the Compiler to Optimize Code
Compiler Specific Flags
SP2 Flags and Libraries
Recommended Flags for IBM SP
Accuracy Considerations
Numerical Libraries
O2K Flags and Libraries
Recommended Flags for Origin 2000
Accuracy Considerations
Exception Profiling
Interprocedural Analysis
Inlining
Manual Inlining
Loop Nest Optimizer
Optimized Arithmetic Libraries
Numerical Libraries
CHALLENGEcomplib and SCSL
T3E Flags and Libraries
Sun Enterprise Flags and Libraries
Recommended Flags for the Sun Enterprise
Accuracy Considerations
Performance Tools
O2K Performance Tools
Some Hardware Counter Events
Hardware Performance Counter Access
SpeedShop
SpeedShop Components
SpeedShop Usage
SpeedShop Sampling
SpeedShop Counting
Ideal Experiment
ideal Experiment Example
pcsamp Experiment Example
usertime Experiment Example
Gprof information
Exception Profiling
Address Space Profiling
Parallel Profiling
Parallel Profiling
CASEVision Debugger
Performance Tools for the IBM SP2
tprof for the SP2
xprofile for the SP2
Performance Tools for Cray T3E
PAT for the T3E
Apprentice for the T3E
Performance Tools for the Sun Enterprise
looptool for Sun
looptool output
tcov for Sun
Sample tcov report
Fortran 90 Issues
Fortran 90 and OO programming
Operator Overloading
Dynamic Memory Allocation
Array Syntax
Fortran 90 WHERE
CSHIFT and F90 intrinsics
F90 Derived Types
MPI Optimizations
The MPI Protocol: Short Messages
The MPI Protocol: Long Messages
What does all this mean?
Portable MPI tips
Vendor MPI tricks
MPI Tools
MPE Logging/nupshot
MPE Logging/nupshot
MPE Logging Library
MPE Logging Library (cont.)
nupshot
Timelines Display
Other Displays
Vampir and Vampirtrace
Vampir Features
Vampir GUI Features
Vampir GUI Features (cont.)
Vampir GUI Features (cont.)
Global Timeline Display
Global Timeline Display (cont.)
Global Timeline Context Menu
Identify Message
Identify State
Process Timeline Display
Global Activity Chart Display
Global Activity Chart Display (cont.)
Process Activity Chart Display
Global Communication Statistics Display
Global Parallelism Display
OpenMP Optimization
OpenMP Optimization (cont.)
Loop Level Approach
SPMD via OpenMP
OpenMP Synchronization
OpenMP Barriers
Barrier Optimization
OpenMP NOWAIT clause
OpenMP Scheduling
Dynamic Threads
Reducing Overhead
OpenMP Reduction
OpenMP and PRIVATE's
Parallel I/O and OpenMP
OpenMP Memory Consistency
OpenMP and Global Variables
OpenMP Performance Tuning
Additional Material