Performance Analysis and Compiler Optimizations

4/22/99


Table of Contents

Performance Analysis and Compiler Optimizations

Credits

Overview

Not Getting the Performance You Want

Using the Compiler to Optimize Code

Compiler Specific Flags

SP2 Flags and Libraries

Recommended Flags for IBM SP

Accuracy Considerations

Numerical Libraries

O2K Flags and Libraries

Recommended Flags for Origin 2000

Accuracy Considerations

Exception Profiling

Interprocedural Analysis

Inlining

Manual Inlining

Loop Nest Optimizer

Optimized Arithmetic Libraries

Numerical Libraries

CHALLENGEcomplib and SCSL

T3E Flags and Libraries

Sun Enterprise Flags and Libraries

Recommended Flags for the Sun Enterprise

Accuracy Considerations

Performance Tools

O2K Performance Tools

Some Hardware Counter Events

Hardware Performance Counter Access

SpeedShop

SpeedShop Components

SpeedShop Usage

SpeedShop Sampling

SpeedShop Counting

Ideal Experiment

ideal Experiment Example

pcsamp Experiment Example

usertime Experiment Example

Gprof Information

Exception Profiling

Address Space Profiling

Parallel Profiling

Parallel Profiling (cont.)

CASEVision Debugger

Performance Tools for the IBM SP2

tprof for the SP2

xprofile for the SP2

Performance Tools for Cray T3E

PAT for the T3E

Apprentice for the T3E

Performance Tools for the Sun Enterprise

looptool for Sun

looptool output

tcov for Sun

Sample tcov report

Fortran 90 Issues

Fortran 90 and OO programming

Operator Overloading

Dynamic Memory Allocation

Array Syntax

Fortran 90 WHERE

CSHIFT and F90 intrinsics

F90 Derived Types

MPI Optimizations

The MPI Protocol: Short Messages

The MPI Protocol: Long Messages

What does all this mean?

Portable MPI tips

Vendor MPI tricks

MPI Tools

MPE Logging/nupshot

MPE Logging/nupshot (cont.)

MPE Logging Library

MPE Logging Library (cont.)

nupshot

Timelines Display

Other Displays

Vampir and Vampirtrace

Vampir Features

Vampir GUI Features

Vampir GUI Features (cont.)

Vampir GUI Features (cont.)

Global Timeline Display

Global Timeline Display (cont.)

Global Timeline Context Menu

Identify Message

Identify State

Process Timeline Display

Global Activity Chart Display

Global Activity Chart Display (cont.)

Process Activity Chart Display

Global Communication Statistics Display

Global Parallelism Display

OpenMP Optimization

OpenMP Optimization (cont.)

Loop Level Approach

SPMD via OpenMP

OpenMP Synchronization

OpenMP Barriers

Barrier Optimization

OpenMP NOWAIT clause

OpenMP Scheduling

Dynamic Threads

Reducing Overhead

OpenMP Reduction

OpenMP and PRIVATE's

Parallel I/O and OpenMP

OpenMP Memory Consistency

OpenMP and Global Variables

OpenMP Performance Tuning

Additional Material

Author: Kevin S. London

Email: london@cs.utk.edu

Home Page: http://www.cs.utk.edu/~london/

Author: Philip J. Mucci

Email: mucci@cs.utk.edu

Home Page: http://www.cs.utk.edu/~mucci/
