Parallel Profiling
perfex reports the counts for each task, followed by the sum over all tasks
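In shared-memory executables, watch for:
load imbalance (counter 21, floating-point instructions): counts differ widely between tasks
excessive synchronization (counter 4, store conditionals)
false sharing (counter 31, stores to a shared cache block)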
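
One way to put a number on load imbalance is to compare the busiest task's count with the average count over all tasks. The sketch below is only an illustration: the function name imbalance_ratio and the counter values are hypothetical, the values are not real perfex output, and reading them out of a perfex report is left out.

```python
from statistics import mean


def imbalance_ratio(per_task_counts):
    """Return max/mean of the per-task counts; 1.0 means perfectly balanced."""
    if not per_task_counts:
        raise ValueError("need at least one task count")
    avg = mean(per_task_counts)
    if avg == 0:
        # No task did any counted work, so there is nothing to be unbalanced.
        return 1.0
    return max(per_task_counts) / avg


# Hypothetical floating-point instruction counts for four tasks
# (made-up numbers, not taken from a real perfex run).
counts = [4_000_000, 4_100_000, 3_900_000, 8_000_000]
print(f"imbalance ratio: {imbalance_ratio(counts):.2f}")
```

A ratio well above 1.0 suggests that one task is doing much more floating-point work than the others and the rest are likely waiting for it.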