T3E Flags and Libraries
-O3 - Maximum optimization, may alter semantics.
-apad - Pad arrays to avoid cache line conflicts.
-unroll2 - Apply aggressive loop unrolling.
-pipeline2 - Apply software pipelining.
-split2 - Apply loop splitting.
-Wl"-Dallocate(alignsz)=64b" - Align common blocks on cache line boundaries.
-lmfastv - Link the fastest vectorized intrinsics library.
-lsci - Link the library with BLAS, LAPACK and ESSL routines.
-inlinefrom=<> - Specify the source file or directory containing functions to inline.
-inline2 - Aggressively inline function calls.
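As an illustration of the kind of transformation -unroll2 requests, here is a hand-unrolled C dot product. It is only a sketch: the function names are hypothetical, the unroll factor of 4 is an arbitrary choice, and the source does not say which factor or strategy the T3E compiler actually uses. The four independent partial sums reorder the floating-point additions, which is one way aggressive optimization can change results, as the -O3 entry warns.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one multiply-add per iteration. */
double dot_simple(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Hypothetical hand-unrolled version (factor 4) with four partial sums.
 * Splitting the sum into separate accumulators reorders the additions,
 * so for general data the result may differ in the last bits from
 * dot_simple. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    /* Main unrolled body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```

With integer-valued inputs such as {1, ..., 7} and {7, ..., 1}, both functions return exactly 84; with general floating-point data the two may disagree slightly because of the different summation order.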
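Why array padding (-apad) can help is easiest to see with a simple direct-mapped cache model. The cache size (8 KB) and line length (32 bytes) below are hypothetical values chosen to keep the arithmetic simple, not T3E specifications. The code also assumes 8-byte doubles and C's row-major layout (for Fortran's column-major arrays, read "column" for "row"). The source does not say how much padding -apad actually inserts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache parameters for illustration only: a direct-mapped
 * cache of 8 KB with 32-byte lines, i.e. 256 line slots (sets). */
#define LINE_BYTES 32u
#define NUM_SETS   256u

/* Set (slot) that a byte offset maps to in a direct-mapped cache. */
size_t cache_set(size_t byte_offset, size_t line_bytes, size_t num_sets)
{
    assert(line_bytes > 0 && num_sets > 0);
    return (byte_offset / line_bytes) % num_sets;
}

/* Set containing element (row, 0) of a row-major array of doubles whose
 * declared row length is row_len elements (including any padding). */
size_t row_start_set(size_t row, size_t row_len)
{
    return cache_set(row * row_len * sizeof(double), LINE_BYTES, NUM_SETS);
}
```

With 1024 doubles per row, each row is 8192 bytes, exactly the modeled cache size, so every row starts in set 0 and accesses that walk down a column keep evicting each other. Padding each row by 4 doubles (one 32-byte line) makes row r start in set r mod 256, spreading those accesses across the cache.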