I have recently been trying to run the LAPACK routine dsyev (symmetric matrix diagonalization to find eigenvalues and eigenvectors) on 16 processors of an SGI Origin machine, using the SGI SCSL library built with multithreaded routines.
The performance is dismal, and I cannot find any information on what parallel efficiency to expect from parallelized LAPACK routines.
Any experience or feedback in this regard would be much appreciated.

