by Julien Langou » Wed Apr 30, 2008 5:46 pm
The problem is the shape of your matrix. You have a small and fat matrix.
I am even impressed that ScaLAPACK managed to get better performance
than LAPACK with any processor count.
We have recently developed some algorithms for tall and skinny matrices;
those algorithms are radically different from ScaLAPACK's. (They use
the same data structure.)
Although our algorithms have been designed for tall and skinny matrices,
they work just as well on small and fat ones; that's no problem at all.
What you should do is a QR factorization of your initial matrix, preferably
with our new algorithm, then perform an SVD of the
800-by-800 R factor. The 800-by-800 SVD can be done
sequentially, or maybe in parallel on a few processes if it becomes a bottleneck.
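As a minimal sketch of the idea (in NumPy rather than ScaLAPACK, and with illustrative sizes, not the poster's actual data): for a small-and-fat A, take the QR factorization of A^T, which is tall and skinny, so R is small and square; an SVD of R then yields the full SVD of A.

```python
import numpy as np

# Illustrative sizes: in the thread the small dimension is 800;
# smaller numbers are used here so the sketch runs quickly.
m, n = 80, 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))   # small-and-fat: m << n

# QR of A^T (tall and skinny): Q is n-by-m, R is m-by-m.
Q, R = np.linalg.qr(A.T)

# SVD of the small m-by-m R factor -- this is the part that can
# be done sequentially (or on a few processes).
Ur, s, Vt = np.linalg.svd(R)

# Recover the SVD of A:
#   A^T = Q R = Q Ur diag(s) Vt   =>   A = Vt.T diag(s) (Q Ur).T
U_A = Vt.T        # left singular vectors of A  (m-by-m)
V_A = Q @ Ur      # right singular vectors of A (n-by-m)

assert np.allclose(A, U_A @ np.diag(s) @ V_A.T)
```

The singular values of A are exactly those of R, so the expensive distributed step is only the tall-and-skinny QR.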
I can provide a tall-and-skinny QR code that works for me if you send me
an email and if you have, say, a few hours to interface with it.
Julien.