The LAPACK forum has moved to https://github.com/Reference-LAPACK/lapack/discussions.

Scalapack SVD negative singular values for large matrices

Post by lelouarn » Fri Dec 15, 2006 9:05 am

Hello,
We have been using Scalapack for a while on our Linux cluster of Pentium 4 PCs. We use pdgesvd in order to compute the generalized inverse of large matrices. The function is called from a C routine.
Up to matrix sizes of about 15 000 x 15 000, everything seems to work fine (i.e. the inverted matrix makes sense). For matrices of 20 000 x 20 000 and up, we get the wrong answer. To diagnose the problem, we checked the following:
- U^T # U gives the identity matrix. OK.
- V^T # V gives the identity matrix. OK.
- Some singular values are negative (!). Not OK.
- U # sigma # V^T does not reproduce the original matrix, which is to be expected since some singular values are negative.
The content of the matrix doesn't seem to matter: both meaningful (for us) matrices and random matrices give the same result.
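
The four checks listed above can be sketched in a few lines. This is a minimal pure-Python illustration on tiny dense matrices, not the distributed code from the post; `check_svd` and its helper functions are hypothetical names, and on the cluster the same products would be formed with the parallel BLAS instead.

```python
# Sanity checks for SVD factors A ~= U * diag(sigma) * V^T.
# All names are hypothetical helpers, not ScaLAPACK routines.

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_abs_diff(a, b):
    """Largest entry-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def check_svd(a, u, sigma, vt):
    """Return the four diagnostics from the post as a dictionary."""
    k = len(sigma)
    # Form diag(sigma) * V^T by scaling row i of V^T with sigma[i].
    svt = [[sigma[i] * x for x in vt[i]] for i in range(k)]
    return {
        "u_orth_err": max_abs_diff(matmul(transpose(u), u), identity(k)),
        "v_orth_err": max_abs_diff(matmul(vt, transpose(vt)), identity(k)),
        "negative_sigma": [s for s in sigma if s < 0.0],
        "recon_err": max_abs_diff(matmul(u, svt), a),
    }

# Toy case: orthogonal factors, but one singular value has the wrong sign.
a = [[3.0, 0.0], [0.0, 2.0]]
eye = identity(2)
report = check_svd(a, eye, [3.0, -2.0], eye)
print(report)
```

In the toy case both orthogonality errors are zero while the reconstruction error is not, which is the same pattern as described in the post.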

I would like to emphasize that "small" matrices are fine. So it excludes the obvious kind of bugs (but doesn't exclude with 100% certainty the possibility of memory corruption or something).

I have tested the following:
- Scalapack 1.7.4 and 1.7.0
- gcc and intel compilers
- I have also tested the intel scalapack library and this seems also to be problematic.
- I have tried taking the absolute value of the singular values (the negative values are in the same range in absolute value as the positive ones), but that doesn't help.
- Different block sizes and numbers of CPUs (64 is the standard value I used)
- Different matrix shapes: both 20 000 square and 20 000 x 40 000 fail.

So my question is: is this behaviour "normal"? Is it a bug in pdgesvd? Is it the propagation of numerical errors for large matrices in the particular algorithm used in pdgesvd? Some other weird "feature"?
Is there a way out of this?

Any suggestions will be appreciated !

Thanks

Miska

PS: In case you are interested, I have a little test code which demonstrates the effect. But in principle, just do the SVD of a 20kx20k random matrix and explore the singular values.
lelouarn
 
Posts: 4
Joined: Tue Dec 12, 2006 10:44 am

Post by Julie » Fri Dec 15, 2006 11:24 am

From Jim Demmel

Could you please say what the largest positive and negative singular
values were, i.e. largest in absolute value? I suspect that your
problem is very ill conditioned, and that to get a pseudo inverse
you will need to zero out singular values below a threshold, and
so get a low rank pseudo inverse. That is separate from why
you might be getting negative singular values, which you shouldn't.
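
The thresholding described above can be sketched as follows. This is a minimal pure-Python illustration, not code from the thread: `truncated_pinv` and its `rel_tol` parameter are hypothetical, and the cutoff value is an arbitrary choice that would have to be tuned to the problem.

```python
# Low-rank pseudo-inverse from SVD factors: A+ = V * diag(inv) * U^T,
# where inv[i] = 1/sigma[i] above the cutoff and 0 below it.
# truncated_pinv and rel_tol are hypothetical names for illustration.

def truncated_pinv(u, sigma, vt, rel_tol=1e-6):
    """u is m x k, sigma has length k, vt is k x n; returns an n x m matrix."""
    if any(s < 0.0 for s in sigma):
        # Negative values signal a failed SVD, so refuse to filter them away.
        raise ValueError("negative singular value: the SVD itself is suspect")
    cutoff = rel_tol * max(sigma)
    inv = [1.0 / s if s > cutoff else 0.0 for s in sigma]
    m, n, k = len(u), len(vt[0]), len(sigma)
    # Entry (i, j) of V * diag(inv) * U^T, using V[i][p] == vt[p][i].
    return [[sum(vt[p][i] * inv[p] * u[j][p] for p in range(k))
             for j in range(m)] for i in range(n)]

# Toy case: the second singular value is far below the cutoff and is dropped.
eye = [[1.0, 0.0], [0.0, 1.0]]
print(truncated_pinv(eye, [4.0, 1e-9], eye))
```

Raising an error on a negative value, rather than silently zeroing it, keeps the filtering step from masking the separate problem raised in the reply.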

Could you also say exactly which ScaLAPACK SVD routine you called,
and the calling sequence?

Jim Demmel
Julie
 
Posts: 299
Joined: Wed Feb 23, 2005 12:32 am
Location: ICL, Denver. Colorado

Post by Julie » Fri Dec 15, 2006 11:36 am

There was a bug fix for the SVD back in January 2006, released in Scalapack-1.7.1. It seems that you are asking for U, S and VT, and the bug was precisely in this case.

Do you use ScaLAPACK-1.7.1 or higher? As Jim said, can you give us the grid shape (P and Q), the block size, and the size of the matrix you are using? I also understand that even a random matrix produces a wrong result. So if you give us P, Q, NB and N, we can try to reproduce what you are seeing.

Julien
Julie
 
Posts: 299
Joined: Wed Feb 23, 2005 12:32 am
Location: ICL, Denver. Colorado

Post by lelouarn » Fri Dec 15, 2006 12:22 pm

Hi,

> Do you use ScaLAPACK-1.7.1 or higher?

My default version was 1.7.0. I think I also tried with 1.7.4 and got the same result, but I will confirm this again.

The grid shape was 8 x 8, totaling 64 CPUs.
Matrix size: 20k x 20k
Singular values range from -81.834700 to 5003.0157
Block size I used was 400
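
For anyone trying to reproduce this configuration, the per-process share of the matrix follows from the block-cyclic layout. The sketch below transcribes the counting rule of ScaLAPACK's NUMROC tool routine into Python; it is written from the routine's documented behaviour rather than taken from the thread, and it assumes the distribution starts on process row and column 0.

```python
# Local row/column counts under a block-cyclic distribution, following the
# counting rule of ScaLAPACK's NUMROC. Python transcription for illustration.

def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension owned by iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # blocks every process gets
    extrablks = nblocks % nprocs                   # leftover full blocks
    if mydist < extrablks:
        count += nb                                # one extra full block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block
    return count

# Values from the post: N = 20000, NB = 400, 8 x 8 grid.
n, nb, p = 20000, 400, 8
rows = [numroc(n, nb, i, 0, p) for i in range(p)]
print(rows, sum(rows))

# Memory for the largest local block of A in double precision (8 bytes each).
local_bytes = max(rows) * max(rows) * 8
print(local_bytes / 1e6, "MB")
```

With these values the largest local block of A is 2800 x 2800, so the input matrix alone takes about 63 MB on each process; U, VT and the workspace come on top of that.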

My problem is ill-conditioned and I do some filtering, but getting negative values tells me something is fishy...
Maybe I am doing something fundamentally wrong though.

I appreciate your help !

Miska
lelouarn
 
Posts: 4
Joined: Tue Dec 12, 2006 10:44 am

Post by lelouarn » Thu Dec 21, 2006 12:19 pm

Hi,
Just to keep you posted on my latest findings...
I have now tried several ScaLAPACK versions (1.7.4 and the Intel-provided ScaLAPACK). Both give negative singular values for my matrix.
I have also tried different block sizes (64 and 640). Same thing.
Different numbers of processors (49 and 64). Same effect. I have not tried non-square processor geometries, but I'll have to try that.

This thing is still a mystery and very very annoying, because I desperately need to run my code now on large SVDs.

Any progress on trying to reproduce these results ?

And a small question: I suppose ScaLAPACK has been tested at some point on such large matrices, or has it not? I cannot imagine that I am the first one to try this.

Thanks again for your help !

Merry winter solstice and a happy (+ bug free :-) ) new year !

Miska
lelouarn
 
Posts: 4
Joined: Tue Dec 12, 2006 10:44 am

Post by lelouarn » Mon Jan 15, 2007 4:14 am

Hi,
OK, I also submitted this issue to the Intel support forum, since their pre-compiled / optimized ScaLAPACK routines (in the cluster tools package) exhibit the same bug. Their MKL team was able to point me in the right direction.

It turns out the problem can be solved by increasing the value of MAXITR from 6 to 12 in the DBDSQR routine (which is part of LAPACK) and recompiling LAPACK...
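
The thread does not say why the iteration limit only bites above roughly 15 000. One piece of arithmetic that fits the reported sizes is sketched below. It rests on two assumptions that the thread does not confirm: that DBDSQR forms its iteration cap as MAXITR*N*N, and that this product is held in a default 32-bit INTEGER that wraps around on overflow. The function names are hypothetical.

```python
# Arithmetic sketch only. Assumes (not confirmed in the thread) that the
# iteration cap is MAXITR*N*N stored in a wrapping 32-bit signed integer.
from math import isqrt

INT32_MAX = 2**31 - 1

def wrap_int32(x):
    """Reduce an exact integer to the value a wrapping int32 would hold."""
    return (x + 2**31) % 2**32 - 2**31

def iteration_cap_int32(maxitr, n):
    """Value of MAXITR*N*N as seen through 32-bit signed arithmetic."""
    return wrap_int32(maxitr * n * n)

def first_overflowing_n(maxitr):
    """Smallest N for which MAXITR*N*N no longer fits in an int32."""
    return isqrt(INT32_MAX // maxitr) + 1

for maxitr, n in [(6, 15000), (6, 20000), (12, 20000)]:
    print(maxitr, n, iteration_cap_int32(maxitr, n))
print("first overflow for MAXITR=6 at N =", first_overflowing_n(6))
```

Under those assumptions the cap is a valid positive number at N = 15 000, becomes negative at N = 20 000 with MAXITR = 6, and happens to wrap back to a positive value with MAXITR = 12. If that is what happens, doubling MAXITR is a workaround for these particular sizes rather than a general fix.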

Thanks anyway for looking !

Miska
lelouarn
 
Posts: 4
Joined: Tue Dec 12, 2006 10:44 am

