
Code hangs in PDSYGST

Posted: Thu Jun 21, 2012 9:39 am
by ldm001
I have a persistent but intermittent problem: the code hangs forever in PDSYGST for some matrix sizes on 64 cores (an 8x8 process grid), and perhaps at other, larger sizes as well. It does not happen every time, but it can. (I know this sounds strange, but this is a truly reproducibly irreproducible bug.) It went away for a while, but it has reappeared following some recent updates to my cluster's OFED/MVAPICH stack. I first reported it back in 2008 (viewtopic.php?f=2&t=795) without any useful responses.

When I attach TotalView I can see a bit more: some of the cores are in PDTRSM waiting for data, while others are in PDSYR2K. I wonder whether this could cause a deadlock, with some cores waiting for one set of messages while others wait for a different set.
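As a rough illustration of that hypothesis, here is a toy model in Python. It is not ScaLAPACK's real communication pattern: two threads stand in for two MPI processes, `queue.Queue` objects stand in for tagged point-to-point messages, and the routine names are used only as message tags. Each "rank" sends a message tagged with the routine it is in and then blocks until a message with the same tag arrives. If the ranks have diverged into different routines, neither tag ever matches. The `timeout` plays the role of a watchdog, since a real blocking receive would simply wait forever. The helper names `rank_worker` and `simulate` are hypothetical.

```python
import queue
import threading


def rank_worker(rank, phase, inboxes, peer, timeout, results):
    """Toy stand-in for one MPI process (hypothetical, for illustration only).

    Sends one message tagged with its current routine to its peer, then
    blocks until a message with the same tag arrives, like a tagged
    blocking receive. Messages with any other tag are set aside unmatched.
    """
    inboxes[peer].put((phase, f"data from rank {rank}"))
    unmatched = []
    try:
        while True:
            tag, _payload = inboxes[rank].get(timeout=timeout)
            if tag == phase:
                results[rank] = "ok"
                return
            unmatched.append(tag)  # wrong routine: cannot satisfy this wait
    except queue.Empty:
        # Watchdog fired; a real blocking receive would wait forever here.
        results[rank] = f"hung in {phase} (unmatched tags: {unmatched})"


def simulate(phase0, phase1, timeout=0.2):
    """Run two toy ranks, each in the given routine, and report their fate."""
    inboxes = {0: queue.Queue(), 1: queue.Queue()}
    results = {}
    threads = [
        threading.Thread(target=rank_worker,
                         args=(0, phase0, inboxes, 1, timeout, results)),
        threading.Thread(target=rank_worker,
                         args=(1, phase1, inboxes, 0, timeout, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate("PDTRSM", "PDTRSM"))   # both ranks in the same routine
    print(simulate("PDTRSM", "PDSYR2K"))  # ranks have diverged
```

In this model, `simulate("PDTRSM", "PDTRSM")` completes on both ranks, while `simulate("PDTRSM", "PDSYR2K")` reports both ranks as hung, each holding the other's unmatched message.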

N.B.: I note that there is an unverified 1.7 bug reported by Joan:IBM, but I cannot find enough information to tell whether it is similar.

Re: Code hangs in PDSYGST

Posted: Thu Jun 21, 2012 7:46 pm
by rodney
Which version of ScaLAPACK are you using? Are you still using the Intel ScaLAPACK you mentioned in your previous post, or one of the newer 2.0.x versions?

--Rodney

Re: Code hangs in PDSYGST

Posted: Thu Jun 21, 2012 8:26 pm
by ldm001
I am using the same version. Is this a known bug in ScaLAPACK that is fixed in 2.0.x? I am not certain whether Intel has implemented 2.0.x yet.

Re: Code hangs in PDSYGST

Posted: Thu Jun 21, 2012 8:32 pm
by rodney
You might want to try the 2.0.2 version.

--Rodney

Re: Code hangs in PDSYGST

Posted: Thu Jun 21, 2012 8:59 pm
by ldm001
Has there been a change in the relevant routines?

Re: Code hangs in PDSYGST

Posted: Fri Jun 22, 2012 8:19 am
by ldm001
I will add that using a 6x8 grid (48 cores) makes the problem go away, as does changing the exact matrix size. To me this certainly looks like a ScaLAPACK bug.
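Since the hang depends on both the process-grid shape and the exact matrix size, it may help to look at how a given N is actually split across the grid. ScaLAPACK distributes matrices block-cyclically, and its tool routine NUMROC(N, NB, IPROC, ISRCPROC, NPROCS) returns how many rows or columns of one dimension land on a given process. Below is a Python sketch of that logic, not a copy of the library source; `local_shapes` is a hypothetical helper, and N=1000, NB=64 are placeholder values, not the sizes from this report. It cannot diagnose the hang by itself, but it makes it easy to compare the local pieces on an 8x8 grid against a 6x8 grid for the sizes that fail.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an N-long dimension, cut into NB-sized blocks
    and dealt out block-cyclically, that land on process IPROC.
    Sketch of the logic of ScaLAPACK's NUMROC tool routine."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source proc
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # full rounds of blocks
    extrablks = nblocks % nprocs                   # leftover full blocks
    if mydist < extrablks:
        count += nb                                # one extra full block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block
    return count


def local_shapes(n, nb, nprow, npcol):
    """Local (rows, cols) of an N x N matrix on each process (p, q) of an
    NPROW x NPCOL grid, with the first block on process (0, 0)."""
    return {
        (p, q): (numroc(n, nb, p, 0, nprow), numroc(n, nb, q, 0, npcol))
        for p in range(nprow)
        for q in range(npcol)
    }


if __name__ == "__main__":
    n, nb = 1000, 64  # hypothetical matrix size and block size
    for nprow, npcol in [(8, 8), (6, 8)]:
        shapes = local_shapes(n, nb, nprow, npcol)
        rows = sorted({r for r, _ in shapes.values()})
        print(f"{nprow}x{npcol}: distinct local row counts {rows}")
```

With these placeholder values the 8x8 grid gives local row counts of 104 and 128 (the last process row holds the partial block), while the 6x8 grid gives 128, 168 and 192.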

Re: Code hangs in PDSYGST

Posted: Sun Jul 08, 2012 2:46 pm
by ldm001
I guess this is an open bug.