(Julie: Justin Si [forswg] has left Stanford, so I'm following up on this. -- Sherm)
I see that this is listed in the Lapack 3.5.0 errata as bug 121. It is marked "CORRECTED?" in rev 1505, on the assumption that it is the same as bug 115. I looked at the code changes in the rev 1505 check-in, and I don't think they fix bug 121. The check-in had this comment (abbreviated):
    Commit fix from Osni Marquez for Bug 115 reported from Duncan Po (Mathworks) ... I suspect that the problem reported in an e-mail sent by Justin Weiguang Si to lapack@cs.utk.edu on Sep 22 is also related to those, although Justin has not yet replied to my request for the offending matrix...
    The bug has been traced to [d/s]laed6, which computes the root closest to the origin of a secular equation and is used in the D&C tridiagonal eigensolver (DSTEDC) and D&C least squares solver (DGELSD). We have interacted with Ren-Cang Li (the original developer of [d/s]laed6) about possible fixes, and I am attaching a new version of [d/s]laed6.
    The bug was related to a too stringent tolerance convergence criterion, line 390 in laed6. ... corrected bugs: 115 and 121
The problem we reported is not a complaint about the failure to converge, but about how that failure is reported. Our users encounter the problem sporadically. It is caused by an intermediate matrix generated internally by a third-party code we depend on, IpOpt, a nonlinear interior-point optimizer that uses Lapack internally. IpOpt can generate matrix approximations that really do become unsolvable, but it is equipped to deal with a Lapack failure by restarting and generating cleaner matrices. The problem is that it never gets the chance to do so, because the INFO=1 error first caught in DLASD8 is not returned properly. Due to what appears to be a fairly recent change in that code, the error instead triggers a premature XERBLA call that aborts our running application. The error message itself is also incorrect: XERBLA is passed -INFO, so it reports that parameter number -1 of DLASD4 had an illegal value, when DLASD4 actually failed to converge. We found that the code still in DLASDA
*     DLASDA, lines 505-507
      IF( info.NE.0 ) THEN
         RETURN
      END IF
appears to be the correct way to handle a lower-level failure, but the code in DLASD8 has been changed to:
*     DLASD8, lines 281-284
      IF( info.NE.0 ) THEN
         CALL xerbla( 'DLASD4', -info )
         RETURN
      END IF
which does not let the error bubble up to where it can be dealt with (the same problem exists in DLASD6).
We propose that DLASD6 and DLASD8 be changed to pass INFO=1 failures back up to the caller, as DLASDA does, rather than aborting execution with XERBLA. We made these changes ourselves, and with them IpOpt was able to recover gracefully.
Please feel free to email me at msherman@stanford.edu if you have questions.
Thanks and regards,
Sherm (Michael Sherman)