Awesome. So I guess that's it.
(I am using gfortran on MAC OS and I can not use frecursive with fmax-stack-var-size so I only used fmax-stack-var-size.)
I confirm what you wrote.
main.cpp is a correct multithreaded code that spawns two simultaneous execution of zgeev (on two distinct threads)
1) Compiling main.cpp as is =>
NO2) Compiling main.cpp with -DGOOD option (=mutex protection of each zgeev call) =>
OK3) Compiling zgehrd with -fmax-stack-var-size=66560 then linking with main.cpp =>
OK4) Compiling zgehrd with -fmax-stack-var-size=66559 then linking with main.cpp =>
NO5) Using your pacth on zgehrd then linking with main.cpp =>
OK(NO: means that the code runs in two threads returns an unpredectiable uncorrect answer. Very fun to observe.)
Guilty lines of code in dgehrd.f:
- Code: Select all
INTEGER NBMAX, LDT
PARAMETER ( NBMAX = 64, LDT = NBMAX+1 )
[...]
* ..
* .. Local Arrays ..
DOUBLE PRECISION T( LDT, NBMAX )
which is an allocation of an array on the stack of size 64*65*16 = 66560 bytes which is greater than 32768. (The current (4.5.0) default value for gfortran max-stack-var-size.) The LAPACK code is overflowing the maximum
size in bytes of the largest array that can be put on the stack with gfortran. Great.
So I will contact other lapackers on what best to do with the issue. We have several option to fix this: dynamic allocation, going with the workspace as you did, requesting a stack bigger than 66560 bytes ... (other options?). My preference would be to go with the workspace as you did.
I have just added the bug as bug#0061 on the LAPACK Errata webpage.
The problem is potentially not only zgehrd.f. I have checked the LAPACK codes (only ./SRC/), the potential problematic subroutines (allocation on the stack) can be identified by the string of characters: "Local Arrays".
- Code: Select all
grep -H -A 4 -B 0 "Local Arrays" ./SRC/*f
This gives 311 routines. A brief look at them reveal that a lot of them allocate very small arrays on the stack (e.g., 4 DOUBLE PRECISION). With a threshold at 64 bytes, I found that they are 60 subroutines left. They are given below. It sounds that we are going to have a lots of fun.
A question is: what is the maximum size in bytes of the largest array that can be put on the stack with standard fortran compilers?
Cheers
--julien.
- Code: Select all
=======================================================================================
COMPLEX PRECISION ROUTINES
=======================================================================================
cgbtrf.f:* .. Local Arrays ..
cgbtrf.f- COMPLEX WORK13( LDWORK, NBMAX ),
cgbtrf.f- $ WORK31( LDWORK, NBMAX )
=======================================================================================
cgehrd.f:* .. Local Arrays ..
cgehrd.f- COMPLEX T( LDT, NBMAX )
=======================================================================================
chseqr.f:* .. Local Arrays ..
chseqr.f- COMPLEX HL( NL, NL ), WORKL( NL )
=======================================================================================
clarnv.f:* .. Local Arrays ..
clarnv.f- REAL U( LV )
=======================================================================================
clatdf.f:* .. Local Arrays ..
clatdf.f- REAL RWORK( MAXDIM )
clatdf.f- COMPLEX WORK( 4*MAXDIM ), XM( MAXDIM ), XP( MAXDIM )
=======================================================================================
cpbtrf.f:* .. Local Arrays ..
cpbtrf.f- COMPLEX WORK( LDWORK, NBMAX )
=======================================================================================
ctgex2.f:* .. Local Arrays ..
ctgex2.f- COMPLEX S( LDST, LDST ), T( LDST, LDST ), WORK( 8 )
=======================================================================================
ctgsy2.f:* .. Local Arrays ..
ctgsy2.f- INTEGER IPIV( LDZ ), JPIV( LDZ )
ctgsy2.f- COMPLEX RHS( LDZ ), Z( LDZ, LDZ )
=======================================================================================
cunmlq.f:* .. Local Arrays ..
cunmlq.f- COMPLEX T( LDT, NBMAX )
=======================================================================================
cunmql.f:* .. Local Arrays ..
cunmql.f- COMPLEX T( LDT, NBMAX )
=======================================================================================
cunmqr.f:* .. Local Arrays ..
cunmqr.f- COMPLEX T( LDT, NBMAX )
=======================================================================================
cunmrq.f:* .. Local Arrays ..
cunmrq.f- COMPLEX T( LDT, NBMAX )
=======================================================================================
cunmrz.f:* .. Local Arrays ..
cunmrz.f- COMPLEX T( LDT, NBMAX )
=======================================================================================
=======================================================================================
REAL PRECISION ROUTINES
=======================================================================================
dgbtrf.f:* .. Local Arrays ..
dgbtrf.f- DOUBLE PRECISION WORK13( LDWORK, NBMAX ),
dgbtrf.f- $ WORK31( LDWORK, NBMAX )
=======================================================================================
dgehrd.f:* .. Local Arrays ..
dgehrd.f- DOUBLE PRECISION T( LDT, NBMAX )
=======================================================================================
dhseqr.f:* .. Local Arrays ..
dhseqr.f- DOUBLE PRECISION HL( NL, NL ), WORKL( NL )
=======================================================================================
dlaexc.f:* .. Local Arrays ..
dlaexc.f- DOUBLE PRECISION D( LDD, 4 ), U( 3 ), U1( 3 ), U2( 3 ),
dlaexc.f- $ X( LDX, 2 )
=======================================================================================
dlaln2.f:* .. Local Arrays ..
dlaln2.f- LOGICAL RSWAP( 4 ), ZSWAP( 4 )
dlaln2.f- INTEGER IPIVOT( 4, 4 )
dlaln2.f- DOUBLE PRECISION CI( 2, 2 ), CIV( 4 ), CR( 2, 2 ), CRV( 4 )
=======================================================================================
dlarnv.f:* .. Local Arrays ..
dlarnv.f- DOUBLE PRECISION U( LV )
=======================================================================================
dlaruv.f:* .. Local Arrays ..
dlaruv.f- INTEGER MM( LV, 4 )
=======================================================================================
dlasrt.f:* .. Local Arrays ..
dlasrt.f- INTEGER STACK( 2, 32 )
=======================================================================================
dlatdf.f:* .. Local Arrays ..
dlatdf.f- INTEGER IWORK( MAXDIM )
dlatdf.f- DOUBLE PRECISION WORK( 4*MAXDIM ), XM( MAXDIM ), XP( MAXDIM )
=======================================================================================
dormlq.f:* .. Local Arrays ..
dormlq.f- DOUBLE PRECISION T( LDT, NBMAX )
=======================================================================================
dormql.f:* .. Local Arrays ..
dormql.f- DOUBLE PRECISION T( LDT, NBMAX )
=======================================================================================
dormqr.f:* .. Local Arrays ..
dormqr.f- DOUBLE PRECISION T( LDT, NBMAX )
=======================================================================================
dormrq.f:* .. Local Arrays ..
dormrq.f- DOUBLE PRECISION T( LDT, NBMAX )
=======================================================================================
dormrz.f:* .. Local Arrays ..
dormrz.f- DOUBLE PRECISION T( LDT, NBMAX )
=======================================================================================
dpbtrf.f:* .. Local Arrays ..
dpbtrf.f- DOUBLE PRECISION WORK( LDWORK, NBMAX )
=======================================================================================
dtgex2.f:* .. Local Arrays ..
dtgex2.f- INTEGER IWORK( LDST )
dtgex2.f- DOUBLE PRECISION AI( 2 ), AR( 2 ), BE( 2 ), IR( LDST, LDST ),
dtgex2.f- $ IRCOP( LDST, LDST ), LI( LDST, LDST ),
dtgex2.f- $ LICOP( LDST, LDST ), S( LDST, LDST ),
=======================================================================================
dtgsy2.f:* .. Local Arrays ..
dtgsy2.f- INTEGER IPIV( LDZ ), JPIV( LDZ )
dtgsy2.f- DOUBLE PRECISION RHS( LDZ ), Z( LDZ, LDZ )
=======================================================================================