LAPACK/ScaLAPACK Development

by **karturov** » Wed Nov 14, 2012 3:44 am

Hi,

I have found the following problem in PDGEMM. Consider a PDGEMM call with the parameters m=120M (120000000), n=80, nrhs=80, nrows=4, ncols=80.

In this case, the local matrices are A(30M x 1), B(20 x 1) and C(30M x 1).
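These local sizes follow from the block-cyclic distribution. Below is a minimal C sketch of the standard NUMROC formula that ScaLAPACK uses to compute how much of a distributed dimension lands on one process. The helper name `local_extent` is hypothetical, and the block sizes in the usage note (64 for rows, 1 for columns) are assumptions, since the post does not state MB/NB.

```c
/* Number of rows (or columns) of a block-cyclically distributed dimension
 * of global size n and block size nb that land on process iproc out of
 * nprocs, when the first block lives on process isrcproc. Same formula
 * as the ScaLAPACK TOOLS routine NUMROC, written with long for clarity. */
static long local_extent(long n, long nb, int iproc, int isrcproc, int nprocs)
{
    /* Distance of this process from the one holding the first block. */
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;                   /* number of complete blocks   */
    long extent = (nblocks / nprocs) * nb;   /* complete rounds of the grid */
    long extra = nblocks % nprocs;           /* complete blocks left over   */

    if (mydist < extra)
        extent += nb;                        /* one more complete block     */
    else if (mydist == extra)
        extent += n % nb;                    /* the trailing partial block  */
    return extent;
}
```

With those assumed block sizes, `local_extent(120000000, 64, p, 0, 4)` gives 30000000 rows on every process row and `local_extent(80, 1, q, 0, 80)` gives 1 column on every process column, which matches A(30M x 1).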
In PBLAS/SRC/PTOOLS/PB_CpgemmAB.c we have (line 360):

kb = pilaenv_( &ctxt, C2F_CHAR( &TYPE->type ) );

So kb == 32. Then (line 429):

PB_COutV( TYPE, COLUMN, NOINIT, M, N, Cd0, kb, &WA, WAd0, &WAfr, &WAsum );

There, PB_COutV tries to allocate WA (PB_COutV.c:299):
*YAPTR = PB_Cmalloc( Amp * K * TYPE->size );
The problem is that (Amp * K * TYPE->size) == (20M * 32 * 8), which is more than 5 billion, so the 'int' overflows.
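To make the arithmetic concrete, here is a small C sketch that evaluates Amp * K * TYPE->size in 64-bit arithmetic with the values from the post (Amp = 20M, K = kb = 32, 8-byte doubles) and compares the result with INT_MAX, the largest value PB_Cmalloc's int argument can hold. The helper names are hypothetical. Note that overflowing a signed int in C is undefined behaviour, so whatever byte count reaches PB_Cmalloc in the original code cannot be relied on.

```c
#include <limits.h>
#include <stdint.h>

/* Bytes requested for the workspace WA, i.e. Amp * K * TYPE->size,
 * evaluated in 64-bit arithmetic so the true value is visible. */
static int64_t wa_bytes(int64_t amp, int64_t k, int64_t elsize)
{
    return amp * k * elsize;
}

/* Nonzero if that byte count cannot be represented in an int, which is
 * the argument type of PB_Cmalloc. */
static int overflows_int(int64_t amp, int64_t k, int64_t elsize)
{
    return wa_bytes(amp, k, elsize) > (int64_t)INT_MAX;
}
```

`wa_bytes(20000000, 32, 8)` returns 5120000000, more than twice INT_MAX (2147483647 for a 32-bit int), while the same call with K = 1 needs only 160000000 bytes.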

So, in this test case, there is no need for kb=32; kb=1 is sufficient. I propose capping kb when it is larger than needed, or when we know that the 'int' range would be exceeded.
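The proposed capping could look roughly like the sketch below. It is not the attached hot-fix, which is not reproduced here; `cap_kb` and its parameters are hypothetical. `extent` stands for the largest kb that is actually useful (1 in the test case above), `amp` for the local row count and `elsize` for TYPE->size.

```c
#include <limits.h>

/* Hypothetical helper: shrink the blocking factor kb so that
 *  (1) it is no larger than the useful extent, and
 *  (2) the workspace size amp * kb * elsize still fits in an int.
 * The result is never below 1; if even kb = 1 overflows
 * (amp * elsize > INT_MAX), capping alone cannot fix the allocation. */
static int cap_kb(int kb, int extent, long long amp, int elsize)
{
    if (kb > extent)
        kb = extent;                          /* no wider than needed */

    if (amp > 0 && elsize > 0) {
        long long max_kb = (long long)INT_MAX / (amp * elsize);
        if (kb > max_kb)
            kb = (int)max_kb;                 /* keep amp*kb*elsize <= INT_MAX */
    }
    return kb < 1 ? 1 : kb;
}
```

With the numbers from the post, `cap_kb(32, 1, 20000000, 8)` returns 1. If the useful extent were larger, say 80, the int limit alone would cap kb at 13, because 13 * 20000000 * 8 = 2080000000 still fits in an int while 14 would not. When amp * elsize by itself exceeds INT_MAX, no value of kb helps, which is why the argument type of PB_Cmalloc matters.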

Please review the attached hot-fix. More generally, it is not correct that PB_Cmalloc takes an int rather than a size_t:

char * PB_Cmalloc ( int );
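A size_t-based allocator would lift the 2^31 - 1 byte ceiling on platforms with a 64-bit size_t. The sketch below is a hypothetical `checked_alloc`, not part of the PBLAS API: it takes the three factors separately so it can detect overflow before multiplying, and returns NULL instead of requesting a wrapped-around size.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical size_t-based replacement for PB_Cmalloc: multiplies the
 * three factors with explicit overflow checks and returns NULL instead
 * of silently requesting a truncated size. */
static void *checked_alloc(size_t nelem, size_t k, size_t elsize)
{
    size_t bytes;

    if (nelem == 0 || k == 0 || elsize == 0)
        return NULL;                     /* nothing to allocate           */
    if (k > SIZE_MAX / elsize)
        return NULL;                     /* k * elsize would overflow     */
    bytes = k * elsize;
    if (nelem > SIZE_MAX / bytes)
        return NULL;                     /* nelem * bytes would overflow  */
    return malloc(nelem * bytes);
}
```

On a platform with a 64-bit size_t, `checked_alloc(20000000, 32, 8)` asks malloc for the full 5120000000 bytes; with a 32-bit size_t it returns NULL because the size is not representable. How the caller should react to NULL (for example, aborting with an error message) is left open here.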

Regards,
Konstantin

by **karturov** » Wed Nov 14, 2012 4:35 am

A SEGFAULT occurred with the following parameters:

m=74612736 n=80 nrhs=80 nrows=3 ncols=80
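Assuming this run goes through the same PB_COutV allocation with kb = 32 and 8-byte doubles, and that the m rows split evenly over the 3 process rows, the workspace size again exceeds the int range. The second line gives the general threshold: with kb = 32 and 8-byte elements, any local row count above it overflows.

$$
\frac{74612736}{3} \times 32 \times 8 = 24870912 \times 256 = 6366953472 > 2^{31} - 1 = 2147483647
$$

$$
\left\lfloor \frac{2^{31} - 1}{32 \times 8} \right\rfloor = 8388607
$$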

by **kentot123** » Tue Dec 04, 2012 3:22 pm

I had the same problem.

P?DGEMM issue: too large blocks for data replication
