LAPACK/ScaLAPACK Development

by **ashtonaut** » Sun May 29, 2005 9:59 pm

Hi,

I am writing code that solves a large system of equations [A]{x}={b}, where [A] is stored in banded form, and the solution is generated using PDGBTRF followed by PDGBTRS. We have a beowulf cluster running Rocks 3.3 and using the ACML v2.5.

I have noticed something that makes no sense to me, perhaps someone could explain...

Looking at the PDGBTRF code from the netlib website,
http://www.netlib.org/scalapack/double/pdgbtrf.f,
I see that there must be a local vector on each process called AF. The minimum length of AF is defined as
(NB+bwu)*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

In our case at the moment, the matrix [A] has a hbw of about 2200 (so bwl=bwu=hbw=2200), and N~=27000.

My question is, why does AF so take up so much memory?

We can't get PDGBTRF to work on our cluster (with 2GB RAM per node). The [A] matrix is fine, because we can split this across the processes. However, the AF vector almost requires as much memory as [A], but it has to be stored locally. This means that regardless of how many machines we split the code over, the local vector AF will still be of order (fbw)^2, which gets huge very quickly as you increase problem size, and becomes the limiting factor!

Could someone please help us out? We thought we would be able to run larger problems simply by splitting them across more processes, but it seems in this case the local vector AF is going to prevent us from doing this - any thoughts?

Any help most appreciated.

Regards,

Ashton

by **efdazedo** » Mon Jun 20, 2005 10:07 am

It seems to me the software is intended for "narrow" band solver, where the bandwidth is a small fraction of the overall size N.

If this problem is coming from a finite element grid, perhaps the problem can be reordered using a bandwidth reducing algorithm such as RCM (reverse cuthill-mckee) or Gibbs-poole-stockmeyer ordering.

I think symmetric RCM is available in matlab.

There are lapack working notes related to the parallel banded solver,

Implementation in ScaLAPACK of Divide-and-Conquer Algorithms
for Banded and Tridiagonal Linear Systems
UT-CS-97-358, April 1997.

A Comparison of Parallel Solvers for Diagonally Dominant
and General Narrow-Banded Linear Systems
UT-CS-99-414, Feb 1999.

A Comparison of Parallel Solvers for Diagonally Dominant
and General Narrow-Banded Linear Systems II
UT-CS-99-415, May 1999.

http://www.netlib.org/lapack/lawns/downloads/

LAPACK/ScaLAPACK Development

PDGBTRF Auxilliary Fillin Space

PDGBTRF Auxilliary Fillin Space

storage for (narrow) band solver

Who is online