Hi,
I am writing code that solves a large system of equations [A]{x}={b}, where [A] is stored in banded form, and the solution is generated using PDGBTRF followed by PDGBTRS. We have a beowulf cluster running Rocks 3.3 and using the ACML v2.5.
I have noticed something that makes no sense to me, perhaps someone could explain...
Looking at the PDGBTRF code from the netlib website,
http://www.netlib.org/scalapack/double/pdgbtrf.f,
I see that there must be a local vector on each process called AF. The minimum length of AF is defined as
(NB+bwu)*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).
In our case at the moment, the matrix [A] has a hbw of about 2200 (so bwl=bwu=hbw=2200), and N~=27000.
My question is, why does AF so take up so much memory?
We can't get PDGBTRF to work on our cluster (with 2GB RAM per node). The [A] matrix is fine, because we can split this across the processes. However, the AF vector almost requires as much memory as [A], but it has to be stored locally. This means that regardless of how many machines we split the code over, the local vector AF will still be of order (fbw)^2, which gets huge very quickly as you increase problem size, and becomes the limiting factor!
Could someone please help us out? We thought we would be able to run larger problems simply by splitting them across more processes, but it seems in this case the local vector AF is going to prevent us from doing this - any thoughts?
Any help most appreciated.
Regards,
Ashton

