Since you have two cards, does it still fail if you set the environment variable MAGMA_NUM_GPUS=2? That should force it to use the multi-GPU, non-resident code.
Ah, I think I found why it is failing. Oddly, it fails only for sizes between about half of the GPU memory and the full GPU memory. In my tests, it worked for smaller sizes and for sizes larger than the GPU memory. It also works for any size that is a multiple of 32.

For single-complex on our card (2687.4 MB), a 13271 x 13271 matrix fills half of the memory and an 18768 x 18768 matrix fills all of it. So sizes between about 13200 and 18700 fail, except multiples of 32, while other sizes work (see below).

The problem is that the routine needs to transpose the matrix and allocates an extra matrix to do so. If the size is an exact multiple of 32, it can transpose in place, so no extra matrix is needed. If the matrix doesn't fit on the GPU at all, it uses the non-resident code instead. The need for an extra matrix allocation will be eliminated in the next release. Meanwhile, I think I can figure out a quick patch to resolve the problem for you. Will advise in a few days.
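For anyone who wants to redo the size arithmetic for a different card, here is a rough Python sketch. It assumes the "MB" that MAGMA reports means 2^20 bytes (that assumption reproduces the numbers above) and that a single-complex element takes 8 bytes. The helper name `max_square_n` is made up for this illustration. It ignores every other GPU allocation (right-hand sides, workspace), which is presumably why the failures in practice start a little below the computed half-memory size.

```python
import math

MIB = 1024 * 1024  # assumption: the "MB" figure reported for the device is 2**20 bytes


def max_square_n(mem_mib, bytes_per_elem, fraction=1.0):
    """Largest N such that an N x N matrix fits in `fraction` of the memory.

    Hypothetical helper for illustration; it ignores any other allocations.
    """
    elems = int(mem_mib * MIB * fraction) // bytes_per_elem
    return math.isqrt(elems)


gpu_mib = 2687.4        # memory reported for the Tesla S2050 in this thread
single_complex = 8      # two 32-bit floats per element

half_n = max_square_n(gpu_mib, single_complex, 0.5)
full_n = max_square_n(gpu_mib, single_complex, 1.0)
print("half of GPU memory: N =", half_n)
print("all of GPU memory:  N =", full_n)
```

Swapping in your own card's memory and 4, 8, or 16 bytes per element (single, double or single-complex, double-complex) gives the approximate range to watch out for.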
-mark
romulus ~/magma-1.3.0-fermi/testing> ./testing_cgesv -N 10000 -N 12000 -N 13000 -N 13200 -N 13216 -N 13232 -N 18000 -N 19000 -N 20000
MAGMA 1.3.0
device 0: Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
Usage: ./testing_cgesv -N <matrix size> -R <right hand sides>
-N can be repeated up to 10 times
    N  NRHS   GPU GFlop/s (sec)    ||B - AX|| / ||A||*||X||
===========================================================
10000   100    527.99 (  5.20)     1.37e-06
12000   100    560.82 (  8.42)     1.57e-06
13000   100    572.55 ( 10.47)     1.96e-06   # smaller than half of GPU memory okay
magma_cgesv returned error -113.
13200   100   5771.38 (  1.09)     7.52e-01
13216   100    595.23 ( 10.58)     1.88e-06   # multiple of 32 okay
magma_cgesv returned error -113.
13232   100   7686.60 (  0.82)     7.56e-01
magma_cgesv returned error -113.
18000   100   7840.05 (  2.02)     7.48e-01
19000   100    242.20 ( 76.71)     6.74e-05   # bigger than GPU memory okay
20000   100    312.14 ( 69.37)     2.67e-05
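The pattern in the run above can be written down as a small decision rule. The Python sketch below is only a model of the behaviour described in this post, not MAGMA's actual logic. The bounds 13200 and 18700 are the rough figures quoted above for this particular card, and `expected_to_fail` is a hypothetical name. The real cutoffs depend on what else is allocated on the GPU.

```python
# Rough failing range quoted in the post for single-complex on a 2687.4 MB card.
# These are approximate, card-specific values, not constants from MAGMA.
APPROX_LOW_N = 13200
APPROX_HIGH_N = 18700


def expected_to_fail(n):
    """Model of the described behaviour; True if size n is expected to fail."""
    if n % 32 == 0:
        return False  # multiple of 32: transposed in place, no extra matrix
    # Outside the range, either the extra matrix fits (small n) or the
    # non-resident code is used (large n).
    return APPROX_LOW_N <= n <= APPROX_HIGH_N


# Sizes from the run above, with whether magma_cgesv returned error -113.
observed_failed = {
    10000: False, 12000: False, 13000: False,
    13200: True, 13216: False, 13232: True,
    18000: True, 19000: False, 20000: False,
}

for n, failed in observed_failed.items():
    print(n, "model:", expected_to_fail(n), "observed:", failed)
```

Until the patch is available, rounding the matrix size up to a multiple of 32 is one way to stay on the in-place path, if padding is acceptable for your problem.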