magma_cgesv stability

mgates3 · Post by **mgates3** » Thu Mar 28, 2013 2:13 pm

At different matrix sizes though, for different cards, I guess?

Since you have two cards, does it still fail if you set the environment variable MAGMA_NUM_GPUS = 2? That should force it to use the multi-GPU non-resident code.

Ah, I think I found why it is failing. Oddly, it fails for sizes between about half of the GPU memory and the full GPU memory. In my tests, it worked for smaller sizes and for sizes greater than the GPU memory. It also works for any size that is a multiple of 32.

For single-complex on our card, half of 2687.4 MB is 13271 x 13271, and all of 2687.4 MB is 18768 x 18768. So sizes between about 13200 and 18700 fail, except multiples of 32, while other sizes work (see below).

The problem is that it needs to transpose the matrix and allocates an extra matrix to do so; only if the size is an exact multiple of 32 can it transpose in place. If the matrix doesn't fit on the GPU at all, it uses the non-resident code instead. The need for an extra matrix allocation will be eliminated in the next release. Meanwhile, I think I can figure out a quick patch to resolve the problem for you. Will advise in a few days.

-mark

Code:

romulus ~/magma-1.3.0-fermi/testing> ./testing_cgesv -N 10000 -N 12000 -N 13000 -N 13200 -N 13216 -N 13232 -N 18000 -N 19000 -N 20000
MAGMA 1.3.0
device 0: Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0

Usage: ./testing_cgesv -N <matrix size> -R <right hand sides>
  -N can be repeated up to 10 times

    N   NRHS   GPU GFlop/s (sec)   ||B - AX|| / ||A||*||X||
===========================================================
10000    100    527.99 (   5.20)   1.37e-06
12000    100    560.82 (   8.42)   1.57e-06
13000    100    572.55 (  10.47)   1.96e-06  # smaller than half of GPU memory okay

magma_cgesv returned error -113.
13200    100   5771.38 (   1.09)   7.52e-01

13216    100    595.23 (  10.58)   1.88e-06   # multiple of 32 okay

magma_cgesv returned error -113.
13232    100   7686.60 (   0.82)   7.56e-01

magma_cgesv returned error -113.
18000    100   7840.05 (   2.02)   7.48e-01

19000    100    242.20 (  76.71)   6.74e-05  # bigger than GPU memory okay
20000    100    312.14 (  69.37)   2.67e-05
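
The size arithmetic in the post above can be reproduced with a short Python sketch. The helper names (`max_square_dim`, `predicted_to_fail`) are hypothetical, and the 13200 and 18700 bounds are the approximate figures quoted in the post, not values read from MAGMA. The rule only describes the MAGMA 1.3.0 behaviour reported here on this particular card.

```python
import math

BYTES_PER_ELEMENT = 8  # single-precision complex: two 4-byte floats


def max_square_dim(mem_mb, fraction=1.0):
    """Largest N such that an N x N single-complex matrix fits in
    `fraction` of `mem_mb` megabytes (1 MB taken as 2**20 bytes)."""
    budget_bytes = int(mem_mb * 1024 * 1024 * fraction)
    return math.isqrt(budget_bytes // BYTES_PER_ELEMENT)


def predicted_to_fail(n, lo=13200, hi=18700):
    """Rule of thumb from the post: sizes in the approximate range
    [lo, hi] fail unless they are a multiple of 32.
    `lo` and `hi` are the rough bounds quoted in the post (assumption)."""
    return lo <= n <= hi and n % 32 != 0


if __name__ == "__main__":
    print(max_square_dim(2687.4, 0.5))  # 13271
    print(max_square_dim(2687.4))       # 18768
    for n in (10000, 12000, 13000, 13200, 13216, 13232, 18000, 19000, 20000):
        print(n, "fail" if predicted_to_fail(n) else "ok")
```

Applied to the nine sizes in the run above, the rule matches the reported outcomes: 13200, 13232 and 18000 are flagged, and the rest pass.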

mh1 · Post by **mh1** » Mon Apr 01, 2013 2:20 pm

Very good. I look forward to the fix. By the way, I have a request: please give MAGMA cgesv the capability to use exactly the GPUs I choose on a given machine, rather than relying on the MAGMA_NUM_GPUS variable. Is this possible for a future release?
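
Until something like that exists in MAGMA itself, one possible workaround sits at the CUDA level rather than in MAGMA: the CUDA runtime's `CUDA_VISIBLE_DEVICES` environment variable restricts which devices a process can see, and the visible devices are renumbered from 0. The sketch below builds such an environment for a child process. The helper name `magma_env` is hypothetical, nobody in this thread suggested or tested this approach, and pairing it with `MAGMA_NUM_GPUS` is an assumption about how the two interact.

```python
import os


def magma_env(device_ids, base_env=None):
    """Return a copy of the environment that exposes only `device_ids`
    to CUDA and asks MAGMA to use that many GPUs.
    Hypothetical helper: not part of MAGMA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA runtime variable: a comma-separated list of physical device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_ids)
    # Variable mentioned earlier in this thread; assumed here to take a GPU count.
    env["MAGMA_NUM_GPUS"] = str(len(device_ids))
    return env


# Usage sketch (not executed here):
#   import subprocess
#   subprocess.run(["./testing_cgesv", "-N", "18000"], env=magma_env([1]))
```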

mh1 · Post by **mh1** » Thu May 02, 2013 1:37 am

I am following up for others who may have similar problems. I am still seeing accuracy problems in MAGMA CGESV. They appear most noticeable when GPU RAM is exhausted. I do not think the error estimates used in the MAGMA CGESV example are adequate to reflect good accuracy in all applications. In my application in particular, I have demonstrated poor accuracy when GPU RAM is exhausted and good accuracy when it is not. I consider this problem unresolved.
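
One general reason a residual-based check can look fine while the solution is poor is ill-conditioning. Whether that played any role in this report is not established in the thread. The pure-Python sketch below computes a scaled residual of the form printed in the tester's column headers, `||B - AX|| / (N*||A||*||X||)`, next to the forward error against a known solution. The function names are hypothetical, and the infinity norm is chosen for illustration; the thread does not say which norm the tester uses.

```python
def inf_norm(M):
    """Infinity norm (maximum absolute row sum) of a matrix given as a list of rows."""
    return max(sum(abs(v) for v in row) for row in M)


def scaled_residual(A, X, B):
    """||B - A X|| / (N * ||A|| * ||X||) in the infinity norm."""
    n, nrhs = len(A), len(B[0])
    R = [[B[i][j] - sum(A[i][k] * X[k][j] for k in range(n))
          for j in range(nrhs)] for i in range(n)]
    return inf_norm(R) / (n * inf_norm(A) * inf_norm(X))


def forward_error(X, X_true):
    """Relative error ||X - X_true|| / ||X_true|| in the infinity norm."""
    D = [[x - t for x, t in zip(rx, rt)] for rx, rt in zip(X, X_true)]
    return inf_norm(D) / inf_norm(X_true)


# A nearly singular 2 x 2 system whose exact solution is [1, 1].
eps = 1e-10
A = [[1.0, 1.0], [1.0, 1.0 + eps]]
B = [[2.0], [2.0 + eps]]
X_true = [[1.0], [1.0]]
X_bad = [[2.0], [0.0]]  # a wrong answer that still nearly satisfies A X = B

print(scaled_residual(A, X_bad, B))  # tiny, on the order of 1e-11
print(forward_error(X_bad, X_true))  # 1.0, i.e. 100% relative error
```

The same functions work unchanged on complex entries, since `abs` returns the modulus. Comparing against a reference solution from a trusted solver is one way to check accuracy in an application where the residual alone is not convincing.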

mgates3 · Post by **mgates3** » Tue May 07, 2013 2:16 pm

Yes, we're aware of this bug report, though I don't think we've been able to exactly reproduce the problem. (A little hard, given that we have different cards with different amounts of memory.)
-mark

mh1 · Post by **mh1** » Tue Jan 14, 2014 11:23 am

I now have time to revisit MAGMA. Did MAGMA ever resolve the problems I noted in this thread? I don't want to waste my time with MAGMA if the problems have not been fixed.

Thanks

mgates3 · Post by **mgates3** » Tue Jan 14, 2014 1:54 pm

I don't know that I was ever able to exactly reproduce your issue, since we have different GPUs available here, but all the current tests indicate that it is working fine. In the tests below, the rows marked * are running out of GPU memory (determined by additional instrumentation). The maximum that could possibly fit in this GPU's memory is about N=18768.

Code:

romulus ~/magma-trunk-fermi/testing> ./testing_cgesv --range 15000:25000:1000 -c
MAGMA 1.4.0 svn, compiled for CUDA capability >= 2.0
device 0: Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
Usage: ./testing_cgesv [options] [-h|--help]

ngpu 1
    N  NRHS   CPU Gflop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
15000     1     ---   (  ---  )    598.45 (  15.04)   1.36e-10  ok
16000     1     ---   (  ---  )    604.59 (  18.07)   1.36e-10  ok
17000     1     ---   (  ---  )    614.81 (  21.31)   1.39e-10  ok
18000     1     ---   (  ---  )    622.82 (  24.97)   1.36e-10  ok
19000     1     ---   (  ---  )    568.68 (  32.17)   2.41e-10  ok *
20000     1     ---   (  ---  )    587.63 (  36.31)   2.02e-10  ok *
21000     1     ---   (  ---  )    591.73 (  41.74)   2.20e-10  ok *
22000     1     ---   (  ---  )    609.72 (  46.58)   2.16e-10  ok *
23000     1     ---   (  ---  )    599.81 (  54.10)   2.02e-10  ok *
24000     1     ---   (  ---  )    603.61 (  61.08)   2.23e-10  ok *
25000     1     ---   (  ---  )    608.09 (  68.53)   2.46e-10  ok *


romulus ~/magma-trunk-fermi/testing> ./testing_cgesv --range 18000:19000:50 -c
MAGMA 1.4.0 svn, compiled for CUDA capability >= 2.0
device 0: Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla S2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
Usage: ./testing_cgesv [options] [-h|--help]

ngpu 1
    N  NRHS   CPU Gflop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
18000     1     ---   (  ---  )    628.01 (  24.77)   1.37e-10  ok
18050     1     ---   (  ---  )    624.38 (  25.12)   1.39e-10  ok
18100     1     ---   (  ---  )    626.27 (  25.25)   1.00e-10  ok
18150     1     ---   (  ---  )    625.42 (  25.50)   1.27e-10  ok
18200     1     ---   (  ---  )    627.22 (  25.63)   1.25e-10  ok
18250     1     ---   (  ---  )    626.60 (  25.87)   1.31e-10  ok
18300     1     ---   (  ---  )    628.40 (  26.01)   1.09e-10  ok
18350     1     ---   (  ---  )    627.29 (  26.27)   1.04e-10  ok
18400     1     ---   (  ---  )    638.88 (  26.01)   1.24e-10  ok
18450     1     ---   (  ---  )    552.28 (  30.33)   1.99e-10  ok *
18500     1     ---   (  ---  )    552.23 (  30.58)   2.96e-10  ok *
18550     1     ---   (  ---  )    588.87 (  28.91)   2.17e-10  ok *
18600     1     ---   (  ---  )    588.29 (  29.17)   1.85e-10  ok *
18650     1     ---   (  ---  )    590.35 (  29.31)   2.32e-10  ok *
18700     1     ---   (  ---  )    590.41 (  29.54)   2.67e-10  ok *
18750     1     ---   (  ---  )    591.47 (  29.72)   2.11e-10  ok *
18800     1     ---   (  ---  )    590.90 (  29.99)   2.36e-10  ok *
18850     1     ---   (  ---  )    590.42 (  30.26)   2.23e-10  ok *
18900     1     ---   (  ---  )    591.90 (  30.42)   1.94e-10  ok *
18950     1     ---   (  ---  )    589.53 (  30.79)   2.28e-10  ok *
19000     1     ---   (  ---  )    593.25 (  30.84)   2.19e-10  ok *

mh1 · Post by **mh1** » Tue May 26, 2015 4:48 pm

I have revisited many of the issues I reported in this thread. MAGMA 1.6.2 appears to have resolved all the accuracy problems noted earlier. I am seeing very good performance and accuracy on typical CEM dense linear systems. Very impressed with the newer version of MAGMA.
