MAGMA Forum

Posted: **Thu Jan 06, 2011 6:31 pm**

Stan

I have run a test on zgeqrf and get the following strange results, the first I have seen with a z case.

I looked in zgeqrf.cpp and can see no define for a magma BLAS routine and therefore nothing to change to move to CUBLAS.

In this case the problem is repeatable, with a number of other tests running O.K. in between including sgeqrf, cgeqrf and dgeqrf.

Here is a set of the strange answers.

Code:

fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_zgeqrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage: 
  testing_zgeqrf -M 1024 -N 1024



  M     N   CPU GFlop/s   GPU GFlop/s    ||R||_F / ||A||_F
==========================================================
 1024  1024   25.10          45.22        2.868703e-15
 2048  2048   31.17          63.19        5.441290e-01
 3072  3072   32.28          67.20        5.947517e-01
 4032  4032   32.83          68.49        6.070624e-01
 5184  5184   32.60          69.45        6.263964e-01
 6016  6016   32.27          70.04        6.323491e-01
 7040  7040   31.64          70.40        6.328434e-01
 8064  8064   31.17          70.77        6.319259e-01
 9088  9088   31.02          71.15        6.393267e-01
 9984  9984   31.14          71.33        6.397711e-01

The cgeqrf run gives the highest GPU GFlop/s I have seen on my card, but also rather poor residuals: about 1e-6, compared with about 1e-9 for sgetrf. sgeqrf is also about 1e-6. Is that to be expected for this algorithm?

Code:

fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_cgeqrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage: 
  testing_cgeqrf -M 1024 -N 1024



  M     N   CPU GFlop/s   GPU GFlop/s    ||R||_F / ||A||_F
==========================================================
 1024  1024   36.16         142.92        1.447583e-06
 2048  2048   58.61         191.37        1.843371e-06
 3072  3072   62.51         341.94        2.260692e-06
 4032  4032   63.71         370.86        2.584254e-06
 5184  5184   63.14         422.85        3.051479e-06
 6016  6016   62.09         427.17        3.216158e-06
 7040  7040   61.86         438.37        3.365463e-06
 8064  8064   61.17         440.61        3.442715e-06
 9088  9088   61.81         445.47        3.522330e-06
 9984  9984   61.52         450.82        3.602567e-06

Thanks for all your help.

John

Posted: **Thu Jan 06, 2011 6:58 pm**

It looks like MAGMA BLAS is not the problem, then. You can check whether it is the CPU LAPACK/BLAS by changing

Code:

magma_zgetrf( M, N, h_R, lda, ipiv, &info);

to

Code:

lapackf77_zgetrf(&M, &N, h_R, &lda, ipiv, &info);

in file testing_zgetrf.cpp. Do you get the error expected in this case?

In single complex you get the correct result. We just didn't scale the residuals uniformly. In the LU case we divide by N, and the scaled residuals are of order 1e-9 in single and 1e-18 in double. The same would be the case for QR if we divided by N.
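To make the scaling concrete, here is a minimal sketch in C of the division by N described above. The function name `scale_residual` is hypothetical and is not part of MAGMA's testing code.

```c
#include <assert.h>

/* Hypothetical helper (not part of MAGMA): divide an unscaled relative
 * residual ||R||_F / ||A||_F by the matrix dimension N, as the LU
 * testers are described as doing. */
double scale_residual(double residual, int n)
{
    assert(n > 0);
    return residual / (double)n;
}
```

Applied to the cgeqrf table earlier in the thread, `scale_residual(1.447583e-06, 1024)` is about 1.4e-9 and `scale_residual(3.602567e-06, 9984)` is about 3.6e-10, in line with the roughly 1e-9 quoted for single precision.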

Posted: **Fri Jan 07, 2011 5:34 am**

Stan

I have done a further test on zgeqrf, printing the norms of both the LAPACK (A) and MAGMA (B) arrays, the norm of their difference, and the ratio. The results are rather strange. The A and B norms are similar but not identical, yet the difference is quite large, which suggests to me that the two matrices may be out of registration when the norm is taken, or that there is some other discrepancy. The only change to your test code is that I save the values of the norms; I will add the code below.

The first set of matrix values does not show the error.

zgeqrf_gpu shows a similar pattern of results. dgeqrf does not.

My tentative conclusion is that this is a different problem from the NaN problem.

Best wishes

John

Code:

fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_zgeqrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage: 
  testing_zgeqrf -M 1024 -N 1024



  M     N   CPU GFlop/s   GPU GFlop/s  ||A||_F(CPU)   ||B||_F(GPU)         ||R||_F       ||R||_F / ||A||_F
==========================================================================================================
 1024  1024   28.29        56.73     8.368268e+02      8.368268e+02       2.400607e-12    2.868703e-15
 2048  2048   30.83        63.69     1.680144e+03      1.672797e+03       9.142152e+02    5.441290e-01
 3072  3072   32.09        67.20     2.521304e+03      2.508772e+03       1.499550e+03    5.947517e-01
 4032  4032   32.73        68.29     3.304335e+03      3.292898e+03       2.005937e+03    6.070624e-01
 5184  5184   32.13        69.41     4.256429e+03      4.233308e+03       2.666212e+03    6.263964e-01
 6016  6016   31.61        69.92     4.929089e+03      4.912592e+03       3.116905e+03    6.323491e-01
 7040  7040   31.11        70.42     5.761416e+03      5.748411e+03       3.646074e+03    6.328434e-01
 8064  8064   30.94        70.73     6.593147e+03      6.584770e+03       4.166380e+03    6.319259e-01
 9088  9088   31.51        71.05     7.439901e+03      7.421128e+03       4.756528e+03    6.393267e-01
 9984  9984   31.16        71.32     8.162330e+03      8.152934e+03       5.222023e+03    6.397711e-01

Code:

        matnorm = lapackf77_zlange("f", &M, &N, h_A, &lda, work);
        value = lapackf77_zlange("f", &M, &N, h_R, &lda, work);
        blasf77_zaxpy(&n2, &mzone, h_A, &ione, h_R, &ione);
        value2 = lapackf77_zlange("f", &M, &N, h_R, &lda, work);
        printf("%5d %5d  %6.2f       %6.2f     %e      %e       %e    %e\n",
               M, N, cpu_perf, gpu_perf, matnorm, value, value2, value2 / matnorm);

plus declarations of value and value2, and an adjusted heading.

Posted: **Fri Jan 07, 2011 7:00 pm**

Further testing shows me that testing_zgeqrf fails for any matrix size with N > 1040.

The pattern for a matrix of size M by N is this:

If N <= 1040 all is O.K.

If N > 1040 then the first 284*M entries in the matrix are O.K. This does not depend on the value of N.

I have also inspected the values in the array Tau returned by the LAPACK and MAGMA versions, and they first differ at index 284 (zero-based).

I don't understand the significance of the number 284.
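One way to pin down where two results diverge is to scan both arrays for the first entry that differs by more than a tolerance and convert that linear index to a column. A minimal sketch, assuming column-major storage with leading dimension `lda`, real `double` data rather than the complex type used by the tester, and hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Return the linear index of the first entry where |a[i] - b[i]| > tol,
 * or -1 if the arrays agree to within tol everywhere. */
long first_mismatch(const double *a, const double *b, size_t n, double tol)
{
    assert(tol >= 0.0);
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;             /* absolute difference */
        if (d > tol)
            return (long)i;
    }
    return -1;
}

/* Zero-based column that a linear index falls in, for column-major
 * storage with leading dimension lda. */
long column_of(long index, long lda)
{
    assert(index >= 0 && lda > 0);
    return index / lda;
}
```

If `lda == M` and the first mismatch were reported at linear index 284*M, `column_of` would return 284, the same index at which the Tau values start to differ.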

It is possible that the combination of the Tau vector with the matrix returned is in both cases a valid decomposition, but just not the same. The way to test that is to use it in zgeqrs to see if the same result is recovered, within numerical error.

Or it could be a bug, that one of the decompositions is wrong.

John

Posted: **Sun Jan 09, 2011 3:48 pm**

I have now done a further test with zgeqrs_gpu as follows.

If I am interpreting the values correctly, this implies a problem with the LAPACK on my computer, rather than MAGMA.

Is that correct?

John

Code:

fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_zgeqrs_gpu
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage: 
  testing_zgeqrs_gpu -nrhs 3  -M 1024  -N 1024


                                         ||b-Ax|| / (N||A||)
  M     N    CPU GFlop/s   GPU GFlop/s      CPU      GPU    
============================================================
 1024  1024     22.6         34.9       1.13e-18   3.00e-18
 2048  2048     27.6         62.4       3.18e-04   7.22e-18
 3072  3072     29.7         66.4       1.41e-04   2.18e-18
 4032  4032     30.9         67.8       1.68e-04   2.86e-18
 5184  5184     31.2         68.7       3.34e-04   1.61e-18
 6016  6016     30.7         70.0       1.58e-04   1.53e-18
 7040  7040     21.6         70.4       3.20e-04   8.06e-19
 8064  8064     30.2         70.8       1.20e-04   1.19e-18
 9088  9088     30.2         71.2       4.45e-04   4.60e-18
10112 10112     29.4         71.4       2.57e-04   2.38e-18

Posted: **Sun Jan 09, 2011 4:46 pm**

Now this is interesting. Indeed, it looks like there may be a problem with the LAPACK on your machine. It's still very weird to me though, as things sometimes produce what is expected and sometimes not. From the program output in your posts I see you use more than one thread. Are the results consistent when you set the number of threads to one?

Posted: **Mon Jan 10, 2011 9:39 am**

Stan

I was thinking along the same lines. I will do some trials with GotoBLAS set to different numbers of threads and also with the reference single-threaded BLAS.

If you see this, would you post a copy of the results you get for zgeqrs_gpu on your system?

Thanks

John

Posted: **Mon Jan 10, 2011 3:48 pm**

I don't have access to our GTX480 system right now, but if you just want to see the errors printed, here are the results on a C2070, using 6 MKL threads (on a dual-socket six-core AMD Opteron 2435 host):

Code:

[tomov@yona12 testing]$ ./testing_zgeqrs_gpu 
device 0: Tesla C2070, 1147.0 MHz clock, 5375.4 MB memory
device 1: Tesla C2070, 1147.0 MHz clock, 5375.4 MB memory

Usage: 
  testing_zgeqrs_gpu -nrhs 3  -M 1024  -N 1024

                                         ||b-Ax|| / (N||A||)
  M     N    CPU GFlop/s   GPU GFlop/s      CPU      GPU    
============================================================
 1024  1024     18.8         34.5       2.10e-18   3.16e-18
 2048  2048     20.2         52.3       4.19e-18   6.75e-18
 3072  3072     30.0        166.0       1.69e-18   3.12e-18
 4032  4032     35.9        211.5       1.90e-18   2.55e-18
 5184  5184     37.5        232.5       1.18e-18   2.00e-18
 6016  6016     39.2        241.7       1.63e-18   1.88e-18
 7040  7040     40.3        252.1       9.27e-19   1.00e-18
 8064  8064     41.3        257.1       6.34e-19   9.56e-19
...

Setting the number of threads to one, I get:

Code:

                                         ||b-Ax|| / (N||A||)
  M     N    CPU GFlop/s   GPU GFlop/s      CPU      GPU    
============================================================
 1024  1024      4.9         34.1       2.13e-18   3.05e-18
 2048  2048      4.0         25.2       4.66e-18   6.80e-18
 3072  3072      6.0         78.9       1.51e-18   3.08e-18
 4032  4032      7.2        125.9       1.68e-18   2.54e-18
 5184  5184      7.4        163.1       1.20e-18   2.04e-18
 6016  6016      7.4        188.8       1.20e-18   1.86e-18
 7040  7040      7.5        210.4       7.38e-19   1.00e-18
 8064  8064      7.6        227.7       6.04e-19   9.58e-19
 ...

Posted: **Mon Jan 10, 2011 7:58 pm**

Stan

I tried GotoBLAS with a single thread, which still gave errors, and then the reference BLAS on my computer, which gives none of these errors. It looks as though something is wrong with my installation of GotoBLAS, where I accepted the default settings it detected for my machine. I need to take that problem away from you, as it has nothing to do with MAGMA.

If you have any thoughts on where I should go with that I would appreciate that.

I am not sure yet where I am with the other problem with the NaNs. I will check that out. My system is in an odd state at the moment, as I commented out magma_blas in some routines.

Thank you for all your help with this. I am very keen to make use of MAGMA and clearly a multithreaded BLAS is an important tool to help MAGMA work well.

Thanks again

John

Error testing zgeqrf