Error testing zgeqrf
Posted: Thu Jan 06, 2011 6:31 pm
by fletchjp
Stan
I ran testing_zgeqrf and got the strange results below, the first problem I have seen with a z (double-complex) case.
I looked in zgeqrf.cpp and can see no define for a MAGMA BLAS routine, so there is nothing to change to move to CUBLAS.
The problem is repeatable: a number of other tests, including sgeqrf, cgeqrf and dgeqrf, ran O.K. in between.
Here is a set of the strange answers.
Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_zgeqrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory
Usage:
testing_zgeqrf -M 1024 -N 1024
M      N      CPU GFlop/s   GPU GFlop/s   ||R||_F / ||A||_F
===========================================================
1024   1024   25.10         45.22         2.868703e-15
2048   2048   31.17         63.19         5.441290e-01
3072   3072   32.28         67.20         5.947517e-01
4032   4032   32.83         68.49         6.070624e-01
5184   5184   32.60         69.45         6.263964e-01
6016   6016   32.27         70.04         6.323491e-01
7040   7040   31.64         70.40         6.328434e-01
8064   8064   31.17         70.77         6.319259e-01
9088   9088   31.02         71.15         6.393267e-01
9984   9984   31.14         71.33         6.397711e-01
The cgeqrf results give the highest GPU performance I have seen on my card, but also rather poor residuals of about 1e-6, compared with 1e-9 for sgetrf. The sgeqrf residuals are also about 1e-6. Is that to be expected for this algorithm?
Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_cgeqrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory
Usage:
testing_cgeqrf -M 1024 -N 1024
M      N      CPU GFlop/s   GPU GFlop/s   ||R||_F / ||A||_F
===========================================================
1024   1024   36.16         142.92        1.447583e-06
2048   2048   58.61         191.37        1.843371e-06
3072   3072   62.51         341.94        2.260692e-06
4032   4032   63.71         370.86        2.584254e-06
5184   5184   63.14         422.85        3.051479e-06
6016   6016   62.09         427.17        3.216158e-06
7040   7040   61.86         438.37        3.365463e-06
8064   8064   61.17         440.61        3.442715e-06
9088   9088   61.81         445.47        3.522330e-06
9984   9984   61.52         450.82        3.602567e-06
Thanks for all your help.
John
Re: Error testing zgeqrf
Posted: Thu Jan 06, 2011 6:58 pm
by Stan Tomov
It looks like MAGMA BLAS is not the problem, then. You can check whether it is the CPU LAPACK/BLAS by changing
Code: Select all
magma_zgetrf( M, N, h_R, lda, ipiv, &info);
to
Code: Select all
lapackf77_zgetrf(&M, &N, h_R, &lda, ipiv, &info);
in the file testing_zgetrf.cpp. Do you get the expected error in this case?
In single complex you get the correct result. We just didn't scale the residuals uniformly across the tests. In the LU case we divide by N, so the scaled residuals are of order e-9 in single and e-18 in double precision. The same would be the case for QR if we divided by N.
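As a rough illustration of that scaling, the sketch below divides a relative residual by N. The function name `scale_by_n` is hypothetical and not part of MAGMA; it only reproduces the arithmetic described above.

```cpp
#include <cassert>

// Hypothetical helper (not a MAGMA routine): divide the relative residual
// ||R||_F / ||A||_F by the matrix dimension N, as the LU test is said to do.
double scale_by_n(double relative_residual, int n) {
    assert(n > 0);  // the scaling is only meaningful for a positive dimension
    return relative_residual / static_cast<double>(n);
}
```

Applied to the cgeqrf figures posted earlier in the thread, `scale_by_n(1.447583e-06, 1024)` is about 1.4e-9 and `scale_by_n(3.602567e-06, 9984)` is about 3.6e-10, which is in line with the LU residuals.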
Re: Error testing zgeqrf
Posted: Fri Jan 07, 2011 5:34 am
by fletchjp
Stan
I have done a further test on zgeqrf, printing the Frobenius norms of both the LAPACK (A) and MAGMA (B) result arrays, the norm of their difference, and the ratio. The results are rather strange. The A and B norms are similar but not identical, yet the difference norm is quite large, which suggests to me that the two matrices may be out of registration when the norm is taken, or that there is some other discrepancy. The only change from your test code is that mine saves the values of the norms; I have added the code below.
The first matrix size (1024) does not show the error.
zgeqrf_gpu shows a similar pattern of results. dgeqrf does not.
My tentative conclusion is that this is a different problem from the NaN problem.
Best wishes
John
Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_zgeqrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory
Usage:
testing_zgeqrf -M 1024 -N 1024
M      N      CPU GFlop/s   GPU GFlop/s   ||A||_F(CPU)   ||B||_F(GPU)   ||R||_F        ||R||_F / ||A||_F
========================================================================================================
1024   1024   28.29         56.73         8.368268e+02   8.368268e+02   2.400607e-12   2.868703e-15
2048   2048   30.83         63.69         1.680144e+03   1.672797e+03   9.142152e+02   5.441290e-01
3072   3072   32.09         67.20         2.521304e+03   2.508772e+03   1.499550e+03   5.947517e-01
4032   4032   32.73         68.29         3.304335e+03   3.292898e+03   2.005937e+03   6.070624e-01
5184   5184   32.13         69.41         4.256429e+03   4.233308e+03   2.666212e+03   6.263964e-01
6016   6016   31.61         69.92         4.929089e+03   4.912592e+03   3.116905e+03   6.323491e-01
7040   7040   31.11         70.42         5.761416e+03   5.748411e+03   3.646074e+03   6.328434e-01
8064   8064   30.94         70.73         6.593147e+03   6.584770e+03   4.166380e+03   6.319259e-01
9088   9088   31.51         71.05         7.439901e+03   7.421128e+03   4.756528e+03   6.393267e-01
9984   9984   31.16         71.32         8.162330e+03   8.152934e+03   5.222023e+03   6.397711e-01
Code: Select all
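// Frobenius norm of the LAPACK (CPU) result A.
matnorm = lapackf77_zlange("f", &M, &N, h_A, &lda, work);
// Frobenius norm of the MAGMA (GPU) result B, taken before the subtraction.
value = lapackf77_zlange("f", &M, &N, h_R, &lda, work);
// h_R := h_R - h_A (mzone being -1), so h_R now holds the difference.
blasf77_zaxpy(&n2, &mzone, h_A, &ione, h_R, &ione);
// Frobenius norm of the difference.
value2 = lapackf77_zlange("f", &M, &N, h_R, &lda, work);
printf("%5d %5d %6.2f %6.2f %e %e %e %e\n",
       M, N, cpu_perf, gpu_perf, matnorm, value, value2, value2 / matnorm);
plus declarations of value and value2 and an adjusted heading.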
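To show how two matrices can have matching norms and still differ by a large amount, here is a minimal self-contained sketch that uses no LAPACK. The helper names `frobenius_norm` and `difference_norm` are hypothetical stand-ins for the zlange and zaxpy calls above, and column-major storage with leading dimension lda is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Frobenius norm of an m-by-n column-major matrix with leading dimension lda.
double frobenius_norm(const std::vector<cplx>& a, int m, int n, int lda) {
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(a.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    double sum = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            sum += std::norm(a[static_cast<std::size_t>(j) * lda + i]);  // |a_ij|^2
    return std::sqrt(sum);
}

// Frobenius norm of (b - a), computed entry by entry without modifying b.
double difference_norm(const std::vector<cplx>& a, const std::vector<cplx>& b,
                       int m, int n, int lda) {
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(a.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    assert(b.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    double sum = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            const std::size_t k = static_cast<std::size_t>(j) * lda + i;
            sum += std::norm(b[k] - a[k]);
        }
    return std::sqrt(sum);
}
```

For example, the 2-by-2 matrices diag(3, 4) and diag(3, -4) both have Frobenius norm 5, yet the norm of their difference is 8. Equal norms alone therefore say little about whether two results agree.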
Re: Error testing zgeqrf
Posted: Fri Jan 07, 2011 7:00 pm
by fletchjp
Further testing shows me that testing_zgeqrf fails for any matrix size with N > 1040.
The pattern, for an M-by-N matrix, is this:
If N <= 1040, all is O.K.
If N > 1040, the first 284*M entries in the matrix are O.K. This does not depend on the value of N.
I have also inspected the Tau arrays returned by the LAPACK and MAGMA versions, and they start to differ at index 284 (zero-based).
I don't understand the significance of the number 284.
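One way to locate such a break point is to scan the two result arrays column by column. The sketch below is only an illustration: `first_differing_column` is a hypothetical helper, not a MAGMA or LAPACK routine, and it assumes the column-major storage that LAPACK uses.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: returns the zero-based index of the first column in
// which two m-by-n column-major matrices differ by more than tol in any
// entry, or -1 if every column agrees within tol.
int first_differing_column(const std::vector<cplx>& a, const std::vector<cplx>& b,
                           int m, int n, int lda, double tol) {
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(a.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    assert(b.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            const std::size_t k = static_cast<std::size_t>(j) * lda + i;
            if (std::abs(a[k] - b[k]) > tol) return j;
        }
    }
    return -1;
}
```

With column-major storage, "the first 284*M entries are O.K." means the first 284 columns agree, so a scan like this would be expected to report column 284, the same index at which the Tau values diverge.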
It is possible that the Tau vector combined with the returned matrix is a valid decomposition in both cases, just not the same one. The way to test that is to use each in zgeqrs and see whether the same result is recovered, within numerical error.
Or it could be a bug, with one of the decompositions being wrong.
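A solve-based check of this kind can be judged by a scaled residual rather than by comparing the factors directly. The following is a minimal sketch for a single right-hand side, using the quantity ||b-Ax|| / (N||A||) that testing_zgeqrs_gpu prints. The function name `scaled_residual` is hypothetical, and plain loops stand in for the BLAS/LAPACK calls.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: returns ||b - A x||_2 / (n * ||A||_F) for an m-by-n
// column-major matrix A (leading dimension lda), a solution x of length n
// and a right-hand side b of length m.
double scaled_residual(const std::vector<cplx>& a, const std::vector<cplx>& x,
                       const std::vector<cplx>& b, int m, int n, int lda) {
    assert(m > 0 && n > 0 && lda >= m);
    assert(a.size() >= static_cast<std::size_t>(lda) * static_cast<std::size_t>(n));
    assert(x.size() >= static_cast<std::size_t>(n));
    assert(b.size() >= static_cast<std::size_t>(m));

    std::vector<cplx> r(b.begin(), b.begin() + m);  // r starts as b
    double a_sq = 0.0;                              // accumulates ||A||_F^2
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            const cplx aij = a[static_cast<std::size_t>(j) * lda + i];
            r[i] -= aij * x[j];                     // r = b - A x
            a_sq += std::norm(aij);
        }
    }
    assert(a_sq > 0.0);                             // A must not be all zeros

    double r_sq = 0.0;
    for (int i = 0; i < m; ++i) r_sq += std::norm(r[i]);
    return std::sqrt(r_sq) / (static_cast<double>(n) * std::sqrt(a_sq));
}
```

If both factorizations are valid, the solutions they produce should each give a residual near machine precision, even when the factors themselves do not match entry for entry.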
John
Re: Error testing zgeqrf
Posted: Sun Jan 09, 2011 3:48 pm
by fletchjp
I have now done a further test with zgeqrs_gpu as follows.
If I am interpreting the values correctly, this implies a problem with the LAPACK on my computer, rather than MAGMA.
Is that correct?
John
Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_zgeqrs_gpu
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory
Usage:
testing_zgeqrs_gpu -nrhs 3 -M 1024 -N 1024
                                            ||b-Ax|| / (N||A||)
M       N       CPU GFlop/s   GPU GFlop/s   CPU        GPU
===============================================================
1024    1024    22.6          34.9          1.13e-18   3.00e-18
2048    2048    27.6          62.4          3.18e-04   7.22e-18
3072    3072    29.7          66.4          1.41e-04   2.18e-18
4032    4032    30.9          67.8          1.68e-04   2.86e-18
5184    5184    31.2          68.7          3.34e-04   1.61e-18
6016    6016    30.7          70.0          1.58e-04   1.53e-18
7040    7040    21.6          70.4          3.20e-04   8.06e-19
8064    8064    30.2          70.8          1.20e-04   1.19e-18
9088    9088    30.2          71.2          4.45e-04   4.60e-18
10112   10112   29.4          71.4          2.57e-04   2.38e-18
Re: Error testing zgeqrf
Posted: Sun Jan 09, 2011 4:46 pm
by Stan Tomov
Now this is interesting. Indeed, it looks like there may be a problem with the LAPACK on your machine. It is still very weird to me, though, as things sometimes produce what is expected and sometimes do not. In the program output you posted I see you use more than one thread. Are the results consistent when you set the number of threads to one?
Re: Error testing zgeqrf
Posted: Mon Jan 10, 2011 9:39 am
by fletchjp
Stan
I was thinking along the same lines. I will do some trials with GotoBLAS set to different numbers of threads, and also with the single-threaded reference BLAS.
If you see this, would you post a copy of the results you get for zgeqrs_gpu on your system?
Thanks
John
Re: Error testing zgeqrf
Posted: Mon Jan 10, 2011 3:48 pm
by Stan Tomov
I don't have access to our GTX480 system right now, but if you just want to see the errors printed, here are the results on a C2070, using 6 MKL threads (on a dual-socket six-core AMD Opteron 2435 host):
Code: Select all
[tomov@yona12 testing]$ ./testing_zgeqrs_gpu
device 0: Tesla C2070, 1147.0 MHz clock, 5375.4 MB memory
device 1: Tesla C2070, 1147.0 MHz clock, 5375.4 MB memory
Usage:
testing_zgeqrs_gpu -nrhs 3 -M 1024 -N 1024
                                            ||b-Ax|| / (N||A||)
M       N       CPU GFlop/s   GPU GFlop/s   CPU        GPU
===============================================================
1024    1024    18.8          34.5          2.10e-18   3.16e-18
2048    2048    20.2          52.3          4.19e-18   6.75e-18
3072    3072    30.0          166.0         1.69e-18   3.12e-18
4032    4032    35.9          211.5         1.90e-18   2.55e-18
5184    5184    37.5          232.5         1.18e-18   2.00e-18
6016    6016    39.2          241.7         1.63e-18   1.88e-18
7040    7040    40.3          252.1         9.27e-19   1.00e-18
8064    8064    41.3          257.1         6.34e-19   9.56e-19
...
Setting the number of threads to one, I get
Code: Select all
                                            ||b-Ax|| / (N||A||)
M       N       CPU GFlop/s   GPU GFlop/s   CPU        GPU
===============================================================
1024    1024    4.9           34.1          2.13e-18   3.05e-18
2048    2048    4.0           25.2          4.66e-18   6.80e-18
3072    3072    6.0           78.9          1.51e-18   3.08e-18
4032    4032    7.2           125.9         1.68e-18   2.54e-18
5184    5184    7.4           163.1         1.20e-18   2.04e-18
6016    6016    7.4           188.8         1.20e-18   1.86e-18
7040    7040    7.5           210.4         7.38e-19   1.00e-18
8064    8064    7.6           227.7         6.04e-19   9.58e-19
...
Re: Error testing zgeqrf
Posted: Mon Jan 10, 2011 7:58 pm
by fletchjp
Stan
I have tried GotoBLAS with a single thread, which still gave errors, and then the reference BLAS on my computer, which gives none of these errors. It looks as though there is something wrong with my installation of GotoBLAS, where I accepted the default analysis it performed. I need to take that problem away from you, as it has nothing to do with MAGMA.
If you have any thoughts on where I should go with that, I would appreciate them.
I am not sure yet where I am with the other problem, the NaNs. I will check that out. My system is in an odd state at the moment, as I commented out magma_blas in some routines.
Thank you for all your help with this. I am very keen to make use of MAGMA, and clearly a multithreaded BLAS is an important tool to help MAGMA work well.
Thanks again
John