Problem about magma_zgeqrf_gpu (A possible bug)
Posted: Thu May 21, 2015 12:24 pm
Hi guys,
I'm trying to compute the QR decomposition of a two-dimensional matrix a, and I also need the determinant of the R matrix.
In LAPACK, this can be done as follows:
1. Call zgeqrf ( m , n , a , lda , tau , work , lwork , info )
2. Determinant of R can be calculated by: det=1.0; for(size_t i=0; i<m; i++) det*=a(i,i);
3. Get the Q matrix by: call zungqr ( m , n , k , a , lda , tau , work , lwork , info )
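For reference, step 2 can be sketched as below. This is only a minimal sketch: it assumes the array is stored column-major with leading dimension lda (as zgeqrf leaves it, with R in the upper triangle), and the helper name det_of_r is my own, not part of LAPACK or MAGMA. It loops to min(m, n) so it also covers non-square inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: product of the diagonal entries of R, where R sits in
// the upper triangle of the column-major array `a` with leading dimension lda.
std::complex<double> det_of_r(const std::vector<std::complex<double>>& a,
                              std::size_t m, std::size_t n, std::size_t lda) {
    assert(lda >= m);  // the leading dimension must cover a full column
    std::complex<double> det(1.0, 0.0);
    const std::size_t k = std::min(m, n);
    for (std::size_t i = 0; i < k; ++i) {
        det *= a[i + i * lda];  // element (i, i) in column-major storage
    }
    return det;
}
```

Everything below the diagonal (the Householder vectors) is ignored, so the call has to happen before zungqr overwrites a with Q.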
In MAGMA, I have two choices:
A. Use CPU interface:
1. Call magma_zgeqrf(m, n, a, lda, tau, work, lwork, info);
2. Get the determinant of R: det=1.0; for(size_t i=0; i<m; i++) det*=a(i,i);
3. Get the Q matrix by magma_zungqr2( m , n , k , a , lda , tau , info )
B. Use GPU interface
0. Copy a from the CPU to GPU memory da
1. Call magma_zgeqrf_gpu(m,n,da,lda,tau,dT,&info);
2. Copy da back to the CPU array a
3. Get the determinant of R: det=1.0; for(size_t i=0; i<m; i++) det*=a(i,i);
4. Get the Q matrix by: call magma_zungqr_gpu(m,n,n,da,lda,tau,dT,nb,&info);
5. Copy da back to the CPU array a
Both methods A and B give the same Q as the original LAPACK.
However, the determinant of R differs:
For small lattice sizes, A and B both give the same determinant as the original LAPACK.
For large lattice sizes, only A matches the original LAPACK; B gives a totally different result.
Please see the following results (the matrix a is randomly generated):
  m    n   det (LAPACK)   det (A)        det (B)
  8    8   29.1222        29.1222        29.1222
 16   16   99287.8        99287.8        99287.8
 32   32   2.92035e+14    2.92035e+14    2.92035e+14
 64   64   2.81437e+38    2.81437e+38    2.53932e+14
128  128   1.23295e+96    1.23295e+96    7.51771e+14
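Since the determinant already reaches about 1e+96 at 128 x 128, a plain product would overflow a double for somewhat larger matrices. One possible workaround, sketched here under the same column-major assumption, is to accumulate the log of the magnitude and the phase separately. The struct LogDet and the function log_det_of_r are hypothetical names of mine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical result type: det(R) = exp(log_abs) * exp(i * phase).
struct LogDet {
    double log_abs;  // sum of log|r_ii|
    double phase;    // sum of arg(r_ii), not wrapped into (-pi, pi]
};

// Accumulates the determinant of R in log form from the diagonal of the
// column-major array `a` (leading dimension lda).
LogDet log_det_of_r(const std::vector<std::complex<double>>& a,
                    std::size_t m, std::size_t n, std::size_t lda) {
    assert(lda >= m);
    LogDet out{0.0, 0.0};
    const std::size_t k = std::min(m, n);
    for (std::size_t i = 0; i < k; ++i) {
        const std::complex<double> r = a[i + i * lda];
        out.log_abs += std::log(std::abs(r));  // -inf if r is exactly zero
        out.phase += std::arg(r);
    }
    return out;
}
```

Comparing log_abs between two factorizations also makes the size of a discrepancy easier to read than comparing numbers like 2.8e+38 and 2.5e+14 directly.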
The difference appears for m >= 64. I suspect it might be a bug. Has anyone looked into it before? I'm using cuda/5.5 and magma/1.6.1 built against ACML.
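To narrow down where the two results part ways, one option is to compare the diagonals entry by entry instead of only the final product. The sketch below assumes both factored arrays have been copied to the CPU in column-major layout with the same lda. The name first_diag_mismatch and the default tolerance are my own choices, not anything from MAGMA.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the first diagonal index where the two factored
// arrays disagree beyond a relative tolerance, or min(m, n) if all agree.
std::size_t first_diag_mismatch(const std::vector<std::complex<double>>& a_ref,
                                const std::vector<std::complex<double>>& a_gpu,
                                std::size_t m, std::size_t n, std::size_t lda,
                                double rel_tol = 1e-10) {
    assert(lda >= m);
    const std::size_t k = std::min(m, n);
    for (std::size_t i = 0; i < k; ++i) {
        const std::complex<double> x = a_ref[i + i * lda];
        const std::complex<double> y = a_gpu[i + i * lda];
        const double scale = std::max(std::abs(x), std::abs(y));
        if (std::abs(x - y) > rel_tol * scale) {
            return i;  // first entry that differs
        }
    }
    return k;  // no mismatch found on the diagonal
}
```

Running this on the output of A and B for the 64 x 64 case would show whether the mismatch starts at index 0 or only from some later index.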
Thanks,
Hao