Problem about magma_zgeqrf_gpu (A possible bug)

boruoshihao · Post by **boruoshihao** » Thu May 21, 2015 12:24 pm

Hi guys,
I'm trying to do a QR decomposition for a two dimensional matrix a, I also need the determinant of the R matrix.
In Lapack, it can be done by:
1. Call zgeqrf ( m , n , a , lda , tau , work , lwork , info )
2. Determinant of R can be calculated by: det=1.0; for(size_t i=0; i<m; i++) det*=a(i,i);
3. Get the Q matrix by: call zungqr ( m , n , k , a , lda , tau , work , lwork , info )

In Magma, I have a few choices:
A. Use CPU interface:
1. Call magma_zgeqrf(m, n, a, lda, tau, work, lwork, info);
2. Get the determinant R: det=1.0; for(size_t i=0; i<m; i++) det*=a(i,i);
3. Get the Q matrix by magma_zungqr2( m , n , k , a , lda , tau , info )

B. Use GPU interface
0. Pass a to GPU memory da
1. Call magma_zgeqrf_gpu(m,n,da,lda,tau,dT,&info);
2. Pass da to CPU a
3. Get the determinant R: det=1.0; for(size_t i=0; i<m; i++) det*=a(i,i);
4. Get the Q matrix by: call magma_zungqr_gpu(m,n,n,da,lda,tau,dT,nb,&info);
5. Pass da to cpu a

Both A and B methods, gives same Q compare with original lapack.
While the determinant of R is different:
In small lattice size, A and B has the same determinant with original lapack.
In large lattice size, only A has the same determinant with original lapack, B has a totally different result.

Please see the following results (the matrix a is random generated.)

m , n, det form original lapack, det from A, det from B
8 8 29.1222 29.1222 29.1222
16 16 99287.8 99287.8 99287.8
32 32 2.92035e+14 2.92035e+14 2.92035e+14
64 64 2.81437e+38 2.81437e+38 2.53932e+14
128 128 1.23295e+96 1.23295e+96 7.51771e+14

When m>64, we start to see difference. I guess it might be a bug, has anyone looked into it before? I'm using cuda/5.5 and magma/1.6.1 built on acml.

Thanks,

Hao

mgates3 · Post by **mgates3** » Thu May 21, 2015 8:25 pm

For the GPU interface, use magma_zgeqrf2_gpu, which has LAPACK complaint arguments.

magma_zgeqrf3_gpu stores diagonal blocks of R separately in dT.
magma_zgeqrf_gpu also inverts the diagonal blocks of R in dT, for faster application when solving a linear system.

-mark

boruoshihao · Post by **boruoshihao** » Thu May 21, 2015 10:06 pm

Thank you Mark,
I guess I did not make my question clear enough before. I want A=Q*R, where Q is orthogonal and normalized, then mutiply the diagonal element of R will be the det(R).
If I use magma_zgeqrf2_gpu, which zungqr should be called to get final Q? magma_zungqr2 and magma_zungqr is cpu interface, magma_zungqr_gpu ask for dT, which is not in magma_zgeqrf2_gpu.

I just tried to get diagonal blocks of R from dT in both magma_zgeqrf3_gpu and magma_zgeqrf_gpu. While the det(R) is not correct, it looks like the normalization information is stored in A, not in R after we call zgeqrf.

Hao

mgates3 · Post by **mgates3** » Fri May 22, 2015 3:25 pm

I'm not sure what you mean by normalized. Q is orthogonal, meaning Q^T * Q = I. An "orthogonal" matrix has orthonormal columns. (I don't know why it's not called an orthonormal matrix, but orthogonal is the standard terminology. See
http://en.wikipedia.org/wiki/Orthogonal_matrix)

From magma_zgeqrf2_gpu, if you only need R, then it is immediately available as the upper triangle of the matrix. There is no need to call zungqr. If you need Q, unfortunately there does not appear to be a corresponding zungqr2_gpu routine. I'll do some investigation to see if there is an alternative. You should be able to copy the dA matrix to the CPU and use zungqr2.

From magma_zgeqrf_gpu, the diagonal blocks of R are inverted in dT, so it requires extra work to get the original (non-inverted) diagonal of R.

From magma_zgeqrf3_gpu, it should be possible to get the original R diagonal blocks, but you would have to look in detail about how they are stored. I would not recommend this path.

-mark

mgates3 · Post by **mgates3** » Fri May 22, 2015 3:57 pm

Also, why are you computing the determinant? It's useful for theory, but not so often in numerical computations.
-mark

boruoshihao · Post by **boruoshihao** » Fri May 22, 2015 10:37 pm

Dear Mark,
Thank you for the explanation. In my simulation, a Matrix A is a physical wave function. When we apply QR to A, Q is a normalized wave function, the Determinant of R is just the norm of the original wave function. This why I need the Determinant of R.
I can always use CPU interface, it works fine. However it's better to use GPU interface, since we need to call both zgeqrf and zungqr for a complete QR, use CPU interface will cause additional pass between GPU and CPU. (My main code is wrote in CPU, only call magma for QR.)

1. For magma_zgeqrf2_gpu, if I need to copy dA matrix to the CPU and use zungqr2, I would rather to use magma_zgeqrf and magma_zungqr2, since the cost will be same for my code.
2. For magma_zgeqrf_gpu, I have tried to transfer dT to CPU. In the documentation, I can find: " It starts with MIN(M,N)*NB block that store the triangular T matrices, followed by the MIN(M,N)*NB block of the diagonal inverses for the R matrix. " Then I can calculate the det ( Inverse R) from the second MIN(M,N)*NB.
Suppose N<M, we pass dT to T in CPU, T is a one dimensional array has the same size with dT, determinant of R can be calculated:
det=1.0;
for(int i=0;i<N;i++)
{
det*=T(i+i*NB+NB*N);
}
det=1.0/det;
However the det is not correct compared with the one come from zgeqrf.

3. For magma_zgeqrf3_gpu, I tried the to calculate determinant of R similar to magma_zgeqrf_gpu, since dT store the original R: " It starts with MIN(M,N)*NB block that store the triangular T matrices, followed by the MIN(M,N)*NB block of the diagonal matrices for the R matrix. The rest of the array is used as workspace."
I just pass dT to T in cpu, T is a one dimensional array has the same size with dT, then calculate determinant from:
det=1.0;
for(int i=0;i<N;i++)
{
det*=T(i+i*NB+NB*N);
}
det is not correct neither.

Hope I did not bother you too much, the CPU interface works fine for me. I'm just curious about the GPU interface, it looks like the det is not correct in GPU interface. Thanks~~

Hao

haidar · Post by **haidar** » Tue May 26, 2015 4:42 pm

Hi Hao,
The R and T matrix used in output of either qrf routine are special.
The R is not the whole R, it contains only N/NB diagonal block of size NBxNB each.
When is it inverted it is because it simplify the solve when it need to be solve for example Ax =b ==> QRx=b ==> Rx = Q'b
When it is stored, it is because it original position in A is zeroed to accelerate the application or generation of Q bu unmqr/ungqr.

So I think that, in order to compute the determinant you have to use the upper portion of A combined with the block diagonal.

Thanks
Azzam

MAGMA Forum

Problem about magma_zgeqrf_gpu (A possible bug)

Problem about magma_zgeqrf_gpu (A possible bug)

Re: Problem about magma_zgeqrf_gpu (A possible bug)

Re: Problem about magma_zgeqrf_gpu (A possible bug)

Re: Problem about magma_zgeqrf_gpu (A possible bug)

Re: Problem about magma_zgeqrf_gpu (A possible bug)

Re: Problem about magma_zgeqrf_gpu (A possible bug)

Re: Problem about magma_zgeqrf_gpu (A possible bug)