If you need just det(R), then use magma_zgeqrf2_gpu, which has LAPACK-complaint output.
If you also need to generate Q, unfortunately there is not currently a corresponding magma_zungqr2_gpu.
Using either magma_zgeqrf_gpu or magma_zgeqrf3_gpu, you can access the diagonal blocks of R in the dT workspace. I DO NOT recommend doing this, as it gets into what is an internal structure of MAGMA. But here it is.
There are nt = ceil( N / nb ) blocks. The first (nt - 1) diagonal nb x nb blocks of R are stored side-by-side in dT, starting at dT[ min_mn*nb ], each with a leading dimension of nb. In the testing_zgeqrf_gpu.cpp, you can see how we copy these back to d_A to reconstruct the complete R matrix. For magma_zgeqrf_gpu, these blocks are also inverted.
The last diagonal block is of size jb = min_mn % nb, or if that is zero, jb = nb. It is stored in the matrix dA, just as in LAPACK.
Code: Select all
// copy diagonal blocks of R back to A
for( int i=0; i < min_mn-nb; i += nb ) {
magma_int_t ib = min( min_mn-i, nb );
magmablas_zlacpy( MagmaUpper, ib, ib, &dT[min_mn*nb + i*nb], nb, &d_A[ i + i*ldda ], ldda, opts.queue );
}
Looking at the code you tried (in your previous post a year ago), you tried using a single loop i = 0 to N. Since the diagonal blocks are stored side-by-side, you actually need a double loop:
Code: Select all
for( int i=0; i < min_mn-nb; i += nb ) {
for( int j=0; j < nb; ++j ) {
det *= T[ j + (min_mn+i+j)*nb ];
}
}
and then you need to also get the contribution from the last jb-by-jb block, which is stored in dA, not in dT.
As an example, with NB = 4, here is what the output matrices look like:
dgeqrf_gpu (diagonal blocks of R^{-1} stored in T, except last diagonal block of R stored in A)
Code: Select all
A=[
1.00 0. 0. 0. -1.57 -1.14 -1.73 -1.36 -1.26 -1.96
0.31 1.00 0. 0. -0.08 0.36 0.57 -0.28 -0.14 0.28
0.03 -0.28 1.00 0. -0.14 0.04 0.02 -0.29 -0.32 -0.09
0.23 -0.46 -0.10 1.00 0.43 -0.03 0.44 0.62 0.18 0.44
0.15 -0.11 0.15 0.22 1.00 0. 0. 0. -0.10 -0.41
0.39 -0.28 -0.48 0.27 -0.09 1.00 0. 0. -0.01 0.17
0.48 0.42 0.14 0.44 -0.30 0.30 1.00 0. 0.21 -0.38
0.20 -0.21 0.70 0.33 -0.62 -0.26 -0.65 1.00 -0.39 0.22
0.37 0.54 -0.05 -0.56 -0.47 0.00 -0.40 0.02 -0.75 -0.04
0.40 -0.11 -0.20 0.20 -0.20 0.22 -0.07 0.23 0.09 0.26
];
T=[
1.06 -0.42 -0.04 -0.99 1.15 0.13 -0.52 0.87 0. 0. -0.51 -0.75 0.86 0.40 -1.12 0.95 -0.37 3.01 0. 0.
-0.90 1.06 0.23 0.88 -0.17 1.65 -0.95 -0.56 0. 0. 0. 1.13 -0.13 -0.89 0. -1.51 0.60 1.23 0. 0.
1.34 0. 1.10 -0.10 -0.16 0. 1.26 1.61 0. 0. 0. 0. -1.15 -0.61 0. 0. 1.20 -0.25 0. 0.
-2.19 0. 0. 1.13 -0.68 0. 0. 1.90 0. 0. 0. 0. 0. 0.88 0. 0. 0. -6.16 0. 0.
];
dgeqrf3_gpu (diagonal blocks of R stored in T, except last diagonal block of R stored in A)
Code: Select all
A=[
1.00 0. 0. 0. -1.57 -1.14 -1.73 -1.36 -1.26 -1.96
0.31 1.00 0. 0. -0.08 0.36 0.57 -0.28 -0.14 0.28
0.03 -0.28 1.00 0. -0.14 0.04 0.02 -0.29 -0.32 -0.09
0.23 -0.46 -0.10 1.00 0.43 -0.03 0.44 0.62 0.18 0.44
0.15 -0.11 0.15 0.22 1.00 0. 0. 0. -0.10 -0.41
0.39 -0.28 -0.48 0.27 -0.09 1.00 0. 0. -0.01 0.17
0.48 0.42 0.14 0.44 -0.30 0.30 1.00 0. 0.21 -0.38
0.20 -0.21 0.70 0.33 -0.62 -0.26 -0.65 1.00 -0.39 0.22
0.37 0.54 -0.05 -0.56 -0.47 0.00 -0.40 0.02 -0.75 -0.04
0.40 -0.11 -0.20 0.20 -0.20 0.22 -0.07 0.23 0.09 0.26
];
T=[
1.06 -0.42 -0.04 -0.99 1.15 0.13 -0.52 0.87 0. 0. -1.98 -1.31 -1.33 -1.34 -0.89 -0.56 0.00 -0.55 0. 0.
-0.90 1.06 0.23 0.88 -0.17 1.65 -0.95 -0.56 0. 0. 0. 0.89 -0.10 0.83 0. -0.66 0.33 -0.15 0. 0.
1.34 0. 1.10 -0.10 -0.16 0. 1.26 1.61 0. 0. 0. 0. -0.87 -0.60 0. 0. 0.83 -0.03 0. 0.
-2.19 0. 0. 1.13 -0.68 0. 0. 1.90 0. 0. 0. 0. 0. 1.14 0. 0. 0. -0.16 0. 0.
];
dgeqrf2_gpu (R stored in A)
Code: Select all
A=[
-1.98 -1.31 -1.33 -1.34 -1.57 -1.14 -1.73 -1.36 -1.26 -1.96
0.31 0.89 -0.10 0.83 -0.08 0.36 0.57 -0.28 -0.14 0.28
0.03 -0.28 -0.87 -0.60 -0.14 0.04 0.02 -0.29 -0.32 -0.09
0.23 -0.46 -0.10 1.14 0.43 -0.03 0.44 0.62 0.18 0.44
0.15 -0.11 0.15 0.22 -0.89 -0.56 0.00 -0.55 -0.10 -0.41
0.39 -0.28 -0.48 0.27 -0.09 -0.66 0.33 -0.15 -0.01 0.17
0.48 0.42 0.14 0.44 -0.30 0.30 0.83 -0.03 0.21 -0.38
0.20 -0.21 0.70 0.33 -0.62 -0.26 -0.65 -0.16 -0.39 0.22
0.37 0.54 -0.05 -0.56 -0.47 0.00 -0.40 0.02 -0.75 -0.04
0.40 -0.11 -0.20 0.20 -0.20 0.22 -0.07 0.23 0.09 0.26
];
Sorry these are a bit complicated. They were intended as internal structures, not really to be accessed by end users.
-mark