Just as a reminder: I want to obtain both Q and R.
When I use magma_zgeqrf2_gpu, I have direct access to R, but there is no matching function to restore Q: magma_zungqr and magma_zungqr2 both require A to be in host memory, and magma_zungqr_gpu requires the dT array, which I don't get from magma_zgeqrf2_gpu.
If I use magma_zgeqrf3_gpu instead, can I use magma_zungqr_gpu to obtain Q and the code from testing_zgeqrf_gpu.cpp to restore R?
A small side question: what are the computational complexities of *geqrf* and *ungqr*? Is the cost of *ungqr* negligible compared to *geqrf*, and is that why there is only a CPU *ungqr* to go with magma_zgeqrf2_gpu?
MAGMA + pycuda + my own CUDA kernels
Re: MAGMA + pycuda + my own CUDA kernels
With the currently available functions, you can use one of the following:
- magma_zgeqrf2_gpu( dA ), copy dA to A on host, magma_zungqr2( A )
- magma_zgeqrf3_gpu( dA, dT ), copy dA to dQ, magma_zungqr_gpu( dQ, dT ), reconstruct R in dA using bits from dT
- magma_zgeqrf2_gpu( dA ), copy dA to wA on host, set dQ = identity on GPU [magmablas_zlaset( zero, one, dQ )], magma_zunmqr2_gpu( dA, dQ, wA )
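To illustrate the idea behind the third option (apply the reflectors to an identity matrix to get Q), here is a small CPU-only sketch in plain Python. It is not MAGMA code and does not call any MAGMA routine: factor, apply_reflector and form_q are hypothetical names that stand in for the roles of geqrf, unmqr and ungqr. It also uses textbook Householder reflectors instead of LAPACK's (v, tau) storage, so the diagonal of R is generally complex here.

```python
from math import sqrt

def apply_reflector(v, c, j):
    """Overwrite C with H*C, where H = I - 2 v v^H / (v^H v) acts on rows j, j+1, ..."""
    vv = sum(abs(t) ** 2 for t in v)
    if vv == 0.0:
        return  # zero vector: H is the identity
    for col in range(len(c[0])):
        dot = sum(v[i].conjugate() * c[j + i][col] for i in range(len(v)))
        scale = 2.0 * dot / vv
        for i in range(len(v)):
            c[j + i][col] -= scale * v[i]

def factor(a):
    """Role of geqrf: return the reflectors and R, but no explicit Q (needs m >= n)."""
    m, n = len(a), len(a[0])
    r = [[complex(x) for x in row] for row in a]
    reflectors = []
    for j in range(n):
        x = [r[i][j] for i in range(j, m)]
        norm = sqrt(sum(abs(t) ** 2 for t in x))
        v = list(x)
        if norm > 0.0:
            phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
            v[0] += phase * norm  # chosen so that H*x is a multiple of e_0
        reflectors.append(v)
        apply_reflector(v, r, j)  # zero out column j below the diagonal
    return reflectors, r

def form_q(reflectors, m):
    """Role of 'unmqr applied to identity': Q = H_1 H_2 ... H_n * I (thin, m x n)."""
    n = len(reflectors)
    q = [[complex(1.0 if i == k else 0.0) for k in range(n)] for i in range(m)]
    for j in reversed(range(n)):  # the rightmost factor is applied first
        apply_reflector(reflectors[j], q, j)
    return q

a = [[1 + 2j, 2], [3, 4 - 1j], [0, 1j]]
m, n = len(a), len(a[0])
reflectors, r = factor(a)
q = form_q(reflectors, m)

# R is the top n x n block of r; check that Q*R reproduces A.
qr = [[sum(q[i][k] * r[k][j] for k in range(n)) for j in range(n)] for i in range(m)]
err = max(abs(qr[i][j] - a[i][j]) for i in range(m) for j in range(n))
print("max |QR - A| =", err)
```

The point is only that an explicit Q never has to come out of the factorization itself: once the reflectors are stored, multiplying them into an identity matrix produces it.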
There's no particular reason that magma_zungqr2_gpu doesn't exist. We've just never needed it.
For a real, square matrix:
geqrf is 4/3 n^3 flops
ungqr is 4/3 n^3 flops
In complex, those get multiplied by about 4.
For rectangular matrices, it depends on what part of Q you want. LAPACK Working Note (LAWN) 41 has detailed flop counts for most of the routines (listed under the single-precision names: sgeqrf, sorgqr, etc.).
http://www.netlib.org/lapack/lawnspdf/lawn41.pdf
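As a rough illustration of those counts, the sketch below evaluates only the leading-order terms of the LAWN 41 formulas for geqrf (with m >= n) and orgqr/ungqr. The function names are made up for this example, the lower-order terms are dropped, and the complex case is modelled as a flat factor of 4, which is an approximation.

```python
def geqrf_flops(m, n, complex_arith=False):
    """Leading-order flops for QR of an m x n matrix, assuming m >= n."""
    flops = 2.0 * m * n**2 - (2.0 / 3.0) * n**3
    return 4.0 * flops if complex_arith else flops

def ungqr_flops(m, n, k, complex_arith=False):
    """Leading-order flops to form an m x n Q from k reflectors (k <= n <= m)."""
    flops = 4.0 * m * n * k - 2.0 * (m + n) * k**2 + (4.0 / 3.0) * k**3
    return 4.0 * flops if complex_arith else flops

if __name__ == "__main__":
    n = 1000
    # Square case: both counts reduce to 4/3 n^3.
    print("square geqrf :", geqrf_flops(n, n))
    print("square ungqr :", ungqr_flops(n, n, n))
    # Tall matrix: the thin Q (m x n) and the full Q (m x m) cost very different amounts.
    m = 4000
    print("tall geqrf   :", geqrf_flops(m, n))
    print("thin Q ungqr :", ungqr_flops(m, n, n))
    print("full Q ungqr :", ungqr_flops(m, m, n))
```

Under these assumptions, forming the thin Q costs about as much as the factorization itself, so ungqr is not negligible next to geqrf.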
Often, you can use unmqr (multiply by Q) instead of ungqr (generate explicit Q), but not always.
-mark