I recently moved from MAGMA 0.2 to 1.0 RC3 and have noticed that, in certain cases, the device memory used by the GPU LU decomposition (device interface), as provided by magma_cgetrf_gpu(), has increased by about a factor of 2.
Upon investigating cgetrf_gpu.cpp, I found the following (from line 138):
Code:
    if ((m == n) && (m % 32 == 0) && (ldda%32 == 0))
        magmablas_cinplace_transpose( dAT, ldda, lddat );
    else {
        if ( CUBLAS_STATUS_SUCCESS != cublasAlloc(maxm*maxn, sizeof(cuFloatComplex), (void**)&dAT) ) {
            cublasFree( dAP );
            return MAGMA_ERR_CUBLASALLOC;
        }
        magmablas_ctranspose2( dAT, lddat, dA, ldda, m, n );
    }
If this is not the case, I will have a look at what is required and submit a patch for square matrices that are not a multiple of 32.
Thanks, and keep up the good work.