Unable to run zgetri_gpu

hsahasra · Post by **hsahasra** » Tue Apr 15, 2014 12:59 am

Hi,

I'm having some weird problems using zgetrf+zgetri while inverting a matrix. I'm doing exactly what is being done in the testing_zgetri_gpu example. I get the same error if I use magma_zgesv_gpu with an identity RHS. I'm able to run my code and get correct answers for small (80x80) matrices. But when I try to run bigger matrices, I get the following errors repeated many times:

Code:

CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:172
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:188
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:195
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:199
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:172
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:188
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:195
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:199
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:172
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:188
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:195
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:199
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170

When I run cuda-gdb with memcheck, I get a Lane Illegal Address exception, even for small matrices.

Code:

[Launch of CUDA Kernel 257 (ztranspose3_32<<<(2,3,1),(16,8,1)>>>) on Device 0]
Memcheck detected an illegal access to address (@global)0x1300952700

Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 257, grid 258, block (1,2,0), thread (0,6,0), device 0, sm 9, warp 0, lane 0]
0x0000000000dc6220 in ztranspose3_32(int, int, magmaDoubleComplex * @generic, int, const magmaDoubleComplex * @generic, int, int, int)<<<(2,3,1),(16,8,1)>>> (m32=0, n32=16, 
    __val_paramB=0x1300914000, ldb=96, __val_paramA=0x1300932000, lda=96, m=64, n=80) at ztranspose-v2.cu:86
86	    sA[iny+16][inx] = A[16*lda];

I'm running MAGMA 1.4.1 with MKL and icc. Following is my MAGMA make.inc file:

Code:

GPU_TARGET ?= Kepler

CC        = icc
NVCC      = nvcc
FORT      = ifort

ARCH      = ar
ARCHFLAGS = cr
RANLIB    = ranlib

OPTS      = -O0 -g -DADD_ -Wall -openmp -DMAGMA_WITH_MKL -DMAGMA_SETAFFINITY -fPIC
F77OPTS   = -O0 -g -DADD_ -warn all -fPIC
FOPTS     = -O0 -g -DADD_ -warn all -fPIC
NVOPTS    = -O0 -g -G -DADD_ -Xcompiler "-fno-strict-aliasing -fPIC"
LDOPTS    = -openmp

# old MKL
#LIB       = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -lguide -lpthread -lcublas -lcudart -lstdc++ -lm

# see MKL Link Advisor at http://software.intel.com/sites/products/mkl/
# icc with MKL 10.3
#LIB       = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lcublas -lcudart -lstdc++ -lm
LIB        = -lmkl_intel_lp64 -lmkl_intel_thread -lpthread -lmkl_core -lcublas -lcudart -lstdc++ -lm -lifcore

# define library directories preferably in your environment, or here.
# for MKL run, e.g.: source /opt/intel/composerxe/mkl/bin/mklvars.sh intel64
MKLROOT ?= $(MKL_HOME)
CUDADIR ?= $(CUDA_ROOT)
-include make.check-mkl
-include make.check-cuda

LIBDIR    = -L$(MKLROOT)/lib/intel64 \
            -L$(CUDADIR)/lib64

INC       = -I$(CUDADIR)/include -I$(MKLROOT)/include

Harshad

hsahasra · Post by **hsahasra** » Tue Apr 15, 2014 1:18 am

The only difference in my code is that I'm using cudaMalloc instead of magma_malloc. Should it matter?

mgates3 · Post by **mgates3** » Tue Apr 15, 2014 11:42 am

No, cudaMalloc is fine, assuming you remember to multiply by sizeof(magmaDoubleComplex). magma_zmalloc provides some extra type safety, avoiding the (void*) cast, and handles the sizeof() part for you.
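To make the sizing point concrete, here is a minimal sketch in plain C++ (no CUDA or MAGMA calls) of the byte count a raw allocation would need. The names `zcomplex` and `device_matrix_bytes` are hypothetical, and the sketch assumes magmaDoubleComplex has the same 16-byte layout as std::complex<double>.

```cpp
#include <complex>
#include <cstddef>

// Stand-in for magmaDoubleComplex / cuDoubleComplex. Assumption: both are
// two packed doubles (16 bytes), the same layout as std::complex<double>.
using zcomplex = std::complex<double>;

// Bytes needed for a column-major matrix with leading dimension ldd and
// n_cols columns. The arithmetic is done in size_t so that large sizes do
// not overflow a 32-bit int before the multiplication by sizeof.
std::size_t device_matrix_bytes(std::size_t ldd, std::size_t n_cols) {
    return ldd * n_cols * sizeof(zcomplex);
}
```

For an 80 x 80 matrix with ldd = 96, this gives 96 * 80 * 16 = 122880 bytes, which is the value a cudaMalloc call would take. A typed allocator such as magma_zmalloc would take the element count (96 * 80) instead.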

BTW, it is generally not advised to invert a matrix. It's faster and more accurate to solve Ax=b using gesv than to invert and multiply x=A^{-1}*b.

-mark

hsahasra · Post by **hsahasra** » Tue Apr 15, 2014 1:02 pm

Hi Mark,

Yes, I'm multiplying by sizeof(cuDoubleComplex). I tried to use zgesv, but I get the same error. How do you usually set the rhs for zgesv while inverting a matrix?

I don't understand why the code works for 80x80 matrices and fails for larger matrices. Here's what I currently do:

Code:

  //Memory allocation
  int ldd = ((number_of_rows+31)/32)*32;
  gpuErrchk( cudaMalloc((void **)&values_gpu, ldd*number_of_columns*sizeof(cuDoubleComplex)) );
  .
  .
  magma_int_t *ipiv; //Pivot for inversions
  magma_int_t info, ldwork; //Some variables required for inversion using MAGMA
  magmaDoubleComplex *dwork; //Workspace for inversion

  int M = number_of_rows;

  //Invert G <- this part is not asynchronous
  ldwork = M * magma_get_zgetri_nb( M );
  ipiv = (magma_int_t *)malloc(M*sizeof(magma_int_t));
  magma_malloc((void **)&dwork, ldwork*sizeof(magmaDoubleComplex));

  //Compute the LU factorization
  magma_zgetrf_gpu(M, M, values_gpu, ldd, ipiv, &info);
  magmaErrchk( info );
  //Calculate the inverse
  magma_zgetri_gpu(M, values_gpu, ldd, ipiv, dwork, ldwork, &info);
  magmaErrchk( info );

  //Clear the inversion workspace
  free(ipiv);
  magma_free(dwork);

Thanks,
Harshad

mgates3 · Post by **mgates3** » Tue Apr 15, 2014 1:46 pm

I don't understand your question about how to "set the rhs for zgesv while inverting a matrix". When using zgesv, the matrix is never inverted. It is factored, and then two triangular solves are done. An example setting the rhs is in testing_zgesv[_gpu].cpp.
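The "factor, then two triangular solves" structure can be sketched on the CPU in plain C++. This is an illustrative, unblocked version and not MAGMA's implementation; `lu_factor` and `lu_solve` are hypothetical names, and the storage is column-major with a leading dimension, as in LAPACK.

```cpp
#include <cmath>
#include <complex>
#include <utility>
#include <vector>

using zc = std::complex<double>;

// In-place LU factorization with partial pivoting of the n x n column-major
// matrix A (leading dimension lda). Returns 0 on success, or the 1-based
// index of the first column whose pivot is exactly zero.
int lu_factor(int n, std::vector<zc>& A, int lda, std::vector<int>& ipiv) {
    ipiv.assign(n, 0);
    for (int k = 0; k < n; ++k) {
        int p = k;  // row holding the largest magnitude in column k
        for (int i = k + 1; i < n; ++i)
            if (std::abs(A[i + k * lda]) > std::abs(A[p + k * lda])) p = i;
        ipiv[k] = p;
        if (A[p + k * lda] == zc(0.0)) return k + 1;
        if (p != k)
            for (int j = 0; j < n; ++j) std::swap(A[k + j * lda], A[p + j * lda]);
        for (int i = k + 1; i < n; ++i) {
            A[i + k * lda] /= A[k + k * lda];  // multiplier, stored in L
            for (int j = k + 1; j < n; ++j)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
        }
    }
    return 0;
}

// Solve A x = b in place in b using the factors from lu_factor:
// apply the row swaps, then solve L y = P b, then solve U x = y.
void lu_solve(int n, const std::vector<zc>& LU, int lda,
              const std::vector<int>& ipiv, std::vector<zc>& b) {
    for (int k = 0; k < n; ++k) std::swap(b[k], b[ipiv[k]]);
    for (int i = 0; i < n; ++i)          // forward solve, unit lower triangle
        for (int j = 0; j < i; ++j) b[i] -= LU[i + j * lda] * b[j];
    for (int i = n - 1; i >= 0; --i) {   // backward solve, upper triangle
        for (int j = i + 1; j < n; ++j) b[i] -= LU[i + j * lda] * b[j];
        b[i] /= LU[i + i * lda];
    }
}
```

No inverse is ever formed here: the factors are reused for each right-hand side, which is why solving directly is cheaper than inverting and multiplying.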

It's hard to tell what the issue is without seeing the complete code. For instance, I assume number_of_columns == number_of_rows, but this is not visible here. If you could provide a short, complete example that can be compiled, it would be helpful.

Also, I would reiterate that you should almost never invert a matrix. But the error appears to occur in the factorization, before the inversion.

-mark

hsahasra · Post by **hsahasra** » Wed Apr 16, 2014 7:11 pm

Sorry, I didn't see that you said invert and multiply x=A^{-1}*b. In my code I'm actually inverting the matrix and not solving a linear equation. I can also implicitly invert a matrix using zgesv, by setting the RHS to an identity matrix. So my question was, do you know of a way of creating an identity matrix using MAGMA/CUBLAS for the RHS?

The code I'm working on is pretty big, so I'll try making a working example.

Thanks,
Harshad

hsahasra · Post by **hsahasra** » Wed Apr 16, 2014 9:25 pm

Hi Mark,

The matrices that invert properly are of sizes 80x80, 320x320 and 1280x1280. The inversion fails when I try it on a matrix of size 720x720. Is this problem related to the number of rows/columns being divisible by 16/32?

Harshad

mgates3 · Post by **mgates3** » Mon Apr 21, 2014 11:26 am

If you need the inverse for some purpose other than solving a linear system, then zgetri( ) should be the best option. It does the inverse in-place, rather than requiring a second N x N matrix for the result.

You can set a matrix to the identity using LAPACK's zlaset with alpha=0 and beta=1. The MAGMA interface is a bit different. Use magmablas_zlaset_identity to set a matrix on the GPU to the identity.
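As a rough illustration of what a laset-style call does with alpha=0 and beta=1, here is a CPU sketch in plain C++. `laset_full` is a hypothetical name, not the LAPACK or MAGMA routine; it mirrors the convention that the first scalar fills the off-diagonal entries and the second fills the diagonal.

```cpp
#include <complex>
#include <vector>

using zc = std::complex<double>;

// Fill the m x n column-major matrix A (leading dimension lda >= m):
// off-diagonal entries get offdiag, diagonal entries get diag.
// With offdiag = 0 and diag = 1 the result is the identity.
// Padding rows between m and lda are left untouched.
void laset_full(int m, int n, zc offdiag, zc diag, std::vector<zc>& A, int lda) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            A[i + j * lda] = (i == j) ? diag : offdiag;
}
```

Calling `laset_full(n, n, zc(0.0), zc(1.0), B, ldb)` on a host buffer gives the identity right-hand side described above; on the GPU the same job would be done by a device routine rather than this loop.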

It should work for any matrix size, regardless of whether it is divisible by 32, but having LDA divisible by 32 will improve the CUDA performance. Of course, it's always possible that there is a bug regarding this. I'll do some investigating.
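The padding of the leading dimension to a multiple of 32 is plain integer arithmetic. A small sketch, with `round_up` as a hypothetical helper name:

```cpp
// Round n up to the next multiple of `multiple` (assumed positive).
// Integer division truncates, so adding multiple - 1 first bumps any
// value that is not already a multiple up to the next one.
int round_up(int n, int multiple) {
    return ((n + multiple - 1) / multiple) * multiple;
}
```

With a multiple of 32, the sizes discussed in this thread map to 80 -> 96, 320 -> 320, 720 -> 736 and 1280 -> 1280. Only the leading dimension is padded; the matrix itself stays M x M.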

-mark

mgates3 · Post by **mgates3** » Mon Apr 21, 2014 1:24 pm

I have been unable to replicate this problem for any size, either with ldd = M rounded up to a multiple of 32 or with ldd = M. Please advise if you continue to get errors using either the MAGMA testing_zgetri or the attached code.

Code:

sweetums ~/magma-trunk/testing> ./testing_zgetri_gpu -N 80 -N 320 -N 720 -N 1280 -c
MAGMA 1.4.0 svn compiled for CUDA capability >= 1.0
CUDA runtime 5050, driver 5050. OpenMP threads 8. MKL 11.1.0, MKL threads 8. 
device 0: Tesla K20m, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
device 1: Tesla K20m, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
Usage: ./testing_zgetri_gpu [options] [-h|--help]

    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R||_F / ||A||_F
=================================================================
   80      4.13 (   0.00)      1.15 (   0.00)   7.78e-16  ok
  320     34.30 (   0.01)     36.05 (   0.00)   2.09e-14  ok
  720     67.68 (   0.03)    174.62 (   0.01)   7.98e-15  ok
 1280     97.93 (   0.11)    240.47 (   0.05)   1.17e-14  ok

Here is the output of a modified version of the code you provided. (Basically, I filled in the missing pieces.) The modified code is attached.

Code:

sweetums ~/magma-trunk/testing> ./test_zgetri 80
M = 80, N = 80, lda = 80, ldd = 96
res = 3.32e-17
done
sweetums ~/magma-trunk/testing> ./test_zgetri 320
M = 320, N = 320, lda = 320, ldd = 320
res = 1.01e-16
done
sweetums ~/magma-trunk/testing> ./test_zgetri 720
M = 720, N = 720, lda = 720, ldd = 736
res = 2.81e-16
done
sweetums ~/magma-trunk/testing> ./test_zgetri 1280
M = 1280, N = 1280, lda = 1280, ldd = 1280
res = 4.16e-16
done

hsahasra · Post by **hsahasra** » Tue Apr 22, 2014 5:34 pm

Thanks Mark. I'll start from this code and see what's wrong with mine. There must be some small bug somewhere. Will post when I find something.
