I'm seeing strange failures when inverting a matrix with zgetrf+zgetri. I'm doing exactly what the testing_zgetri_gpu example does, and I get the same error if I use magma_zgesv_gpu with an identity RHS instead. My code runs and gives correct answers for small (80x80) matrices, but with larger matrices I get the following errors, repeated many times:
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:172
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:188
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:195
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:199
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:172
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:188
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:195
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:199
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:172
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:188
CUBLAS error: memory mapping error (11) in magma_zgetrf_gpu at zgetrf_gpu.cpp:195
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:199
CUDA runtime error: unspecified launch failure (4) in magma_zgetrf_gpu at zgetrf_gpu.cpp:170
Running under cuda-gdb with memcheck enabled gives:
[Launch of CUDA Kernel 257 (ztranspose3_32<<<(2,3,1),(16,8,1)>>>) on Device 0]
Memcheck detected an illegal access to address (@global)0x1300952700
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 257, grid 258, block (1,2,0), thread (0,6,0), device 0, sm 9, warp 0, lane 0]
0x0000000000dc6220 in ztranspose3_32(int, int, magmaDoubleComplex * @generic, int, const magmaDoubleComplex * @generic, int, int, int)<<<(2,3,1),(16,8,1)>>> (m32=0, n32=16,
__val_paramB=0x1300914000, ldb=96, __val_paramA=0x1300932000, lda=96, m=64, n=80) at ztranspose-v2.cu:86
86 sA[iny+16][inx] = A[16*lda];
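One way to read the trace above is to turn the faulting address into a (row, column) position relative to the `A` pointer passed to the kernel. The sketch below is only an illustration under stated assumptions: column-major storage, 16-byte `magmaDoubleComplex` elements, and a buffer of `lda * n` elements with `lda = 96` and `n = 80` as printed in the kernel arguments. The helper names `round_up_32` and `locate_fault` and the struct `fault_location` are hypothetical and are not part of MAGMA. The `n = 80` in the trace may describe a panel rather than the whole allocation, so the bounds check is a guess about the buffer size, not a fact from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one magmaDoubleComplex element: two doubles. */
#define ELEM_BYTES 16u

/* Hypothetical helper: round n up to the next multiple of 32.
 * This would explain lda=96 for an 80-row matrix, assuming the
 * leading dimension was padded this way. */
static int round_up_32(int n)
{
    return ((n + 31) / 32) * 32;
}

/* Position of a faulting address relative to a column-major buffer. */
typedef struct {
    uint64_t elem_offset; /* offset from the base pointer, in elements */
    uint64_t row;         /* zero-based row inside the column */
    uint64_t col;         /* zero-based column */
    int      in_bounds;   /* 1 if inside an ld * ncols element buffer */
} fault_location;

/* Hypothetical helper: locate `fault` inside a buffer starting at `base`
 * with leading dimension `ld` and `ncols` columns (both in elements). */
static fault_location locate_fault(uint64_t base, uint64_t fault,
                                   uint64_t ld, uint64_t ncols)
{
    fault_location loc = {0, 0, 0, 0};
    assert(ld > 0);

    if (fault < base) {
        return loc; /* address lies before the buffer: out of bounds */
    }
    loc.elem_offset = (fault - base) / ELEM_BYTES;
    loc.row = loc.elem_offset % ld;
    loc.col = loc.elem_offset / ld;
    loc.in_bounds = loc.elem_offset < ld * ncols;
    return loc;
}
```

With the values from the trace, `locate_fault(0x1300932000, 0x1300952700, 96, 80)` gives element offset 8304, which is row 48 of column 86, past the end of a 96x80 buffer. The two pointers in the trace are also exactly 0x1E000 bytes apart, which equals 96 * 80 * 16 and is at least consistent with each buffer holding 96x80 elements.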
My make.inc:
GPU_TARGET ?= Kepler
CC = icc
NVCC = nvcc
FORT = ifort
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O0 -g -DADD_ -Wall -openmp -DMAGMA_WITH_MKL -DMAGMA_SETAFFINITY -fPIC
F77OPTS = -O0 -g -DADD_ -warn all -fPIC
FOPTS = -O0 -g -DADD_ -warn all -fPIC
NVOPTS = -O0 -g -G -DADD_ -Xcompiler "-fno-strict-aliasing -fPIC"
LDOPTS = -openmp
# old MKL
#LIB = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -lguide -lpthread -lcublas -lcudart -lstdc++ -lm
# see MKL Link Advisor at http://software.intel.com/sites/products/mkl/
# icc with MKL 10.3
#LIB = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lcublas -lcudart -lstdc++ -lm
LIB = -lmkl_intel_lp64 -lmkl_intel_thread -lpthread -lmkl_core -lcublas -lcudart -lstdc++ -lm -lifcore
# define library directories preferably in your environment, or here.
# for MKL run, e.g.: source /opt/intel/composerxe/mkl/bin/mklvars.sh intel64
MKLROOT ?= $(MKL_HOME)
CUDADIR ?= $(CUDA_ROOT)
-include make.check-mkl
-include make.check-cuda
LIBDIR = -L$(MKLROOT)/lib/intel64 \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include -I$(MKLROOT)/include