Running MAGMA in a Cluster

farhad · Post by **farhad** » Tue Oct 18, 2016 10:06 am

Hi,

I have installed MAGMA on a node of a cluster at NTU and am trying to run it on a K20 GPU with CUDA 7.0. I downloaded the latest versions of MAGMA and OpenBLAS and set the required paths in the make.inc file.

When I try to execute any program, I get the following error:

#########################################################################################################################################

% MAGMA 2.0.2 compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 0, driver 7050. OpenMP threads 1.
% Tue Oct 18 21:58:13 2016
% Usage: ./testing_dgemm [options] [-h|--help]

CUDA runtime error: no CUDA-capable device is detected (38) in magma_setdevice at interface_cuda/interface.cpp:461
CUDA runtime error: no CUDA-capable device is detected (38) in magma_getdevices at interface_cuda/interface.cpp:437
CUDA runtime error: no CUDA-capable device is detected (38) in magma_setdevice at interface_cuda/interface.cpp:461
CUDA runtime error: no CUDA-capable device is detected (38) in parse_opts at testing/magma_util.cpp:581
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:581
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:581
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:581
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:581
CUDA runtime error: no CUDA-capable device is detected (38) in magma_setdevice at interface_cuda/interface.cpp:461
CUDA runtime error: no CUDA-capable device is detected (38) in parse_opts at testing/magma_util.cpp:582
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:582
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:582
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:582
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:582
% If running lapack (option --lapack), MAGMA and cuBLAS error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to cuBLAS result.

% transA = No transpose, transB = No transpose
% M N K MAGMA Gflop/s (ms) cuBLAS Gflop/s (ms) CPU Gflop/s (ms) MAGMA error cuBLAS error
%========================================================================================================
!!!! magma_malloc failed for: d_A

##########################################################################################################################

Can you please help me with this error?

mgates3 · Post by **mgates3** » Tue Oct 18, 2016 12:05 pm

It doesn’t seem that CUDA, and hence MAGMA, is seeing your GPU. Is this on Linux? If so, what does nvidia-smi show?

prompt> nvidia-smi 
Tue Oct 18 11:30:59 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.68     Driver Version: 352.68         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 0000:83:00.0     Off |                    0 |
| 23%   35C    P8    21W / 235W |     23MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          On   | 0000:84:00.0     Off |                    0 |
| 23%   22C    P8    20W / 235W |     23MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

farhad · Post by **farhad** » Tue Oct 18, 2016 11:10 pm

mgates3 · Post by **mgates3** » Wed Oct 19, 2016 1:42 am

Then I’m not sure what is going on. It should print the available devices in the header:

prompt> ./testing_dpotrf -n 500
% MAGMA 2.1.0 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 7050. MAGMA not compiled with OpenMP. 
% device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
% Wed Oct 19 01:19:06 2016
% Usage: ./testing_dpotrf [options] [-h|--help]

You can edit magma/interface_cuda/interface.cpp. In the magma_print_environment() function, add a print of the number of devices it sees:

    // print devices
    int ndevices = 0;
    err = cudaGetDeviceCount( &ndevices );
    printf( "ndevices %d\n", ndevices );  // added line: report how many devices CUDA sees
    if ( err != cudaErrorNoDevice ) {
        // "no device" is tolerated here; any other error is reported
        check_error( err );
    }

Which yields the output:

prompt> ./testing_dpotrf -n 500
% MAGMA 2.1.0 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 7050. MAGMA not compiled with OpenMP. 
ndevices 1
% device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
% Wed Oct 19 01:27:42 2016
% Usage: ./testing_dpotrf [options] [-h|--help]

I can force the error you see by hiding all devices:

prompt> setenv CUDA_VISIBLE_DEVICES ""
prompt> ./testing_dpotrf -n 500
% MAGMA 2.1.0 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 0, driver 7050. MAGMA not compiled with OpenMP. 
ndevices 0
% Wed Oct 19 01:29:32 2016
% Usage: ./testing_dpotrf [options] [-h|--help]

CUDA runtime error: no CUDA-capable device is detected (38) in magma_getdevices at interface_cuda/interface.cpp:519

-mark

farhad · Post by **farhad** » Wed Oct 19, 2016 9:34 am

This is the header I am getting:

% MAGMA 2.0.2 compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 0, driver 7050. ndevices 0
OpenMP threads 1.
% Wed Oct 19 15:06:35 2016
% Usage: ./testing_dgemm [options] [-h|--help]

mgates3 · Post by **mgates3** » Wed Oct 19, 2016 12:59 pm

To remove MAGMA from the picture, attached is a simple program that queries the CUDA devices. Compile it with nvcc. (It can also be compiled with gcc, given the right include and library paths.)

On my laptop:

prompt> nvcc -o cuda-devices cuda-devices.c
prompt> ./cuda-devices
ndev 1
device 0
  name                          = GeForce GT 750M
  asyncEngineCount              = 1
  canMapHostMemory              = 1
  capability major.minor        = 3.0
  clockRate                     = 925.5 MHz
  computeMode                   = 0
  concurrentKernels             = 1
  deviceOverlap                 = 1
  ECCEnabled                    = 0
  integrated                    = 0
  kernelExecTimeoutEnabled      = 1
  l2CacheSize                   = 256.0 KB
  maxGridSize                   = 2147483647 x 65535 x 65535
  maxTexture1D                  = 65536
  maxTexture1DLayered           = 16384 x  2048
  maxTexture2D                  = 65536 x 65536
  maxTexture2DLayered           = 16384 x 16384 x  2048
  maxTexture3D                  =  4096 x  4096 x  4096
  maxThreadsDim                 =  1024 x  1024 x    64
  maxThreadsPerBlock            = 1024
  maxThreadsPerMultiProcessor   = 2048
  memoryBusWidth                = 128
  memoryClockRate               = 2508.0 MHz
  memPitch                      = 2048.0 MB
  multiProcessorCount           = 2
  pciBusID                      = 1
  pciDeviceID                   = 0
  pciDomainID                   = 0
  regsPerBlock                  = 65536
  sharedMemPerBlock             = 48.0 KB
  surfaceAlignment              = 512
  tccDriver                     = 0
  textureAlignment              = 512
  totalConstMem                 = 64.0 KB
  totalGlobalMem                = 2047.6 MB
  unifiedAddressing             = 1
  warpSize                      = 32

Forcing no device:

prompt> setenv CUDA_VISIBLE_DEVICES ""
prompt> ./cuda-devices
ndev 0

farhad · Post by **farhad** » Thu Oct 20, 2016 1:56 am

Hi,

I tried running it on a local machine in my lab that has a Tesla C2075, and I get the following output:
##############################################################################
./cuda-devices
ndev 1
device 0
name = Tesla C2075
asyncEngineCount = 2
canMapHostMemory = 1
capability major.minor = 2.0
clockRate = 1147.0 MHz
computeMode = 0
concurrentKernels = 1
deviceOverlap = 1
ECCEnabled = 1
integrated = 0
kernelExecTimeoutEnabled = 1
l2CacheSize = 768.0 KB
maxGridSize = 65535 x 65535 x 65535
maxTexture1D = 65536
maxTexture1DLayered = 16384 x 2048
maxTexture2D = 65536 x 65535
maxTexture2DLayered = 16384 x 16384 x 2048
maxTexture3D = 2048 x 2048 x 2048
maxThreadsDim = 1024 x 1024 x 64
maxThreadsPerBlock = 1024
maxThreadsPerMultiProcessor = 1536
memoryBusWidth = 384
memoryClockRate = 1566.0 MHz
memPitch = 2048.0 MB
multiProcessorCount = 14
pciBusID = 15
pciDeviceID = 0
pciDomainID = 0
regsPerBlock = 32768
sharedMemPerBlock = 48.0 KB
surfaceAlignment = 512
tccDriver = 0
textureAlignment = 512
totalConstMem = 64.0 KB
totalGlobalMem = 5375.2 MB
unifiedAddressing = 1
warpSize = 32
####################################################################################

When I run it on the cluster, it gives the following output:

#########################################################################################
./cuda-devices
ndev 8
cuda-devices: testing/cuda-devices.c:22: main: Assertion `err == 0' failed.
Aborted
#########################################################################################

Any idea how to resolve this issue?

farhad · Post by **farhad** » Fri Oct 21, 2016 2:01 am

Hi,

Now I am able to run it successfully.

Thank you very much.

Thanks & Regards,

mgates3 · Post by **mgates3** » Fri Oct 21, 2016 3:18 pm

Updated code is attached that prints an error message instead of using assert. If an old CUDA driver version is installed (not matching the nvcc CUDA runtime version), I could see that causing issues.

So you solved the issue with running MAGMA? What was the solution?

-mark
