Running MAGMA in a Cluster
Hi,
I have installed MAGMA on a node of a cluster at NTU and am trying to run it on a K20 GPU with CUDA 7.0. I downloaded the latest versions of MAGMA and OpenBLAS and set the required paths in the make.inc file.
When I try to execute any program, I get the following error.
#########################################################################################################################################
% MAGMA 2.0.2 compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 0, driver 7050. OpenMP threads 1.
% Tue Oct 18 21:58:13 2016
% Usage: ./testing_dgemm [options] [-h|--help]
CUDA runtime error: no CUDA-capable device is detected (38) in magma_setdevice at interface_cuda/interface.cpp:461
CUDA runtime error: no CUDA-capable device is detected (38) in magma_getdevices at interface_cuda/interface.cpp:437
CUDA runtime error: no CUDA-capable device is detected (38) in magma_setdevice at interface_cuda/interface.cpp:461
CUDA runtime error: no CUDA-capable device is detected (38) in parse_opts at testing/magma_util.cpp:581
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:581
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:581
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:581
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:581
CUDA runtime error: no CUDA-capable device is detected (38) in magma_setdevice at interface_cuda/interface.cpp:461
CUDA runtime error: no CUDA-capable device is detected (38) in parse_opts at testing/magma_util.cpp:582
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:582
CUBLAS error: not initialized (1) in parse_opts at testing/magma_util.cpp:582
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:582
MAGMA error: function-specific error, see documentation (1) in parse_opts at testing/magma_util.cpp:582
% If running lapack (option --lapack), MAGMA and cuBLAS error are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to cuBLAS result.
% transA = No transpose, transB = No transpose
% M N K MAGMA Gflop/s (ms) cuBLAS Gflop/s (ms) CPU Gflop/s (ms) MAGMA error cuBLAS error
%========================================================================================================
!!!! magma_malloc failed for: d_A
##########################################################################################################################
Can you please help me with this error?
Re: Running MAGMA in a Cluster
It doesn’t seem that CUDA, and hence MAGMA, is seeing your GPU. Is this on Linux? If so, what does nvidia-smi show?
Code: Select all
prompt> nvidia-smi
Tue Oct 18 11:30:59 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.68 Driver Version: 352.68 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40c On | 0000:83:00.0 Off | 0 |
| 23% 35C P8 21W / 235W | 23MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c On | 0000:84:00.0 Off | 0 |
| 23% 22C P8 20W / 235W | 23MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Re: Running MAGMA in a Cluster
This is my nvidia-smi output:
Wed Oct 19 11:05:53 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.93 Driver Version: 352.93 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20Xm On | 0000:03:00.0 Off | 0 |
| N/A 46C P0 163W / 235W | 5286MiB / 5759MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20Xm On | 0000:04:00.0 Off | 0 |
| N/A 24C P8 18W / 235W | 15MiB / 5759MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20Xm On | 0000:83:00.0 Off | 0 |
| N/A 30C P8 18W / 235W | 15MiB / 5759MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20Xm On | 0000:84:00.0 Off | 0 |
| N/A 28C P8 18W / 235W | 15MiB / 5759MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 5685 C /cm/shared/apps/python/2.7.6/bin/python 5268MiB |
+-----------------------------------------------------------------------------+
Re: Running MAGMA in a Cluster
Then I’m not sure what is going on. It should print the available devices in the header:
Code: Select all
prompt> ./testing_dpotrf -n 500
% MAGMA 2.1.0 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 7050. MAGMA not compiled with OpenMP.
% device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
% Wed Oct 19 01:19:06 2016
% Usage: ./testing_dpotrf [options] [-h|--help]
You can edit magma/interface_cuda/interface.cpp
In the magma_print_environment( ) function, you can add a print of the number of devices it sees:
Code: Select all
// print devices
int ndevices = 0;
err = cudaGetDeviceCount( &ndevices );
printf( "ndevices %d\n", ndevices );
if ( err != cudaErrorNoDevice ) {
    check_error( err );
}
Which yields the output:
Code: Select all
prompt> ./testing_dpotrf -n 500
% MAGMA 2.1.0 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 7050. MAGMA not compiled with OpenMP.
ndevices 1
% device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MiB memory, capability 3.0
% Wed Oct 19 01:27:42 2016
% Usage: ./testing_dpotrf [options] [-h|--help]
I can force the error you see by hiding all devices:
Code: Select all
prompt> setenv CUDA_VISIBLE_DEVICES ""
prompt> ./testing_dpotrf -n 500
% MAGMA 2.1.0 svn compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 0, driver 7050. MAGMA not compiled with OpenMP.
ndevices 0
% Wed Oct 19 01:29:32 2016
% Usage: ./testing_dpotrf [options] [-h|--help]
CUDA runtime error: no CUDA-capable device is detected (38) in magma_getdevices at interface_cuda/interface.cpp:519
-mark
Re: Running MAGMA in a Cluster
This is the header I am getting:
% MAGMA 2.0.2 compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 0, driver 7050. ndevices 0
OpenMP threads 1.
% Wed Oct 19 15:06:35 2016
% Usage: ./testing_dgemm [options] [-h|--help]
Re: Running MAGMA in a Cluster
To remove MAGMA from the picture, attached is a simple program that queries the CUDA devices. Compile it with nvcc. (It can also be compiled with gcc if given the right include and library paths.)
On my laptop:
Code: Select all
prompt> nvcc -o cuda-devices cuda-devices.c
prompt> ./cuda-devices
ndev 1
device 0
name = GeForce GT 750M
asyncEngineCount = 1
canMapHostMemory = 1
capability major.minor = 3.0
clockRate = 925.5 MHz
computeMode = 0
concurrentKernels = 1
deviceOverlap = 1
ECCEnabled = 0
integrated = 0
kernelExecTimeoutEnabled = 1
l2CacheSize = 256.0 KB
maxGridSize = 2147483647 x 65535 x 65535
maxTexture1D = 65536
maxTexture1DLayered = 16384 x 2048
maxTexture2D = 65536 x 65536
maxTexture2DLayered = 16384 x 16384 x 2048
maxTexture3D = 4096 x 4096 x 4096
maxThreadsDim = 1024 x 1024 x 64
maxThreadsPerBlock = 1024
maxThreadsPerMultiProcessor = 2048
memoryBusWidth = 128
memoryClockRate = 2508.0 MHz
memPitch = 2048.0 MB
multiProcessorCount = 2
pciBusID = 1
pciDeviceID = 0
pciDomainID = 0
regsPerBlock = 65536
sharedMemPerBlock = 48.0 KB
surfaceAlignment = 512
tccDriver = 0
textureAlignment = 512
totalConstMem = 64.0 KB
totalGlobalMem = 2047.6 MB
unifiedAddressing = 1
warpSize = 32
Forcing no device:
Code: Select all
prompt> setenv CUDA_VISIBLE_DEVICES ""
prompt> ./cuda-devices
ndev 0
Attachment: cuda-devices.c (4.75 KiB)
Re: Running MAGMA in a Cluster
Hi,
I tried running it on a local machine in my lab that has a Tesla C2075, and I get the following output.
##############################################################################
./cuda-devices
ndev 1
device 0
name = Tesla C2075
asyncEngineCount = 2
canMapHostMemory = 1
capability major.minor = 2.0
clockRate = 1147.0 MHz
computeMode = 0
concurrentKernels = 1
deviceOverlap = 1
ECCEnabled = 1
integrated = 0
kernelExecTimeoutEnabled = 1
l2CacheSize = 768.0 KB
maxGridSize = 65535 x 65535 x 65535
maxTexture1D = 65536
maxTexture1DLayered = 16384 x 2048
maxTexture2D = 65536 x 65535
maxTexture2DLayered = 16384 x 16384 x 2048
maxTexture3D = 2048 x 2048 x 2048
maxThreadsDim = 1024 x 1024 x 64
maxThreadsPerBlock = 1024
maxThreadsPerMultiProcessor = 1536
memoryBusWidth = 384
memoryClockRate = 1566.0 MHz
memPitch = 2048.0 MB
multiProcessorCount = 14
pciBusID = 15
pciDeviceID = 0
pciDomainID = 0
regsPerBlock = 32768
sharedMemPerBlock = 48.0 KB
surfaceAlignment = 512
tccDriver = 0
textureAlignment = 512
totalConstMem = 64.0 KB
totalGlobalMem = 5375.2 MB
unifiedAddressing = 1
warpSize = 32
####################################################################################
When I run it on the cluster, it gives the following output.
#########################################################################################
./cuda-devices
ndev 8
cuda-devices: testing/cuda-devices.c:22: main: Assertion `err == 0' failed.
Aborted
#########################################################################################
Any idea how to resolve this issue?
Re: Running MAGMA in a Cluster
Hi,
Now I am able to run it successfully.
Thank you very much.
Thanks & Regards,
Re: Running MAGMA in a Cluster
Updated code is attached; it prints an error message instead of using assert. If an old CUDA driver version is installed (one that does not match the CUDA runtime version used by nvcc), I could see that causing issues.
So you solved the issue with running MAGMA? What was the solution?
-mark
Attachment: cuda-devices.c (5.01 KiB)