Help! MAGMA Tests Failed on Multi-GPU Setup
Posted: Tue Jul 04, 2017 11:32 am
This is my configuration:
% MAGMA 2.2.0 compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 8000, driver 8000. OpenMP threads 32. MKL 2017.0.0, MKL threads 16.
% device 0: GeForce GTX 1060 6GB, 1784.5 MHz clock, 6070.8 MiB memory, capability 6.1
% device 1: Tesla M2050, 1147.0 MHz clock, 2622.3 MiB memory, capability 2.0
Here is a sample of the failed tests:
% side = Left, uplo = Lower, transA = No transpose, diag = Non-unit, ngpu = 2
% M N MAGMA Gflop/s (ms) CUBLAS Gflop/s (ms) CPU Gflop/s (ms) MAGMA CUBLAS LAPACK error
%============================================================================================================
1 1 0.00 ( 335.65) 0.00 ( 0.10) --- ( --- ) 4.84e+00 3.19e-07 --- failed
2 2 0.00 ( 214.03) 0.00 ( 0.20) --- ( --- ) 1.41e+00 1.31e-07 --- failed
3 3 0.00 ( 210.34) 0.00 ( 0.09) --- ( --- ) 1.47e+00 2.55e-07 --- failed
Every test fails.
I know this is a weird setup, mixing CUDA capabilities 6.1 and 2.0. Nevertheless, they should work together fine, since the data transfers go from GPU to CPU and back, not GPU to GPU, so each GPU runs only its own CUDA code.
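One thing worth trying to confirm this: run the tester with each GPU isolated via CUDA_VISIBLE_DEVICES, to see whether the failure follows one card. The tester name and flags below are my guesses based on the output above, not verified against the MAGMA testing directory:

```shell
# Isolate each device so a failure can be pinned to one card.
# Only device 0 (GTX 1060, capability 6.1) visible to CUDA:
CUDA_VISIBLE_DEVICES=0 ./testing_strsm --ngpu 1

# Only device 1 (Tesla M2050, capability 2.0) visible:
CUDA_VISIBLE_DEVICES=1 ./testing_strsm --ngpu 1
```

Note that with CUDA_VISIBLE_DEVICES set, the visible card is always renumbered as device 0 inside the process.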
Somebody might argue that capability 2.0 is too old, but plenty of computing hardware still uses it, because it is simply too expensive to throw away.
Please help me fix these test failures.