Max batch size for dgesv?
Posted: Wed Jun 10, 2020 9:19 pm
Hi,
I started using MAGMA to solve batched 3x3 problems using dgetrf and dgetrs. It works fine for small batches but fails for batch sizes of ~80K.
The return value from the MAGMA call is MAGMA_SUCCESS, but cudaPeekAtLastError reports:
CUDA Error Code[9]: invalid configuration argument
I then tried one of the MAGMA testers with the same matrix size and batch count, and it fails as well:
./testing_dgesv_batched -n 3:3:3 --batch 80000
% MAGMA 2.5.3 compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10010, driver 10010. OpenMP threads 176.
% device 0: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 1: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 2: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 3: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% Wed Jun 10 18:13:36 2020
% Usage: ./testing_dgesv_batched [options] [-h|--help]
% BatchCount   N  NRHS   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||B - AX|| / N*||A||*||X||
%============================================================================================
     80000     3     1     ---   (  ---  )      1.08 (   0.00)   5.70e-01   failed
It runs fine for batches up to 65000.
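The 65000 threshold sits just under 65535, which is the CUDA limit on the y and z grid dimensions. If the batched kernels map the batch index onto one of those dimensions (an assumption on my part, not something confirmed here), that would explain the "invalid configuration argument" error. Under that assumption, a possible workaround is to split the batch into chunks of at most 65535 problems and issue one batched call per chunk. The sketch below only computes the chunk boundaries. `for_each_chunk` and `kMaxChunk` are hypothetical names, and the actual MAGMA call is left to the callback:

```cpp
#include <algorithm>
#include <functional>

// Assumed chunk limit: CUDA caps gridDim.y and gridDim.z at 65535 blocks.
// Whether MAGMA's batched kernels hit this limit is an assumption.
constexpr int kMaxChunk = 65535;

// Hypothetical helper: calls solve(offset, count) once per chunk of at most
// max_chunk problems, covering [0, batch_count). Returns the number of chunks.
int for_each_chunk(int batch_count, int max_chunk,
                   const std::function<void(int, int)>& solve) {
    int chunks = 0;
    for (int offset = 0; offset < batch_count; offset += max_chunk) {
        // The last chunk may be smaller than max_chunk.
        int count = std::min(max_chunk, batch_count - offset);
        // Inside solve, the batched MAGMA routine would be called on the
        // pointer arrays advanced by `offset`, with batch count `count`.
        solve(offset, count);
        ++chunks;
    }
    return chunks;
}
```

For a batch of 80000 this issues two calls: one at offset 0 with 65535 problems, and one at offset 65535 with the remaining 14465.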
Thanks
Ramesh