Max batch size for dgesv ?

Hi,
I started using MAGMA to solve batched 3x3 problems with dgetrf and dgetrs. It works fine for small problems but fails for batch sizes around 80K.
The return value from the MAGMA call is MAGMA_SUCCESS, but cudaPeekAtLastError reports:
CUDA Error Code[9]: invalid configuration argument
I then tried one of the testers, and it fails as well:
./testing_dgesv_batched -n 3:3:3 --batch 80000
% MAGMA 2.5.3 compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10010, driver 10010. OpenMP threads 176.
% device 0: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 1: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 2: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 3: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% Wed Jun 10 18:13:36 2020
% Usage: ./testing_dgesv_batched [options] [-h|--help]
% BatchCount N NRHS CPU Gflop/s (sec) GPU Gflop/s (sec) ||B - AX|| / N*||A||*||X||
%============================================================================================
80000 3 1 --- ( --- ) 1.08 ( 0.00) 5.70e-01 failed
It runs fine for batches up to 65000.
Thanks
Ramesh
Re: Max batch size for dgesv ?
Most of the batched kernels in MAGMA use the z-dimension of the kernel grid to index the different problems in the batch. The z-dimension has a hardware limit of 65535, which is why the routine fails. In your particular case, I think the error comes from dgetrs; the batched dgetrf routine does not have this problem for matrix sizes less than 32.
This issue is fixed for some routines, like batched GEMM, but not all the batched routines have the fix yet. You can avoid the failure by dividing the batch into sub-batches of fewer than 65K problems each; this is exactly what we do internally for routines like batched GEMM. Propagating the fix to all the kernels is on our to-do list.
BTW, for such a small problem size, a fused dgetrf+dgetrs kernel would definitely give a significant performance boost over two separate kernels. This is something we don't yet support for dgesv.
--Ahmad
Re: Max batch size for dgesv ?
Ahmad;
Thank you for the fast response. Yes, the problem is in dgetrf. In my application, the LHS is fixed for all time steps, so the fused version will not help.
BTW, wouldn't it be better/faster to use the x- or y-dimension for batching?
Thanks
Ramesh
Re: Max batch size for dgesv ?
The y-dimension has the same limit as the z-dimension. Some kernels do use the x-dimension for batching, but most of our code uses the x and y dimensions for the thread configuration. Changing that would be cumbersome because of the many kernels MAGMA has.
BTW, if your problem size is fixed at 3x3, you can just use direct formulas to solve the linear systems.
Re: Max batch size for dgesv ?
Understood.
Like Cramer's rule? Hadn't thought of that.
Thanks
Ramesh