Max batch size for dgesv ?

Hi,
I started using MAGMA to solve batched 3x3 problems with dgetrf and dgetrs. It works fine for small problems but fails for batch sizes around 80K.
The return value from the MAGMA call is MAGMA_SUCCESS, but cudaPeekAtLastError reports:
CUDA Error Code[9]: invalid configuration argument
I then tried one of the testers, and it fails as well:
./testing_dgesv_batched -n 3:3:3 --batch 80000
% MAGMA 2.5.3 compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10010, driver 10010. OpenMP threads 176.
% device 0: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 1: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 2: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% device 3: Tesla V100-SXM2-16GB, 1530.0 MHz clock, 16128.0 MiB memory, capability 7.0
% Wed Jun 10 18:13:36 2020
% Usage: ./testing_dgesv_batched [options] [-h|--help]
% BatchCount N NRHS CPU Gflop/s (sec) GPU Gflop/s (sec) ||B - AX|| / N*||A||*||X||
%============================================================================================
80000 3 1 --- ( --- ) 1.08 ( 0.00) 5.70e-01 failed
It runs fine for batches up to 65000.
Thanks
Ramesh
Re: Max batch size for dgesv ?
Most of the batched kernels in MAGMA use the z-dimension of the kernel grid to index the different problems in the batch. The z-dimension has a hardware limit of 65535, which is why the routine fails. In your particular case, I think the error comes from dgetrs; the batched dgetrf routine does not have this problem for matrix sizes less than 32.
This issue is fixed for some routines, like batched GEMM, but not all the batched routines have the fix yet. You can avoid the failure by dividing the batch into sub-batches of fewer than 65K problems each; this is exactly what we do internally for routines like batched GEMM. Propagating the fix to all the kernels is on our to-do list.
BTW, for such a small problem size, a fused dgetrf+dgetrs kernel would definitely give a significant performance boost over two separate kernels. This is something we don't yet support for dgesv.
--Ahmad
Re: Max batch size for dgesv ?
Ahmad;
Thank you for the fast response. Yes, the problem is in dgetrf. In my application, the LHS is fixed for all time steps, so the fused version will not help.
BTW, wouldn't it be better/faster to use the x- or y-dimension for batching?
Thanks
Ramesh
Re: Max batch size for dgesv ?
The y-dimension has the same limit as the z-dimension. Some kernels do use the x-dimension for batching, but most of our code uses the x and y dimensions for the thread configuration. Changing that would be cumbersome because of the many kernels MAGMA has.
BTW, if your problem size is fixed at 3x3, you can just use direct formulas to solve the linear systems.
Re: Max batch size for dgesv ?
Understood.
Like Cramer's rule? Hadn't thought of that.
Thanks
Ramesh