I was not able to create a bug report in the MAGMA Mercurial issue tracker, so I am reporting it here.
The subject of this message summarizes the issue; here is a reproducer based on PyTorch:
>>> import torch
>>> m, n = 3, 3
>>> torch.ones(1, m, n, device='cuda').lu()
(tensor([[[1., 1., 1.],
          [1., 0., 0.],
          [1., nan, nan]]], device='cuda:0'), tensor([[1, 2, 3]], device='cuda:0', dtype=torch.int32))
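The nan entries show up only in the last row, after the second pivot has become exactly zero, which would be consistent with an unguarded 0/0 division. The source of this issue is likely in the kernel functions implemented in magmablas/zgetrf_batched_smallsq_shfl.cu and magmablas/zgetrf_batched_smallsq_noshfl.cu.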
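For comparison, here is a minimal pure-Python sketch of unblocked LU with partial pivoting for a square matrix. It is not the MAGMA kernel; the function name lu_packed is hypothetical, and it assumes the convention (as in reference LAPACK's unblocked getf2 routine) that the scaling step is skipped when the pivot is exactly zero. Under that assumption the all-ones 3x3 input gives the same pivots as above, but zeros where the GPU result has nan.

```python
def lu_packed(a):
    """Unblocked LU with partial pivoting for a square matrix.

    Returns (packed LU factors, 1-based pivot indices).
    Hypothetical reference helper, not MAGMA's implementation.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]
    pivots = []
    for j in range(n):
        # Pick the first row at or below the diagonal with the largest |entry|.
        p = max(range(j, n), key=lambda i: abs(lu[i][j]))
        pivots.append(p + 1)
        if lu[p][j] != 0.0:
            lu[j], lu[p] = lu[p], lu[j]
            for i in range(j + 1, n):
                lu[i][j] /= lu[j][j]
        # Assumption: on an exactly zero pivot, skip the scaling (no 0/0).
        for i in range(j + 1, n):
            for k in range(j + 1, n):
                lu[i][k] -= lu[i][j] * lu[j][k]
    return lu, pivots


lu, pivots = lu_packed([[1.0] * 3 for _ in range(3)])
print(lu)      # [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(pivots)  # [1, 2, 3]
```

The first two rows and the pivots agree with the GPU output in the reproducer; only the last row differs.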
Best regards,
Pearu