Batched LU performance

Post by **maddyscientist** » Tue Dec 15, 2015 9:03 pm

I'm using the batched LU routine in MAGMA, but performance is perhaps lower than I would have expected. I am using 32x32 matrices with a batch size of 25,000, and on an M6000 I am getting <20 GFLOPS. By comparison, matrix inversion (magma_cgetri_outofplace_batched) reaches 120 GFLOPS. Is this performance expected? For such a large batch size I would have expected better performance.
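
To put the <20 GFLOPS figure in context, it helps to see how much work the batch actually represents. The sketch below is a rough estimate, assuming only the leading-order flop count: a real n x n LU costs about (2/3) n^3 flops, and in complex arithmetic each multiply-add costs 8 real flops instead of 2, giving about (8/3) n^3 per matrix. The helper names `lu_flops` and `batched_lu_gflops` are hypothetical, not part of MAGMA, and the exact counts used by MAGMA's own testers may include lower-order terms.

```python
# Rough GFLOPS estimate for a batched complex LU (cgetrf-style) factorization.
# Assumption: leading-order flop count only (lower-order terms ignored).
# Real LU ~ (2/3) n^3 flops; a complex multiply-add is 8 real flops vs 2,
# so the complex count is scaled by 4, i.e. ~ (8/3) n^3.
# lu_flops and batched_lu_gflops are hypothetical helpers, not MAGMA APIs.


def lu_flops(n: int, is_complex: bool = True) -> float:
    """Approximate flop count for one n x n LU factorization."""
    real_flops = (2.0 / 3.0) * n ** 3
    return 4.0 * real_flops if is_complex else real_flops


def batched_lu_gflops(n: int, batch: int, seconds: float,
                      is_complex: bool = True) -> float:
    """Achieved GFLOPS for `batch` independent n x n LUs that took `seconds`."""
    return lu_flops(n, is_complex) * batch / seconds / 1e9


if __name__ == "__main__":
    n, batch = 32, 25_000
    total = lu_flops(n) * batch
    print(f"total work: {total / 1e9:.3f} GFLOP")
    # Wall time the whole batch would take at a sustained 20 GFLOPS
    t = total / 20e9
    print(f"time at 20 GFLOPS: {t * 1e3:.1f} ms")
    print(f"check: {batched_lu_gflops(n, batch, t):.1f} GFLOPS")
```

Under these assumptions the whole 25,000-matrix batch is only about 2.2 GFLOP of work, so 20 GFLOPS corresponds to roughly 109 ms per batch.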

I also tried the no-pivot variant, which seems marginally faster (10%), but since there is no no-pivot variant of batched cgetri, I can't use it anyway.

Thanks

Post by **haidar** » Fri Feb 26, 2016 11:56 am

Sorry for the late answer.
What precision did you use for LU (cgetrf?)?
What is the peak performance of your machine in that precision?
If you are really interested in this size, I will check whether we can provide you a version specially designed for 32x32.
Thanks
Azzam

MAGMA Forum