Best option to solve many small linear systems (batched)
Posted: Tue May 26, 2015 4:00 pm
What is the best option to solve many (1000-5000) small linear systems of size approximately 200?
I have tried the magma_sgesv_batched routine and achieve a performance of approximately 40 GFlop/s (on a K80). I use MAGMA 1.6.1, built with the Intel compiler 15.0.0, with Intel MKL as the CPU LAPACK interface. Performance is excellent for large matrices.
On the other hand, using Intel MKL on a dual-socket E5-2630, I get approximately 500 GFlop/s for the same problem. Incidentally, the performance of Intel MKL on the Xeon Phi is similar to the performance on the GPU (approximately 40 GFlop/s), which is also somewhat disappointing.
Are there any other options? If not, what is actually the limiting factor in this problem? I realize that the flop/byte ratio is not as favorable as for large matrices. Nevertheless, the difference between the CPU and GPU seems too large, especially since the batch size is quite large.
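To make the flop/byte argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the standard leading-order operation count for an LU-based solve (2/3 n^3 for the factorization plus 2 n^2 per right-hand side) and an idealized memory model in which each matrix and right-hand side is read once and written once in single precision. The function names are only for illustration, and a real batched kernel will usually move more data than this, so the intensity below is an optimistic upper bound.

```python
def lu_solve_flops(n, nrhs=1):
    """Leading-order flop count for one LU factorization plus triangular solves."""
    factor = 2.0 * n**3 / 3.0      # LU factorization with partial pivoting
    solve = 2.0 * n**2 * nrhs      # forward and backward substitution
    return factor + solve


def min_bytes(n, nrhs=1, word=4):
    """Idealized traffic: matrix and RHS each read once and written once.

    This is an assumption, not a measurement; word=4 means single precision.
    """
    return 2 * word * (n * n + n * nrhs)


def intensity(n, nrhs=1, word=4):
    """Flops per byte under the idealized traffic model above."""
    return lu_solve_flops(n, nrhs) / min_bytes(n, nrhs, word)


if __name__ == "__main__":
    n, batch = 200, 5000           # sizes taken from the question
    total_flops = batch * lu_solve_flops(n)
    print(f"flops per system : {lu_solve_flops(n):.3e}")
    print(f"bytes per system : {min_bytes(n)}")
    print(f"flop/byte (ideal): {intensity(n):.1f}")
    # Time implied by the two rates quoted in the post
    for label, gflops in (("GPU / Xeon Phi", 40.0), ("dual-socket CPU", 500.0)):
        print(f"{label:16s}: {total_flops / (gflops * 1e9):.3f} s for the batch")
```

Under these assumptions a single 200 x 200 single-precision matrix occupies 160 kB, which is larger than the registers and shared memory available to one GPU thread block, so I suspect the factorization cannot stay entirely on chip.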
Thank you,
Lukas