single-threaded / multi-threaded mkl performance difference
I built MAGMA twice, once with single-threaded MKL and once with multi-threaded MKL. Interestingly, the test program linked against multi-threaded MKL is almost twice as fast as the single-threaded version. I only tested sgeqrf. I wonder why MAGMA's performance depends so much on MKL. Is it true that most of the calculation is done on the GPU?
-
Stan Tomov
- Posts: 283
- Joined: Fri Aug 21, 2009 10:39 pm
Re: single-threaded / multi-threaded mkl performance differe
Yes, most of the computation is done on the GPU, but the critical path of many of the algorithms runs on the CPU. Therefore, the CPU code has to be as fast as possible. Currently, the code is tuned to get the best performance when you use all the cores of a socket on your host. If you would like to use only one core, you must re-tune the algorithms to get better performance (e.g., reduce the blocking sizes in the file control/get_nb.cpp).
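To illustrate why a slower CPU can dominate even when the GPU does most of the flops, here is a toy cost model in C++. It is only a sketch under stated assumptions, not MAGMA code: the function name hybrid_qr_time, the flop-count approximations, and the idea that each step costs the slower of the CPU panel factorization and the GPU trailing update are all simplifications introduced here for illustration. The rates you pass in are hypothetical inputs, not measurements.

```cpp
#include <algorithm>

// Toy cost model (hypothetical, not taken from MAGMA).
// Estimates the time in seconds of a hybrid QR factorization of an
// n x n matrix with block size nb, assuming:
//   - the CPU factors each panel (about 2 * rows * jb^2 flops),
//   - the GPU updates the trailing matrix (about 4 * rows * cols * jb flops),
//   - the two overlap, so each step costs the slower of the two.
// cpu_gflops and gpu_gflops are assumed sustained rates in Gflop/s.
double hybrid_qr_time(int n, int nb, double cpu_gflops, double gpu_gflops) {
    double total = 0.0;
    for (int j = 0; j < n; j += nb) {
        int jb = std::min(nb, n - j);          // width of this panel
        double rows = static_cast<double>(n - j);       // rows in the panel
        double cols = static_cast<double>(n - j - jb);  // trailing columns
        double panel_flops  = 2.0 * rows * jb * jb;     // CPU work
        double update_flops = 4.0 * rows * cols * jb;   // GPU work
        double t_cpu = panel_flops  / (cpu_gflops * 1e9);
        double t_gpu = update_flops / (gpu_gflops * 1e9);
        total += std::max(t_cpu, t_gpu);       // the critical path per step
    }
    return total;
}
```

For example, comparing hybrid_qr_time(4096, 128, 10.0, 500.0) with hybrid_qr_time(4096, 128, 20.0, 500.0) shows the estimate growing when the assumed CPU rate is halved, and passing nb = 64 instead of 128 at the lower CPU rate brings it back down. The model ignores the fact that GPU kernels usually lose efficiency at small block sizes, which is why the real blocking sizes have to be tuned by measurement rather than simply minimized.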