I succeeded in compiling MAGMA. In testing, however, the "testing_dsyevd" binary linked against the parallel (threaded) MKL is faster than the one linked against the sequential MKL, even in the GPU timing. Why is that? Does "magma_dsyevd" run part of its work on the CPU?
#
# this is a sequential linked binary
#
./testing_dsyevd -N 4000
device 0: Tesla C2070, 1147.0 MHz clock, 5375.2 MB memory
testing_dsyevd -N 4000
N CPU Time(s) GPU Time(s) ||R||_F / ||A||_F
==========================================================
4000 29.51 11.62 4.113991e-16 2.838989e-13
#
# this is a parallel linked binary
#
./testing_dsyevd -N 4000
device 0: Tesla C2070, 1147.0 MHz clock, 5375.2 MB memory
testing_dsyevd -N 4000
N CPU Time(s) GPU Time(s) ||R||_F / ||A||_F
==========================================================
4000 9.60 7.45 2.607371e-16 4.292615e-13
Link flags for the parallel binary:
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
Link flags for the sequential binary:
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
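To put numbers on the gap, here is a rough sketch that computes the sequential/parallel time ratios from the two runs above. The timings are copied from the posted output; the dictionary layout and the helper name `ratio` are only for illustration and are not part of MAGMA.

```python
# Timings in seconds, copied from the two testing_dsyevd runs above (N = 4000).
# The structure and names here are illustrative, not part of MAGMA.
runs = {
    "sequential": {"cpu": 29.51, "gpu": 11.62},
    "parallel": {"cpu": 9.60, "gpu": 7.45},
}


def ratio(a, b):
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)


# Ratio for the "GPU Time(s)" column: sequential-linked over parallel-linked.
gpu_gain = ratio(runs["sequential"]["gpu"], runs["parallel"]["gpu"])
# Ratio for the "CPU Time(s)" column: sequential-linked over parallel-linked.
cpu_gain = ratio(runs["sequential"]["cpu"], runs["parallel"]["cpu"])

print(f"GPU Time ratio (sequential / parallel): {gpu_gain}")
print(f"CPU Time ratio (sequential / parallel): {cpu_gain}")
```

If the GPU timing did not depend on the CPU BLAS at all, I would expect the first ratio to be close to 1, which is not what the runs show.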