1) What is the timer resolution of the test programs in the MAGMA testing directory
on Linux/x86-64?
I obtained some (perhaps not meaningless) GFLOPS values for the C2050 from a testing_dgemm run
that used small 32 x 32 matrices. But a dgemm call of that size takes only about 1E-5 s on
a single 2.7 GHz Nehalem core.
2) Do I understand correctly that the start/stop timing is done on the x86 host side, i.e. that
(for example) the runtimes printed by testing_dgemm (and hence the reported performance) include PCIe transfer delays?
3) Do I need to perform MAGMA performance tuning (based on src/get_nb.cpp)
after installing 1.0.0-rc4 for the NVIDIA C2050?