Hello,
I have a piece of code that originally runs with MKL. I replaced the MKL calls with MAGMA calls, and the timing got worse. The function I am timing has no data transfers. I really don't know what might be wrong. Also, I am new to this :D
Gpu worse than cpu
Re: Gpu worse than cpu
Please provide more details. What function? What size matrix? What was MKL's performance and MAGMA's performance? What is your computer hardware?
The input & output of one of MAGMA's testers provides much of this, e.g., on a machine with two 8-core Intel E5-2670 (Sandy Bridge):
Code: Select all
bunsen magma/testing> ./testing_dgetrf -n 4000 --lapack
% MAGMA 2.2.0 svn compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 9000. OpenMP threads 16. MKL 11.3.0, MKL threads 16.
% device 0: Tesla K40c, 745.0 MHz clock, 11439.9 MiB memory, capability 3.5
% device 1: Tesla K40c, 745.0 MHz clock, 11439.9 MiB memory, capability 3.5
% Wed Oct 18 17:42:42 2017
% Usage: ./testing_dgetrf [options] [-h|--help]
% ngpu 1, version 1
% M N CPU Gflop/s (sec) GPU Gflop/s (sec) |PA-LU|/(N*|A|)
%========================================================================
4000 4000 145.91 ( 0.29) 351.77 ( 0.12) ---
thanasis_giannis
Re: Gpu worse than cpu
Well, maybe I have found something. According to Google, functions like axpy are likely to be slower on the GPU; it has something to do with memory. My code is full of functions like axpy, so I guess that's it.
Re: Gpu worse than cpu
It all depends on the problem size. GPUs have faster memory, so even an axpy can be faster on the GPU, but the vectors would have to be rather large. For example:
Code: Select all
bunsen magma/testing> ./testing_daxpy -n 123 -n 1234 -n 1000:20000:1000
% MAGMA 2.2.0 svn compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 9000. OpenMP threads 16. MKL 11.3.0, MKL threads 16.
% device 0: Tesla K40c, 745.0 MHz clock, 11439.9 MiB memory, capability 3.5
% device 1: Tesla K40c, 745.0 MHz clock, 11439.9 MiB memory, capability 3.5
% Fri Oct 20 20:27:42 2017
% Usage: ./testing_daxpy [options] [-h|--help]
% M cnt cuBLAS Gflop/s (ms) CPU Gflop/s (ms) cuBLAS error
%===========================================================================
123 100 0.0401 ( 0.6130) 0.3155 ( 0.0780) 0.00e+00 ok
1234 100 0.4149 ( 0.5949) 1.3479 ( 0.1831) 0.00e+00 ok
1000 100 0.3343 ( 0.5982) 1.3422 ( 0.1490) 0.00e+00 ok
2000 100 0.6792 ( 0.5889) 1.3707 ( 0.2918) 0.00e+00 ok
3000 100 1.0030 ( 0.5982) 1.1382 ( 0.5271) 0.00e+00 ok
4000 100 1.2965 ( 0.6170) 1.3843 ( 0.5779) 0.00e+00 ok
5000 100 1.7239 ( 0.5801) 1.5336 ( 0.6521) 0.00e+00 ok
6000 100 1.8491 ( 0.6490) 1.6783 ( 0.7150) 0.00e+00 ok
7000 100 2.2507 ( 0.6220) 1.7327 ( 0.8080) 0.00e+00 ok
8000 100 2.4236 ( 0.6602) 1.9185 ( 0.8340) 0.00e+00 ok
9000 100 2.7109 ( 0.6640) 1.9524 ( 0.9220) 0.00e+00 ok
10000 100 3.1150 ( 0.6421) 2.0515 ( 0.9749) 0.00e+00 ok
11000 100 3.3180 ( 0.6630) 2.1091 ( 1.0431) 0.00e+00 ok
12000 100 3.4676 ( 0.6921) 2.1960 ( 1.0929) 0.00e+00 ok
13000 100 3.5861 ( 0.7250) 2.2071 ( 1.1780) 0.00e+00 ok
14000 100 3.9056 ( 0.7169) 2.3488 ( 1.1921) 0.00e+00 ok
15000 100 4.1094 ( 0.7300) 2.2558 ( 1.3299) 0.00e+00 ok
16000 100 4.1930 ( 0.7632) 2.3054 ( 1.3881) 0.00e+00 ok
17000 100 4.4859 ( 0.7579) 2.3256 ( 1.4620) 0.00e+00 ok
18000 100 4.7245 ( 0.7620) 2.3530 ( 1.5299) 0.00e+00 ok
19000 100 4.7506 ( 0.7999) 2.4517 ( 1.5500) 0.00e+00 ok
20000 100 5.1213 ( 0.7811) 2.4347 ( 1.6429) 0.00e+00 ok