BLAS benchmarks

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
mgates3
Posts: 918
Joined: Fri Jan 06, 2012 2:13 pm

Post by mgates3 » Mon Mar 31, 2014 12:47 pm

Section 4.9 of the MAGMA Users' Guide provides benchmark results that show the MAGMA library significantly outperforming the CUBLAS library for both matrix-matrix and matrix-vector multiplication. I am looking for a fast BLAS library for GPUs, and based on these results MAGMA could be what I need. Unfortunately, these benchmarks are pretty old. Is there an updated version available somewhere?

Thanks.
Benoit
Yes, those results are quite old. CUBLAS subsequently incorporated the MAGMA gemm, and has further optimized it on the Kepler architecture. In general, we now use CUBLAS for most BLAS functions, and focus MAGMA on higher-level operations like getrf (LU factorization). One notable exception is symv / hemv, where MAGMA has been faster than CUBLAS.

The best thing to do is compile MAGMA and run the specific BLAS testers you are interested in. These are in magma/testing/. Here are current results on a Kepler K20m (705 MHz) with CUDA 5.5, showing that magma_dsymv is faster than cublasDsymv, while cublasDgemm is faster than magma_dgemm.

magma/testing> ./testing_dsymv
Usage: ./testing_dsymv [options] [-h|--help]

    N   MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  CUBLAS error
=============================================================================================
 1088     12.47 (   0.19)       8.77 (   0.27)      0.25 (   9.59)    8.88e-16     8.36e-16
 2112     26.72 (   0.33)      13.11 (   0.68)      8.90 (   1.00)    1.61e-15     1.78e-15
 3136     36.52 (   0.54)      14.64 (   1.34)      5.74 (   3.43)    1.81e-15     1.52e-15
 4160     40.45 (   0.86)      15.97 (   2.17)      5.74 (   6.03)    1.86e-15     1.97e-15
 5184     43.25 (   1.24)      16.46 (   3.27)      6.66 (   8.07)    2.50e-15     2.63e-15
 6208     46.11 (   1.67)      17.00 (   4.53)      6.71 (  11.48)    2.93e-15     2.86e-15
 7232     49.30 (   2.12)      17.32 (   6.04)      5.97 (  17.52)    3.21e-15     2.96e-15
 8256     51.02 (   2.67)      17.58 (   7.76)      6.29 (  21.67)    3.25e-15     3.42e-15
 9280     51.15 (   3.37)      17.69 (   9.74)      6.27 (  27.47)    3.48e-15     3.48e-15
10304     52.44 (   4.05)      17.96 (  11.82)      6.19 (  34.33)    4.15e-15     3.88e-15

magma/testing> ./testing_dgemm
Usage: ./testing_dgemm [options] [-h|--help]

transA = No transpose, transB = No transpose
    M     N     K   MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error
======================================================================================
 1088  1088  1088    534.29 (   4.82)     894.06 (   2.88)     ---   (  ---  )    1.61e-16
 2112  2112  2112    581.04 (  32.43)    1013.59 (  18.59)     ---   (  ---  )    1.69e-16
 3136  3136  3136    586.50 ( 105.17)    1024.74 (  60.19)     ---   (  ---  )    1.16e-16
 4160  4160  4160    588.77 ( 244.55)    1035.41 ( 139.06)     ---   (  ---  )    1.76e-16
 5184  5184  5184    589.72 ( 472.47)    1039.30 ( 268.09)     ---   (  ---  )    1.42e-16
 6208  6208  6208    590.12 ( 810.85)    1044.01 ( 458.33)     ---   (  ---  )    1.19e-16
 7232  7232  7232    591.01 (1280.00)    1044.37 ( 724.35)     ---   (  ---  )    2.05e-16
 8256  8256  8256    591.85 (1901.63)    1045.84 (1076.15)     ---   (  ---  )    1.80e-16
 9280  9280  9280    592.05 (2699.70)    1047.91 (1525.28)     ---   (  ---  )    1.61e-16
10304 10304 10304    592.48 (3692.96)    1049.07 (2085.66)     ---   (  ---  )    1.45e-16
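
To read the tables above, it can help to see how a Gflop/s figure relates to the timing next to it. The sketch below is not taken from the MAGMA testers; it assumes the conventional leading-order flop counts of about 2*N^2 for symv and 2*M*N*K for gemm, and the helper names (symv_gflops, gemm_gflops) are made up for illustration. With those assumptions, the timings in the largest rows reproduce the reported rates to within rounding, and the ratio of the two timings gives the speedup.

```python
def symv_gflops(n, ms):
    """Approximate Gflop/s for a symmetric matrix-vector product.

    Assumes roughly 2*n^2 floating-point operations (lower-order terms ignored);
    `ms` is the elapsed time in milliseconds.
    """
    return 2.0 * n * n / (ms * 1e-3) / 1e9


def gemm_gflops(m, n, k, ms):
    """Approximate Gflop/s for C = A*B with A m-by-k and B k-by-n.

    Assumes 2*m*n*k floating-point operations; `ms` is in milliseconds.
    """
    return 2.0 * m * n * k / (ms * 1e-3) / 1e9


if __name__ == "__main__":
    # Timings copied from the N = 10304 rows of the tables above.
    n = 10304
    magma_symv = symv_gflops(n, 4.05)
    cublas_symv = symv_gflops(n, 11.82)
    magma_gemm = gemm_gflops(n, n, n, 3692.96)
    cublas_gemm = gemm_gflops(n, n, n, 2085.66)

    print(f"dsymv: MAGMA {magma_symv:.2f}, CUBLAS {cublas_symv:.2f} Gflop/s, "
          f"MAGMA/CUBLAS = {magma_symv / cublas_symv:.2f}x")
    print(f"dgemm: MAGMA {magma_gemm:.2f}, CUBLAS {cublas_gemm:.2f} Gflop/s, "
          f"CUBLAS/MAGMA = {cublas_gemm / magma_gemm:.2f}x")
```

Because the printed times are rounded to two decimals, the recomputed rates for the small sizes can differ from the table in the last digit.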

benoitsteiner
Posts: 1
Joined: Tue Apr 01, 2014 7:31 pm

Re: BLAS benchmarks

Post by benoitsteiner » Tue Apr 01, 2014 7:58 pm

Thanks for the detailed answer. I am currently focusing on speeding up basic vector and matrix operations, but I'll definitely give MAGMA another look when I start working on matrix factorizations.

Benoit