Section 4.9 of the MAGMA Users' Guide provides benchmark results showing the MAGMA library significantly outperforming the CUBLAS library for both matrix-matrix and matrix-vector multiplication. I am looking for a fast BLAS library for GPUs, and based on these results MAGMA could be what I am looking for. Unfortunately, these benchmarks are pretty old. Is there an updated version available somewhere?
Thanks.
Benoit
Yes, those results are quite old. CUBLAS subsequently incorporated the MAGMA gemm and has further optimized it for the Kepler architecture. In general, we now use CUBLAS for most BLAS functions and focus MAGMA on higher-level operations such as getrf (LU factorization). One notable exception is symv/hemv, where MAGMA has been faster than CUBLAS.
The best thing to do is compile MAGMA and run the specific BLAS testers you are interested in; these are in magma/testing/. Here are current results on a Kepler K20m (705 MHz) with CUDA 5.5. They show that magma_dsymv is faster than cublasDsymv, while cublasDgemm is faster than magma_dgemm:
```
magma/testing> ./testing_dsymv
Usage: ./testing_dsymv [options] [-h|--help]
    N  MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)    CPU Gflop/s (ms)  MAGMA error  CUBLAS error
=============================================================================================
 1088     12.47 (   0.19)       8.77 (   0.27)      0.25 (   9.59)     8.88e-16      8.36e-16
 2112     26.72 (   0.33)      13.11 (   0.68)      8.90 (   1.00)     1.61e-15      1.78e-15
 3136     36.52 (   0.54)      14.64 (   1.34)      5.74 (   3.43)     1.81e-15      1.52e-15
 4160     40.45 (   0.86)      15.97 (   2.17)      5.74 (   6.03)     1.86e-15      1.97e-15
 5184     43.25 (   1.24)      16.46 (   3.27)      6.66 (   8.07)     2.50e-15      2.63e-15
 6208     46.11 (   1.67)      17.00 (   4.53)      6.71 (  11.48)     2.93e-15      2.86e-15
 7232     49.30 (   2.12)      17.32 (   6.04)      5.97 (  17.52)     3.21e-15      2.96e-15
 8256     51.02 (   2.67)      17.58 (   7.76)      6.29 (  21.67)     3.25e-15      3.42e-15
 9280     51.15 (   3.37)      17.69 (   9.74)      6.27 (  27.47)     3.48e-15      3.48e-15
10304     52.44 (   4.05)      17.96 (  11.82)      6.19 (  34.33)     4.15e-15      3.88e-15
```
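Each Gflop/s figure is just the operation count divided by the time in the (ms) column. A quick check, assuming the usual count of roughly 2N² flops for symv (N² multiplies plus N² adds; this is our assumption, and the tester's exact formula may include lower-order terms), reproduces the printed rates from the timings. `symv_gflops` is a hypothetical helper written for this illustration:

```python
# (N, MAGMA ms, CUBLAS ms, printed MAGMA Gflop/s, printed CUBLAS Gflop/s),
# copied from the testing_dsymv output above.
ROWS = [
    (1088, 0.19, 0.27, 12.47, 8.77),
    (5184, 1.24, 3.27, 43.25, 16.46),
    (10304, 4.05, 11.82, 52.44, 17.96),
]


def symv_gflops(n, ms):
    """Rate for an n x n symmetric matrix-vector product, assuming about
    2*n*n flops (n*n multiplies plus n*n adds) done in `ms` milliseconds."""
    return 2.0 * n * n / (ms * 1e-3) / 1e9


for n, magma_ms, cublas_ms, magma_printed, cublas_printed in ROWS:
    magma = symv_gflops(n, magma_ms)
    cublas = symv_gflops(n, cublas_ms)
    # The ms column is rounded to 0.01, so expect agreement within about 1%.
    print(f"N={n:5d}  MAGMA {magma:6.2f} (printed {magma_printed:6.2f})  "
          f"CUBLAS {cublas:6.2f} (printed {cublas_printed:6.2f})")
```

Because symv does only about 2N² flops on N² matrix entries, its rates sit far below those of gemm, which does about 2N³ flops on the same amount of data.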
```
magma/testing> ./testing_dgemm
Usage: ./testing_dgemm [options] [-h|--help]
transA = No transpose, transB = No transpose
    M     N     K  MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)    CPU Gflop/s (ms)  MAGMA error
======================================================================================
 1088  1088  1088    534.29 (   4.82)     894.06 (   2.88)       --- (    ---)     1.61e-16
 2112  2112  2112    581.04 (  32.43)    1013.59 (  18.59)       --- (    ---)     1.69e-16
 3136  3136  3136    586.50 ( 105.17)    1024.74 (  60.19)       --- (    ---)     1.16e-16
 4160  4160  4160    588.77 ( 244.55)    1035.41 ( 139.06)       --- (    ---)     1.76e-16
 5184  5184  5184    589.72 ( 472.47)    1039.30 ( 268.09)       --- (    ---)     1.42e-16
 6208  6208  6208    590.12 ( 810.85)    1044.01 ( 458.33)       --- (    ---)     1.19e-16
 7232  7232  7232    591.01 (1280.00)    1044.37 ( 724.35)       --- (    ---)     2.05e-16
 8256  8256  8256    591.85 (1901.63)    1045.84 (1076.15)       --- (    ---)     1.80e-16
 9280  9280  9280    592.05 (2699.70)    1047.91 (1525.28)       --- (    ---)     1.61e-16
10304 10304 10304    592.48 (3692.96)    1049.07 (2085.66)       --- (    ---)     1.45e-16
```
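To repeat this comparison on your own GPU, the tester output can be parsed and summarized as a MAGMA/CUBLAS ratio per size. This is a minimal sketch, assuming the column layout shown in the outputs above (sizes first, then the MAGMA pair, then the CUBLAS pair). `parse_rows`, `run_tester`, and `report` are hypothetical helpers, not part of MAGMA, and `run_tester` only works with a compiled MAGMA in magma/testing:

```python
import re
import subprocess

# One "Gflop/s (ms)" pair as printed by the testers; "---" marks a column
# that was not run (e.g. the CPU column of testing_dgemm).
PAIR = re.compile(r"(---|\d+\.\d+)\s*\(\s*(---|\d+\.\d+)\s*\)")


def parse_rows(text):
    """Return [(sizes, magma_gflops, cublas_gflops), ...] from tester output.

    Assumes sizes come first, then the MAGMA pair, then the CUBLAS pair.
    Header, separator and usage lines contain no numeric pairs and are skipped.
    """
    rows = []
    for line in text.splitlines():
        pairs = PAIR.findall(line)
        if len(pairs) < 2 or "---" in (pairs[0][0], pairs[1][0]):
            continue
        first = PAIR.search(line)
        sizes = tuple(int(tok) for tok in line[:first.start()].split())
        rows.append((sizes, float(pairs[0][0]), float(pairs[1][0])))
    return rows


def run_tester(name, testing_dir="magma/testing"):
    """Hypothetical helper: run a built tester such as "testing_dsymv" with
    its default sizes and return what it printed."""
    result = subprocess.run(["./" + name], cwd=testing_dir,
                            capture_output=True, text=True)
    return result.stdout


def report(rows):
    # A ratio above 1 means MAGMA was faster than CUBLAS at that size.
    for sizes, magma, cublas in rows:
        label = "x".join(str(s) for s in sizes)
        print(f"{label:>20}  MAGMA/CUBLAS = {magma / cublas:.2f}")


# Two rows copied from the outputs above, used instead of run_tester().
SAMPLE = """\
10304 52.44 ( 4.05) 17.96 ( 11.82) 6.19 ( 34.33) 4.15e-15 3.88e-15
10304 10304 10304 592.48 (3692.96) 1049.07 (2085.66) --- ( --- ) 1.45e-16
"""

report(parse_rows(SAMPLE))
```

For the two sample rows this gives ratios of about 2.92 for dsymv and 0.56 for dgemm, matching the conclusion that MAGMA wins on symv while CUBLAS wins on gemm. With a built MAGMA, `report(parse_rows(run_tester("testing_dgemm")))` would do the same for a live run.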