MAGMA SGEMM Results

Post by **mkashifhanif** » Mon May 21, 2012 8:44 am

Hi,

I am having trouble with the SGEMM function. Running testing_sgemm, I get the output below, where the error approaches infinity. In MAGMA 1.1, I tried to add an additional function, but I always get wrong results for the SGEMM operation. When I tried simply to copy the values of A in the kernel code, I could not even get the original data back on the CPU. Is the provided source code for the SGEMM operation correct?

> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1

Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]

Testing transA = N transB = N
M N K MAGMA GFLop/s CUBLAS GFlop/s error
==================================================================
1024 1024 1024 513.63 442.96 1.088222e+38
1280 1280 1280 531.53 512.50 inf
1600 1600 1600 568.85 512.00 1.618371e+38
2000 2000 2000 590.19 515.28 inf
2500 2500 2500 562.16 488.20 inf
3125 3125 3125 591.78 559.36 inf
3906 3906 3906 600.21 560.27 inf
4882 4882 4882 608.12 563.61 inf
6102 6102 6102 605.01 519.26 inf
magma-1.2.0/testing> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1

Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]

Testing transA = N transB = N
M N K MAGMA GFLop/s CUBLAS GFlop/s error
==================================================================
1024 1024 1024 513.88 441.23 1.626685e+38
1280 1280 1280 533.97 513.88 inf
1600 1600 1600 568.69 512.45 inf
2000 2000 2000 589.58 514.11 inf
2500 2500 2500 561.38 592.78 inf
3125 3125 3125 591.03 559.03 inf
3906 3906 3906 600.06 560.36 inf
4882 4882 4882 607.80 563.23 inf
6102 6102 6102 604.80 591.99 inf
magma-1.2.0/testing> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1

Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]

Testing transA = N transB = N
M N K MAGMA GFLop/s CUBLAS GFlop/s error
==================================================================
1024 1024 1024 511.79 441.51 inf
1280 1280 1280 532.47 515.59 1.483815e+38
1600 1600 1600 568.14 511.71 inf
2000 2000 2000 589.54 514.82 1.407334e+38
2500 2500 2500 560.17 592.95 1.616552e+38
3125 3125 3125 591.05 559.18 inf
3906 3906 3906 600.19 560.44 inf
4882 4882 4882 607.78 563.55 inf
6102 6102 6102 604.76 591.47 inf

Best Regards,
Muhammad Kashif Hanif

Post by **mgates3** » Mon May 21, 2012 12:24 pm

Yes, the SGEMM test should work. You can try a small test case and print out the results to see what is going on. E.g., add:
printf( "Magma C=" ); magma_sprint( M, N, h_C, ldc );
printf( "Cublas C=" ); magma_sprint( M, N, h_C2, ldc );
before the Error computation in testing_sgemm.cpp, then run with a small size:
./testing/testing_sgemm -M 10 -N 10 -K 10
(Note, however, that small matrices use different parts of the code than large matrices.)

What is your make.inc file?

-mark
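For sizes too large to print, a small helper can locate the worst mismatch instead of reading the matrices by eye. The sketch below is hypothetical (compare_matrices is not part of MAGMA). It assumes column-major storage with a shared leading dimension, as in the magma_sprint( M, N, h_C, ldc ) calls above. It also counts inf/NaN entries separately, because errors near 1e38 or inf may point to garbage values in the result rather than ordinary rounding differences.

```c
#include <math.h>

/* Hypothetical helper (not a MAGMA routine): compare two column-major
 * M x N matrices sharing leading dimension ld. Returns the largest
 * absolute difference among finite entries of C, stores its row/column
 * in *imax/*jmax (-1 if none), and counts inf/NaN entries of C. */
float compare_matrices(int M, int N, const float *C, const float *Cref, int ld,
                       int *imax, int *jmax, int *nonfinite)
{
    float maxdiff = 0.0f;
    *imax = -1;
    *jmax = -1;
    *nonfinite = 0;
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < M; ++i) {
            float c = C[i + j * ld];
            float r = Cref[i + j * ld];
            if (!isfinite(c)) {
                ++*nonfinite;   /* counted separately so inf/NaN do not hide the pattern */
                continue;
            }
            float d = fabsf(c - r);
            if (d > maxdiff) {
                maxdiff = d;
                *imax = i;
                *jmax = j;
            }
        }
    }
    return maxdiff;
}
```

Called as, e.g., compare_matrices( M, N, h_C, h_C2, ldc, &i, &j, &nbad ) in place of the printing lines, it shows where the largest discrepancy sits. The location can hint at whether an interior tile or a matrix edge is computed wrongly.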

Post by **mkashifhanif** » Wed May 23, 2012 11:49 am

Thanks for the reply. I was working with this library to add a new function that performs matrix multiplication in tropical algebra. I kept getting wrong results, so I switched back to the original code to check whether SGEMM really works. For that purpose, I ran the example provided in the testing directory and got wrong results too.
One more question: what changes are necessary to convert matrix multiplication (SGEMM) into tropical-algebra matrix multiplication? I thought that replacing + with min and * with + in the source code would be enough.
Here is my make.inc:

#
# GPU_TARGET specifies for which GPU you want to compile MAGMA:
# "Tesla" (NVIDIA compute capability 1.x cards)
# "Fermi" (NVIDIA compute capability 2.x cards)
# See http://developer.nvidia.com/cuda-gpus

GPU_TARGET = Fermi

CC = gcc
NVCC = nvcc
FORT = gfortran

ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib

OPTS = -O3 -DADD_
FOPTS = -O3 -DADD_ -cpp
NVOPTS = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_
LDOPTS = -fPIC -nofor_main -Xlinker -zmuldefs

CUDADIR = /usr/local/cuda

MKLDIR = /home/ti6mkh/intel/mkl/10.2.4.032

LIB = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -liomp5 -lguide -lpthread -lcublas -lcudart -lm

LIBDIR = -L$(MKLDIR)/lib/em64t \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include

LIBMAGMA = $(MAGMA_DIR)/lib/magma.a
LIBMAGMABLAS = $(MAGMA_DIR)/lib/magmablas.a

Best Regards,
Kashif

Post by **mkashifhanif** » Fri May 25, 2012 5:18 am

Thanks, Mark, for your suggestion. For the small data set you suggested, the output is correct, but I do not know about large data sets. Even when I modify the code in sgemm_fermi.cu so that it only returns the data in the first matrix, I cannot get the original data back. Can you help with this? Please find the output of the test case below.

> ./testing_sgemm -M 10 -N 10 -K 10
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1

Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]

Testing transA = N transB = N
M N K MAGMA GFLop/s CUBLAS GFlop/s error
==================================================================
Magma C=[
0.5309 0.3141 0.1034 0.4282 0.2781 0.4440 0.5038 0.3560 0.2061 0.5288
0.5131 0.4080 0.5172 0.3967 0.5172 0.5500 0.6277 0.4422 0.7152 0.6393
0.1329 0.1735 0.3162 0.0290 0.3068 0.3444 0.2987 -0.1251 0.3070 0.1384
0.5554 0.4245 0.4006 0.6030 0.2836 0.8638 0.5446 0.6321 0.1984 0.8124
0.1984 0.5509 0.2912 0.3129 0.3680 0.1673 0.5928 0.6926 0.0889 0.3096
0.5013 0.7437 0.3697 0.2912 0.5411 0.5972 0.4771 0.7227 0.2096 0.4974
0.1652 0.8534 0.4741 0.4063 0.2415 0.9189 0.6808 0.5213 0.4076 0.7965
0.5360 0.4215 0.5834 0.3692 0.6807 0.5618 0.6733 0.6388 0.0914 0.5321
0.3629 0.4721 0.6128 0.6224 0.6655 0.8606 0.6841 0.4367 0.5933 0.7321
0.8179 0.5555 0.5132 0.6667 0.7431 0.7788 0.9141 0.7568 0.2550 0.8701
];
Cublas C=[
0.5309 0.3141 0.1034 0.4282 0.2781 0.4440 0.5038 0.3560 0.2062 0.5288
0.5131 0.4080 0.5172 0.3967 0.5172 0.5500 0.6277 0.4422 0.7152 0.6393
0.1329 0.1735 0.3162 0.0290 0.3068 0.3444 0.2987 -0.1251 0.3070 0.1384
0.5554 0.4245 0.4006 0.6030 0.2836 0.8638 0.5446 0.6321 0.1984 0.8124
0.1984 0.5509 0.2912 0.3129 0.3680 0.1673 0.5928 0.6926 0.0889 0.3096
0.5013 0.7437 0.3697 0.2912 0.5411 0.5972 0.4771 0.7227 0.2096 0.4974
0.1652 0.8534 0.4741 0.4063 0.2415 0.9189 0.6808 0.5213 0.4076 0.7965
0.5360 0.4215 0.5834 0.3692 0.6807 0.5618 0.6733 0.6388 0.0914 0.5321
0.3629 0.4721 0.6128 0.6224 0.6655 0.8606 0.6841 0.4367 0.5933 0.7321
0.8179 0.5555 0.5132 0.6667 0.7431 0.7788 0.9141 0.7568 0.2550 0.8701
];
10 10 10 0.02 0.04 5.960464e-08

> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1

Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]

Testing transA = N transB = N
M N K MAGMA GFLop/s CUBLAS GFlop/s error
==================================================================
1024 1024 1024 509.97 441.05 9.266359e+37
1280 1280 1280 532.61 515.08 9.222629e+01
1600 1600 1600 569.40 511.58 inf
2000 2000 2000 588.65 514.57 inf
2500 2500 2500 560.65 493.84 inf
3125 3125 3125 591.25 559.39 inf
3906 3906 3906 600.55 560.53 inf
4882 4882 4882 608.60 563.71 inf
6102 6102 6102 594.46 571.40 inf
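For the large sizes, the error column only compares MAGMA against CUBLAS on the same GPU. An independent CPU reference for the transA = transB = N case could help separate a kernel bug from a data or setup problem. The sketch below is an assumption-laden illustration: sgemm_nn_ref is a hypothetical name, not a MAGMA routine, and column-major storage is assumed. It accumulates in double, so a correct float GPU result should differ from it only by rounding. That rounding difference grows roughly with K, so a relative tolerance is more appropriate than exact equality. The loop is O(M*N*K), so it is slow for the largest sizes in the table.

```c
/* Hypothetical CPU reference (not a MAGMA routine):
 * C = alpha * A * B + beta * C for column-major A (M x K), B (K x N),
 * C (M x N); no transposes. Accumulates in double for accuracy. */
void sgemm_nn_ref(int M, int N, int K, float alpha,
                  const float *A, int lda,
                  const float *B, int ldb,
                  float beta, float *C, int ldc)
{
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < M; ++i) {
            double sum = 0.0;
            for (int k = 0; k < K; ++k)
                sum += (double)A[i + k * lda] * (double)B[k + j * ldb];
            /* As in BLAS, C is not read when beta == 0, so uninitialised
             * (possibly inf/NaN) contents of C cannot leak into the result. */
            if (beta == 0.0f)
                C[i + j * ldc] = (float)(alpha * sum);
            else
                C[i + j * ldc] = (float)(alpha * sum + beta * (double)C[i + j * ldc]);
        }
    }
}
```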
