Hi,
I am having trouble with the SGEMM function. When I run testing_sgemm I get the output below, where the error grows to infinity. In MAGMA 1.1 I tried to add an additional function, but I always get wrong results for the sgemm operation. When I tried simply to copy the values of A in the kernel code, I could not even get the original data back on the CPU. Is the provided source code for the sgemm operation correct?
> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s  CUBLAS GFlop/s  error
==================================================================
1024  1024  1024  513.63         442.96          1.088222e+38
1280  1280  1280  531.53         512.50          inf
1600  1600  1600  568.85         512.00          1.618371e+38
2000  2000  2000  590.19         515.28          inf
2500  2500  2500  562.16         488.20          inf
3125  3125  3125  591.78         559.36          inf
3906  3906  3906  600.21         560.27          inf
4882  4882  4882  608.12         563.61          inf
6102  6102  6102  605.01         519.26          inf
magma-1.2.0/testing> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s  CUBLAS GFlop/s  error
==================================================================
1024  1024  1024  513.88         441.23          1.626685e+38
1280  1280  1280  533.97         513.88          inf
1600  1600  1600  568.69         512.45          inf
2000  2000  2000  589.58         514.11          inf
2500  2500  2500  561.38         592.78          inf
3125  3125  3125  591.03         559.03          inf
3906  3906  3906  600.06         560.36          inf
4882  4882  4882  607.80         563.23          inf
6102  6102  6102  604.80         591.99          inf
magma-1.2.0/testing> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s  CUBLAS GFlop/s  error
==================================================================
1024  1024  1024  511.79         441.51          inf
1280  1280  1280  532.47         515.59          1.483815e+38
1600  1600  1600  568.14         511.71          inf
2000  2000  2000  589.54         514.82          1.407334e+38
2500  2500  2500  560.17         592.95          1.616552e+38
3125  3125  3125  591.05         559.18          inf
3906  3906  3906  600.19         560.44          inf
4882  4882  4882  607.78         563.55          inf
6102  6102  6102  604.76         591.47          inf
Best Regards,
Muhammad Kashif Hanif
MAGMA SGEMM Results
Re: MAGMA SGEMM Results
Yes, the SGEMM test should work. You can try a small test case and print out the results to see what is going on. E.g., add:
printf( "Magma C=" ); magma_sprint( M, N, h_C, ldc );
printf( "Cublas C=" ); magma_sprint( M, N, h_C2, ldc );
before the error computation in testing_sgemm.cpp, then run with a small size:
./testing/testing_sgemm -M 10 -N 10 -K 10
(Although small matrices will use different parts of the code than large matrices.)
What is your make.inc file?
-mark
-
- Posts: 4
- Joined: Thu Dec 15, 2011 12:06 pm
Re: MAGMA SGEMM Results
Thanks for the reply. I was working with this library to add a new function that performs matrix multiplication in tropical algebra. I always got wrong results, so I switched back to the original code to check whether SGEMM itself works. For that purpose I ran the example provided in testing and got wrong results there too.
One more question: what changes are necessary to convert matrix multiplication (SGEMM) into tropical-algebra (min-plus) matrix multiplication? I thought that replacing + with min and * with + in the source code would be enough.
Here is my make.inc:
#
# GPU_TARGET specifies for which GPU you want to compile MAGMA:
# "Tesla" (NVIDIA compute capability 1.x cards)
# "Fermi" (NVIDIA compute capability 2.x cards)
# See http://developer.nvidia.com/cuda-gpus
GPU_TARGET = Fermi
CC = gcc
NVCC = nvcc
FORT = gfortran
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O3 -DADD_
FOPTS = -O3 -DADD_ -cpp
NVOPTS = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_
LDOPTS = -fPIC -nofor_main -Xlinker -zmuldefs
CUDADIR = /usr/local/cuda
MKLDIR = /home/ti6mkh/intel/mkl/10.2.4.032
LIB = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -liomp5 -lguide -lpthread -lcublas -lcudart -lm
LIBDIR = -L$(MKLDIR)/lib/em64t \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include
LIBMAGMA = $(MAGMA_DIR)/lib/magma.a
LIBMAGMABLAS = $(MAGMA_DIR)/lib/magmablas.a
Best Regards,
Kashif
Re: MAGMA SGEMM Results
Thanks, Mark, for your suggestion. For the small case you suggested, the output is correct, but I do not know what happens with large matrices. Even when I modify the code in sgemm_fermi.cu just to return the data of the first matrix, I cannot get the original data back. Can you help with this? The test output is below.
> ./testing_sgemm -M 10 -N 10 -K 10
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s  CUBLAS GFlop/s  error
==================================================================
Magma C=[
0.5309 0.3141 0.1034 0.4282 0.2781 0.4440 0.5038 0.3560 0.2061 0.5288
0.5131 0.4080 0.5172 0.3967 0.5172 0.5500 0.6277 0.4422 0.7152 0.6393
0.1329 0.1735 0.3162 0.0290 0.3068 0.3444 0.2987 -0.1251 0.3070 0.1384
0.5554 0.4245 0.4006 0.6030 0.2836 0.8638 0.5446 0.6321 0.1984 0.8124
0.1984 0.5509 0.2912 0.3129 0.3680 0.1673 0.5928 0.6926 0.0889 0.3096
0.5013 0.7437 0.3697 0.2912 0.5411 0.5972 0.4771 0.7227 0.2096 0.4974
0.1652 0.8534 0.4741 0.4063 0.2415 0.9189 0.6808 0.5213 0.4076 0.7965
0.5360 0.4215 0.5834 0.3692 0.6807 0.5618 0.6733 0.6388 0.0914 0.5321
0.3629 0.4721 0.6128 0.6224 0.6655 0.8606 0.6841 0.4367 0.5933 0.7321
0.8179 0.5555 0.5132 0.6667 0.7431 0.7788 0.9141 0.7568 0.2550 0.8701
];
Cublas C=[
0.5309 0.3141 0.1034 0.4282 0.2781 0.4440 0.5038 0.3560 0.2062 0.5288
0.5131 0.4080 0.5172 0.3967 0.5172 0.5500 0.6277 0.4422 0.7152 0.6393
0.1329 0.1735 0.3162 0.0290 0.3068 0.3444 0.2987 -0.1251 0.3070 0.1384
0.5554 0.4245 0.4006 0.6030 0.2836 0.8638 0.5446 0.6321 0.1984 0.8124
0.1984 0.5509 0.2912 0.3129 0.3680 0.1673 0.5928 0.6926 0.0889 0.3096
0.5013 0.7437 0.3697 0.2912 0.5411 0.5972 0.4771 0.7227 0.2096 0.4974
0.1652 0.8534 0.4741 0.4063 0.2415 0.9189 0.6808 0.5213 0.4076 0.7965
0.5360 0.4215 0.5834 0.3692 0.6807 0.5618 0.6733 0.6388 0.0914 0.5321
0.3629 0.4721 0.6128 0.6224 0.6655 0.8606 0.6841 0.4367 0.5933 0.7321
0.8179 0.5555 0.5132 0.6667 0.7431 0.7788 0.9141 0.7568 0.2550 0.8701
];
10    10    10    0.02           0.04            5.960464e-08
> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s  CUBLAS GFlop/s  error
==================================================================
1024  1024  1024  509.97         441.05          9.266359e+37
1280  1280  1280  532.61         515.08          9.222629e+01
1600  1600  1600  569.40         511.58          inf
2000  2000  2000  588.65         514.57          inf
2500  2500  2500  560.65         493.84          inf
3125  3125  3125  591.25         559.39          inf
3906  3906  3906  600.55         560.53          inf
4882  4882  4882  608.60         563.71          inf
6102  6102  6102  594.46         571.40          inf