I tried MAGMA 0.2, which I recompiled for Fedora Core 12 x86_64 with an NVIDIA Fermi GTX 470. Here are the make.inc.goto and the results from the testing directory.
Note that in the following make.inc.goto the location of libgoto2.a is hard-coded (the LIBDIR line). You will have to modify that line for your own installation. I used GotoBLAS2 version 1.08
and not the fatal prime revision level 1.13!
Code:
#//////////////////////////////////////////////////////////////////////////////
# -- MAGMA (version 0.2) --
# Univ. of Tennessee, Knoxville
# Univ. of California, Berkeley
# Univ. of Colorado, Denver
# November 2009
#
# Contributed by: Allan Menezes (Ontario, Canada)
#//////////////////////////////////////////////////////////////////////////////
CC = gcc
NVCC = nvcc
FORT = gfortran
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O3 -DADD_
NVOPTS = --compiler-options -fno-inline \
--compiler-options -fno-strict-aliasing \
-arch sm_20 -DUNIX -O3
LDOPTS = -fPIC
LIB = -lgoto2 -lpthread -lcublas -lcudart -llapack -lm
CUDADIR = /usr/local/cuda
LIBDIR = -L/bummer/GotoBLAS2 -L/usr/local/cuda/lib64 -L/usr/lib64
INC = -I../include -I$(CUDADIR)/include
LIBMAGMA = ../lib/libmagma.a
LIBMAGMABLAS = ../lib/libmagmablas.a
Code:
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_cgeqrf -N 1024
N CPU GFlop/s GPU GFlop/s ||R - Q'A|| / ||A||
========================================================
1024 36.50 111.93 2.181499e-06
2048 46.43 178.75 2.763979e-06
3072 51.56 218.36 3.224755e-06
4032 55.16 234.06 4.556965e-06
5184 55.70 245.28 5.060306e-06
6016 55.33 251.24 4.582116e-06
7040 55.39 254.75 4.619145e-06
8064 55.40 256.17 5.499504e-06
9088 55.64 261.69 5.515256e-06
10112 56.14 266.43 5.179223e-06
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_cgeqrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 37.80 128.12 1.023109e-06
2048 46.48 190.43 2.401207e-06
3072 51.28 228.74 2.559615e-06
4032 55.21 243.32 1.957078e-06
5184 55.77 250.11 2.122840e-06
6016 55.77 255.67 2.449219e-06
7040 55.66 258.14 2.591782e-06
8064 55.77 259.01 2.737253e-06
9088 56.02 266.85 2.923932e-06
10112 56.54 270.95 3.040652e-06
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_cgetrf -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 31.42 83.22 6.999727e-09
2048 54.95 140.59 7.347308e-09
3072 63.76 170.82 7.411954e-09
4032 66.36 192.51 7.398736e-09
5184 68.36 209.48 7.359929e-09
6016 69.48 217.40 8.188616e-09
7040 70.65 227.44 9.392204e-09
8064 71.62 234.23 1.037497e-08
9088 72.45 239.14 1.169262e-08
10112 73.21 244.07 1.234344e-08
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_cgetrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 31.59 99.54 6.999727e-09
2048 54.92 162.54 7.347308e-09
3072 63.80 192.46 7.411954e-09
4032 66.34 211.74 7.398736e-09
5184 68.20 226.98 7.359929e-09
6016 58.58 195.69 8.188616e-09
7040 70.66 243.19 9.392204e-09
8064 71.65 248.47 1.037497e-08
9088 72.47 252.31 1.169262e-08
10112 73.19 256.31 1.234344e-08
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_cpotrf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 42.01 55.58 3.142203e-08
2048 54.50 100.90 3.059146e-08
3072 62.13 130.76 2.495757e-08
4032 66.81 152.90 2.337698e-08
5184 69.53 172.24 3.669203e-08
6048 71.10 183.21 3.505888e-08
7200 71.81 194.28 3.032790e-08
8064 73.13 202.82 2.819979e-08
8928 74.01 209.44 3.803236e-08
10080 74.68 217.62 3.449032e-08
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_cpotrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 42.14 63.60 2.622518e-08
2048 54.32 112.59 2.492308e-08
3072 62.17 145.93 2.481157e-08
4032 66.76 169.86 2.620765e-08
5184 69.55 190.25 2.439569e-08
6048 71.08 202.95 2.558894e-08
7200 29.30 200.59 2.507422e-08
8064 22.38 198.57 2.604528e-08
8928 12.26 181.75 2.590413e-08
10080 7.67 171.67 2.688114e-08
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgehrd -N 1024
N CPU GFlop/s GPU GFlop/s ||A-QHQ'|| / ||A||
========================================================
1024 4.44 13.01 1.033019e-14
2048 5.24 28.14 2.041184e-14
3072 5.47 37.48 3.014136e-14
4032 5.76 43.04 3.655350e-14
5184 5.87 47.42 4.374540e-14
6016 5.97 49.41 5.527769e-14
7040 6.06 52.31 6.843701e-14
8064 6.09 50.92 8.011122e-14
9088 6.13 52.28 9.053372e-14
10112 6.15 52.86 1.020150e-13
This is an Experimental Release of GEMM Routine without Padding
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
./testing_dgemm N
N magmablas0.2 GFLops/s cudablas-2.3 GFlops/s error
=============================================================================
512 87.495259452412 88.621807857379 0.000000e+00
513 98.905272527473 75.845897191011 0.000000e+00
1024 129.506914003136 130.095332162113 0.000000e+00
1025 115.372897471609 78.648210699288 0.000000e+00
1536 131.796576083794 124.845097874393 0.000000e+00
1537 118.730642806926 80.076772922249 0.000000e+00
2048 132.322823812128 132.793312236711 0.000000e+00
2049 120.588236970479 81.342356997646 0.000000e+00
2560 132.439855381361 132.915685940527 0.000000e+00
2561 121.516228543524 80.982599883807 0.000000e+00
3072 132.623483212220 126.676101485846 0.000000e+00
3073 122.216840851325 81.282141104140 0.000000e+00
3584 132.827109486398 133.288593808177 0.000000e+00
3585 122.623658664786 81.031689917914 0.000000e+00
4096 132.759448433612 133.246615217415 0.000000e+00
4097 122.888285506489 81.512556336204 0.000000e+00
4608 132.815062477348 126.900248836469 0.000000e+00
4609 123.219745840591 81.119433330585 0.000000e+00
5120 132.878051484929 113.454911855825 0.000000e+00
5121 123.434462451430 81.831506989988 0.000000e+00
SYMV Double Precision
Usage
testing_dgemv N
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
n CUBLAS,Gflop/s MAGMABLAS0.2,Gflop/s "error"
==============================================================
64 0.34 0.34 0
128 1.02 0.96 0
192 1.76 1.68 0
256 2.43 2.38 0
320 2.28 2.18 0
384 2.76 2.68 0
448 3.26 3.19 0
512 3.74 3.67 0
576 4.23 4.12 0
704 5.27 5.14 0
832 6.29 6.13 0
960 7.29 7.09 0
1088 8.19 8.05 0
1216 9.18 9.04 0
1408 10.57 10.60 0
1600 11.77 11.91 0
1792 13.24 13.24 0
1984 14.58 14.69 0
2240 16.29 16.34 0
2496 18.03 17.90 0
2816 19.82 19.82 0
3136 20.86 21.38 0
3520 22.80 23.27 0
3904 24.52 24.88 0
4352 25.96 26.42 0
4800 27.12 27.44 0
5312 28.12 28.09 0
5888 28.23 28.97 0
6528 29.31 29.52 0
7232 28.89 19.77 0
8000 29.77 21.48 0
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgeqlf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 11.89 44.90 1.692476e-15
2048 14.29 67.30 2.412781e-15
3072 16.39 74.41 2.873908e-15
4032 18.16 76.12 2.933770e-15
5184 18.64 79.86 3.183846e-15
6016 18.76 81.25 3.638615e-15
7040 19.38 81.90 4.039368e-15
8064 19.43 82.65 4.212756e-15
9088 19.65 84.56 4.495418e-15
10112 19.92 84.70 4.804705e-15
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgeqrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 9.37 50.11 1.959699e-15
2048 14.76 72.04 2.642956e-15
3072 17.10 78.26 3.271786e-15
4032 18.78 79.97 3.356442e-15
5184 19.13 83.80 3.752684e-15
6016 19.11 84.36 4.070131e-15
7040 18.91 85.17 4.403128e-15
8064 19.48 85.00 8.071775e-14
9088 19.91 86.54 5.335508e-15
10112 20.19 86.81 5.304265e-15
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgeqrs_gpu -N 1024
N CPU GFlop/s GPU GFlop/s || b-Ax || / ||A||
========================================================
1024 11.46 18.10 8.123717e-16
2048 14.57 68.70 9.930360e-15
3072 16.75 76.42 1.639680e-14
4032 18.91 77.09 3.220842e-15
5184 19.19 82.33 2.035707e-15
6016 19.19 82.20 5.951416e-15
7040 19.58 83.66 4.714261e-15
8064 19.72 83.30 1.597581e-14
9088 20.03 85.85 3.800731e-15
10112 20.27 85.73 2.706913e-15
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgesv -N 1024
N GPU GFlop/s || b-Ax || / ||A||
========================================================
1024 17.89 4.674082e-16
2048 73.42 4.412873e-15
3072 91.04 9.000011e-15
4032 100.14 1.354061e-15
5184 107.72 1.173067e-15
6016 111.54 3.439470e-15
7040 115.01 2.687958e-15
8064 117.33 5.400273e-15
9088 119.30 1.662879e-15
10112 120.82 1.550303e-15
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgetrf -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 21.52 30.99 3.514640e-18
2048 28.01 58.60 3.258964e-18
3072 30.92 75.15 2.966111e-18
4032 32.42 85.09 3.348630e-18
5184 33.55 93.63 3.333262e-18
6016 34.10 98.19 2.826022e-18
7040 34.67 102.60 2.802706e-18
8064 35.03 105.88 2.761636e-18
9088 35.41 108.70 2.752465e-18
10112 35.76 110.93 2.726653e-18
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dgetrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 21.62 40.91 3.514640e-18
2048 28.07 75.82 3.258964e-18
3072 30.92 93.30 2.966111e-18
4032 32.42 101.83 3.348630e-18
5184 33.58 108.97 3.333262e-18
6016 34.15 112.71 2.826022e-18
7040 34.73 115.95 2.802706e-18
8064 35.10 118.14 2.761636e-18
9088 35.44 120.06 2.752465e-18
10112 35.81 121.47 2.726653e-18
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dpotrf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 22.24 32.39 4.368765e-17
2048 29.41 54.35 5.255033e-17
3072 32.94 68.23 6.129227e-17
4032 34.63 76.24 6.249455e-17
5184 35.85 86.17 6.400078e-17
6144 36.54 91.53 6.514027e-17
6912 36.86 95.21 6.548325e-17
8192 37.46 98.36 6.854160e-17
8960 37.52 99.77 6.936968e-17
9984 37.78 102.23 7.147590e-17
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dpotrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 22.07 44.01 4.368765e-17
2048 29.02 73.15 5.255033e-17
3072 30.23 83.53 6.129227e-17
4032 34.49 90.55 6.249455e-17
5184 35.61 100.20 6.400078e-17
6144 36.41 104.49 6.514027e-17
6912 36.70 106.63 6.548325e-17
8192 37.32 108.58 6.854160e-17
8960 37.34 110.63 6.936968e-17
9984 37.55 111.84 7.147590e-17
Iterative Refinement- QR
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_dsgeqrsv_gpu -N 1024
CPU GFlop/s GPU GFlop/s
N Doule Double Single Mixed || b-Ax || / ||A||
=========================================================================================
1024 11.45 44.34 60.64 12.02 5.555543e-16 2
2048 14.62 68.22 118.94 95.54 6.616966e-15 3
3072 16.86 73.67 176.15 154.06 7.447985e-14 3
4032 18.96 77.00 183.96 158.79 1.186839e-15 5
5184 19.35 79.62 207.41 199.02 8.647975e-15 2
6016 19.20 80.35 215.42 207.75 9.182751e-14 2
7040 19.78 83.38 220.00 212.85 1.021954e-13 2
8000 19.87 84.63 225.21 220.63 nan 1
Iterative Refinement- LU
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
./testing_dsgesv N
Epsilon(Double): 0.00000000000000011102
Epsilon(Single): 0.00000005960464477539
N Double-Factor Double-Solve Single-Factor Sigle-Solve Mixed Precision Solver || b-Ax || / ||A|| NumIter
===========================================================================================================================================================
1024 40.63 38.34 60.97 57.57 43.68 4.854485e-16 2
2048 75.88 73.28 126.34 122.18 99.83 3.058204e-15 3
3072 93.23 91.24 169.73 165.92 143.95 9.219986e-15 3
4032 101.77 100.07 196.95 193.43 174.12 1.260598e-14 3
5184 108.92 107.64 221.36 218.80 201.54 2.280617e-16 3
6016 112.75 111.61 235.93 233.47 217.56 4.468209e-15 3
7040 115.93 114.99 248.62 246.45 229.51 1.037881e-15 4
8064 118.15 117.31 257.57 255.65 237.37 4.916595e-16 4
Iterative Refinement- Cholesky
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
./testing_dsposv -N 1024
Epsilon(Double): 0.00000000000000011102
Epsilon(Single): 0.00000005960464477539
N Double-Factor Double-Solve Single-Factor Sigle-Solve Mixed Precision Solver || b-Ax || / ||A|| NumIter
===============================================================================================================================================================================
1024 43.55 39.24 86.06 72.60 46.66 6.319456e-19 2
2048 69.10 65.04 160.43 146.90 114.21 6.472202e-19 2
3072 84.07 80.60 203.81 192.97 159.96 7.352729e-19 2
4032 90.49 87.98 236.35 223.93 193.59 7.035593e-19 2
5184 99.87 98.28 251.13 243.51 216.65 7.499680e-19 2
6016 102.90 101.58 263.54 256.40 230.72 8.151097e-19 2
7040 105.75 104.06 271.00 265.25 243.64 6.619553e-19 2
8064 109.31 107.68 278.78 272.46 252.88 8.569304e-19 2
SYMV Double Precision
Usage
testing_dsymv N
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
n CUBLAS,Gflop/s MAGMABLAS0.2,Gflop/s "error"
==============================================================
64 0.30 0.41 0
128 0.76 1.42 0
192 1.13 2.84 0
256 1.56 4.23 0
320 1.97 5.69 0
384 2.22 7.19 0
448 2.48 8.54 0
512 2.76 9.36 0
576 2.99 11.44 0
704 3.26 14.58 0
832 3.75 15.73 0
960 3.86 18.43 0
1088 4.27 20.06 0
1216 4.39 22.07 0
1408 3.65 21.43 0
1600 3.55 24.15 0
1792 3.78 26.00 0
1984 3.97 25.15 0
2240 4.26 27.72 0
2496 4.54 22.74 0
2816 4.42 25.17 0
3136 4.40 25.54 0
3520 4.59 26.22 0
3904 4.87 26.19 0
4352 4.66 28.52 0
4800 4.78 26.20 0
5312 5.00 27.38 0
5888 4.84 28.32 0
6528 4.96 28.97 0
7232 4.89 26.88 0
8000 4.98 28.22 0
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgehrd -N 1024
N CPU GFlop/s GPU GFlop/s ||A-QHQ'|| / ||A||
========================================================
1024 9.26 23.06 5.627424e-06
2048 10.38 51.78 1.071559e-05
3072 11.27 74.07 1.585271e-05
4032 12.32 87.28 1.977225e-05
5184 12.56 103.49 2.388178e-05
6016 12.69 110.56 3.007145e-05
7040 12.87 117.60 3.626539e-05
8064 13.05 113.27 4.308553e-05
9088 13.27 115.97 4.882232e-05
10112 13.35 116.96 5.359977e-05
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgelqf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 9.67 59.95 1.187897e-06
2048 13.02 113.57 2.131274e-06
3072 16.50 164.73 1.629596e-06
4032 25.37 173.60 1.885469e-06
5184 27.76 196.56 2.058025e-06
6016 25.68 204.90 2.222242e-06
7040 26.05 210.13 2.459177e-06
8064 26.02 209.25 2.677387e-06
9088 27.00 219.53 2.847114e-06
10112 27.36 223.11 3.564045e-06
This is an Experimental Release of GEMM Routine without Padding
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
./testing_sgemm N
N magmablas0.2 GFLops/s cudablas-2.3 GFlops/s error
=============================================================================
512 282.8614 291.4609 0.000000e+00
513 231.7694 201.5010 0.000000e+00
1024 324.9332 338.6664 0.000000e+00
1025 278.3742 233.0932 0.000000e+00
1536 335.5443 346.4345 0.000000e+00
1537 289.9318 239.0678 0.000000e+00
2048 337.7808 348.3135 0.000000e+00
2049 294.3298 239.4412 0.000000e+00
2560 338.1004 481.6126 0.000000e+00
2561 297.3452 243.6591 0.000000e+00
3072 338.0602 349.8797 0.000000e+00
3073 297.8834 244.5856 0.000000e+00
3584 337.9794 349.5612 0.000000e+00
3585 299.4270 241.4427 0.000000e+00
4096 337.8256 350.2049 0.000000e+00
4097 299.8963 245.0743 0.000000e+00
4608 337.7648 349.3987 0.000000e+00
4609 300.5507 241.5184 0.000000e+00
5120 337.5137 475.7731 0.000000e+00
5121 300.4340 241.6148 0.000000e+00
SYMV Sinlge Precision
Usage
testing_sgemv N
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
n CUBLAS,Gflop/s MAGMABLAS0.2,Gflop/s "error"
==============================================================
64 0.39 0.43 0
128 1.26 1.37 0
192 2.23 2.46 0
256 3.20 3.64 0
320 4.27 5.00 0
384 5.36 6.41 0
448 4.27 5.65 0
512 4.95 6.55 0
576 5.58 7.46 0
704 6.93 9.26 0
832 8.34 11.16 0
960 9.70 12.98 0
1088 10.91 14.80 0
1216 12.43 16.71 0
1408 14.47 19.44 0
1600 16.36 22.07 0
1792 18.46 24.42 0
1984 20.40 27.05 0
2240 22.81 30.14 0
2496 25.53 33.32 0
2816 28.22 36.88 0
3136 30.97 39.82 0
3520 34.42 43.63 0
3904 37.82 45.56 0
4352 41.13 50.04 0
4800 43.93 51.37 0
5312 47.03 53.39 0
5888 49.78 56.01 0
6528 53.00 57.05 0
7232 55.00 38.61 0
8000 56.44 41.60 0
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgeqlf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 19.44 64.44 1.125170e-06
2048 24.77 144.90 1.338669e-06
3072 28.32 175.35 1.474377e-06
4032 34.71 180.29 1.622605e-06
5184 35.88 206.65 1.740285e-06
6016 34.23 213.08 1.913730e-06
7040 34.64 217.16 2.637886e-06
8064 34.41 215.06 2.256273e-06
9088 35.28 223.63 2.377706e-06
10112 35.38 228.77 2.507514e-06
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgeqrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 12.71 68.69 1.017220e-06
2048 24.77 106.50 1.456536e-06
3072 29.14 187.60 1.708441e-06
4032 35.66 188.93 1.863337e-06
5184 36.54 214.57 2.029180e-06
6016 35.08 220.63 1.398288e-05
7040 35.55 224.67 2.542552e-06
8064 35.81 221.56 4.327869e-05
9088 36.47 231.36 2.765864e-06
10112 36.66 234.06 2.806814e-06
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgeqrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 20.31 67.15 3.451768e-01
2048 25.58 129.45 3.397885e-01
3072 29.29 183.56 3.547670e-01
4032 36.16 190.09 3.540365e-01
5184 37.12 211.29 3.732369e-01
6016 35.47 218.60 3.738632e-01
7040 35.51 766.10 1.049284e+00
8064 35.76 911.69 1.359362e+00
9088 36.31 931.08 1.576229e+00
10112 36.62 784.19 1.747851e+00
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgeqrs_gpu -N 1024
N CPU GFlop/s GPU GFlop/s || b-Ax || / ||A||
========================================================
1024 18.67 60.44 9.912942e-07
2048 24.25 119.77 7.115094e-06
3072 28.42 177.13 1.680525e-05
4032 34.89 184.41 4.287619e-05
5184 36.44 208.36 1.129348e-06
6016 34.89 215.91 3.104939e-06
7040 35.32 220.43 2.191862e-06
8064 35.46 218.31 2.264476e-05
9088 36.16 228.50 2.229614e-06
10112 36.46 231.59 1.558051e-06
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgesv -N 1024
N GPU GFlop/s || b-Ax || / ||A||
========================================================
1024 21.97 2.606234e-07
2048 132.57 2.100937e-06
3072 190.06 4.839033e-06
4032 220.26 8.308236e-07
5184 246.60 6.308150e-07
6016 261.09 1.857649e-06
7040 272.15 1.403984e-06
8064 277.46 2.878860e-06
9088 285.19 1.073657e-06
10112 289.90 7.553145e-07
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgetrf -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 26.48 47.34 1.895988e-09
2048 47.20 110.54 1.774122e-09
3072 55.86 157.02 1.715500e-09
4032 61.06 185.45 1.804801e-09
5184 63.95 211.50 1.798921e-09
6016 65.40 225.40 1.675016e-09
7040 66.91 238.37 1.659101e-09
8064 68.19 246.43 1.770623e-09
9088 68.90 255.79 1.981117e-09
10112 69.75 262.25 2.168543e-09
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgetrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 27.34 58.41 1.895988e-09
2048 47.18 139.03 1.774122e-09
3072 55.94 196.13 1.715500e-09
4032 61.11 225.97 1.804801e-09
5184 64.11 250.61 1.798921e-09
6016 65.45 264.62 1.675016e-09
7040 67.00 275.35 1.659101e-09
8064 68.20 280.27 1.770623e-09
9088 68.95 287.74 1.981117e-09
10112 69.84 292.34 2.168543e-09
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_sgetrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 27.28 61.22 1.838853e-09
2048 47.26 126.08 1.749864e-09
3072 55.99 169.39 1.699750e-09
4032 61.11 196.44 1.793962e-09
5184 63.61 221.62 1.803271e-09
6016 65.45 235.98 1.655473e-09
7040 66.87 248.56 1.652063e-09
8064 68.20 257.26 1.761585e-09
9088 68.94 265.47 1.984430e-09
10112 69.79 272.07 2.142862e-09
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_spotrf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 34.39 62.78 1.773349e-08
2048 45.09 121.45 2.295336e-08
3072 56.14 159.63 2.736194e-08
4032 61.52 190.96 3.472694e-08
5184 65.33 211.82 3.655633e-08
6048 67.36 223.59 3.691891e-08
7200 69.34 237.23 3.886026e-08
8064 70.61 244.79 3.935199e-08
8928 71.50 251.59 4.046260e-08
10080 72.24 258.25 4.227979e-08
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_spotrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 34.52 83.72 1.773349e-08
2048 45.08 159.27 2.295336e-08
3072 56.24 206.71 2.736194e-08
4032 61.54 237.60 3.472694e-08
5184 65.44 251.24 3.655633e-08
6048 67.42 262.63 3.691891e-08
7200 69.35 274.40 3.886026e-08
8064 70.67 278.48 3.935199e-08
8928 71.56 285.03 4.046260e-08
10080 72.27 289.72 4.227979e-08
SYMV Sinlge Precision
Usage
testing_ssymv N
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
n CUBLAS,Gflop/s MAGMABLAS0.2,Gflop/s "error"
==============================================================
64 0.11 0.43 0
128 0.23 1.56 0
192 0.34 3.07 0
256 0.47 5.04 0
320 0.55 7.06 0
384 0.67 9.51 0
448 0.77 11.81 0
512 0.95 14.17 0
576 0.96 15.80 0
704 1.19 19.82 0
832 1.39 22.70 0
960 1.64 27.11 0
1088 1.83 29.97 0
1216 1.97 32.86 0
1408 2.17 35.40 0
1600 2.27 40.00 0
1792 2.46 43.99 0
1984 2.60 47.42 0
2240 2.82 51.20 0
2496 3.04 38.58 0
2816 3.25 42.86 0
3136 3.35 45.11 0
3520 3.50 49.66 0
3904 3.65 51.15 0
4352 3.88 54.74 0
4800 3.92 48.00 0
5312 3.93 49.81 0
5888 4.04 54.51 0
6528 4.14 57.05 0
7232 4.15 51.48 0
8000 4.16 53.02 0
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_zgeqrf -N 1024
N CPU GFlop/s GPU GFlop/s ||R - Q'A|| / ||A||
========================================================
1024 22.86 56.20 3.392327e-15
2048 27.36 72.03 4.522000e-15
3072 29.05 75.70 5.990881e-15
4032 29.67 75.66 8.269346e-15
5184 30.27 80.70 9.103060e-15
6016 30.66 83.31 8.365426e-15
7040 30.89 86.44 9.857029e-15
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_zgeqrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 22.44 59.00 2.610599e-15
2048 27.06 73.10 3.576645e-15
3072 29.12 77.27 4.165347e-15
4032 29.68 76.74 4.900768e-15
5184 30.29 81.51 5.629466e-15
6016 30.61 84.73 8.115494e-15
7040 30.83 87.30 6.216535e-15
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_zgetrf -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 26.54 39.44 1.074309e-17
2048 33.16 68.37 1.088811e-17
3072 35.20 84.17 1.080715e-17
4032 36.20 93.07 1.072353e-17
5184 36.93 100.58 1.056042e-17
6016 37.41 104.18 1.019869e-17
7040 37.73 107.92 1.018500e-17
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_zgetrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 26.73 46.89 1.927441e-17
2048 33.10 78.82 1.740228e-17
3072 35.31 94.59 1.553913e-17
4032 36.30 102.69 1.492815e-17
5184 36.91 108.89 1.416735e-17
6016 37.43 112.13 1.359233e-17
7040 37.77 114.76 1.304601e-17
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_zpotrf -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 26.57 25.74 5.541044e-17
2048 34.20 41.33 5.350120e-17
3072 36.19 50.52 4.189603e-17
4032 37.04 56.24 3.490686e-17
5184 37.80 61.01 5.823441e-17
6048 38.13 63.37 5.342887e-17
7200 38.43 66.12 4.425605e-17
device 0: GeForce GTX 470, 1215.0 MHz clock, 1279.2 MB memory
Usage:
testing_zpotrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
1024 28.13 27.90 4.521316e-17
2048 34.14 45.24 4.509872e-17
3072 36.29 54.87 4.351074e-17
4032 37.33 60.65 4.324125e-17
5184 37.87 64.95 4.082688e-17
6048 38.30 67.04 3.924860e-17
7200 38.48 69.45 3.899940e-17
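As a rough reading aid for the tables above, here is a minimal Python sketch that computes the GPU/CPU ratio for the largest-N row of each testing_?getrf table. The numbers are copied from those tables; the helper names `speedup` and `summarize` are hypothetical and are not part of MAGMA.

```python
# Rows copied from the last line of each testing_?getrf table above:
# (test name, N, CPU GFlop/s, GPU GFlop/s).
ROWS = [
    ("testing_sgetrf", 10112, 69.75, 262.25),
    ("testing_dgetrf", 10112, 35.76, 110.93),
    ("testing_cgetrf", 10112, 73.21, 244.07),
    ("testing_zgetrf", 7040, 37.73, 107.92),
]


def speedup(cpu_gflops, gpu_gflops):
    """Return the GPU/CPU throughput ratio for one table row."""
    return gpu_gflops / cpu_gflops


def summarize(rows):
    """Map each test name to its GPU/CPU ratio, rounded to two decimals."""
    return {name: round(speedup(cpu, gpu), 2) for name, _n, cpu, gpu in rows}


if __name__ == "__main__":
    for name, ratio in summarize(ROWS).items():
        print(f"{name}: {ratio:.2f}x")
```

The same two helpers can be pointed at any other table with CPU and GPU columns by swapping the rows in `ROWS`. Treat the ratios as indicative only, since a few CPU columns above (for example testing_cpotrf_gpu at large N) show irregular values.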
The Makefile below targets compute 2.0 CUDA devices and below, using the same make.inc.goto as above. Furthermore, make.inc.goto has been modified so that MAGMA can be
compiled with gcc versions >= 4.3, including 4.4.3, by adding --compiler-options -fno-inline to NVOPTS, as suggested on an NVIDIA forum, with just make!
Here it is below:
Code:
#//////////////////////////////////////////////////////////////////////////////
# -- MAGMA (version 0.2) --
# Univ. of Tennessee
# Univ. of California Berkeley
# November 2009
#//////////////////////////////////////////////////////////////////////////////
include ../make.inc
ALLSRC = sinplace_transpose.cu \
stranspose.cu \
spermute.cu \
sdlaswp.cu \
\
sauxiliary.cu \
dauxiliary.cu \
\
dinplace_transpose.cu \
dtranspose.cu \
dpermute.cu \
\
cinplace_transpose.cu \
ctranspose.cu \
cpermute.cu \
\
zinplace_transpose.cu \
ztranspose.cu \
zpermute.cu \
ztrsm.cu \
ztrmm.cu \
zherk.cu \
\
ctrsm.cu \
ctrmm.cu \
csyrk.cu \
cherk.cu \
\
sgemv.cu \
dgemv.cu \
gemv32.cu \
\
magma_dlacpy.cu \
magma_dgemv_MLU.cu \
magma_dlag2s.cu \
magma_dlange.cu \
magma_dlansy.cu \
magma_dlat2s.cu \
magma_dsymv.cu \
magma_ssymv.cu \
magma_sdaxpycp.cu \
magma_slag2d.cu \
magma_strsm.cu \
magma_dtrsm.cu \
\
dgemm_kernel_a_0.cu \
dgemm_kernel_N_N_64_16_16_16_4_special.cu \
dgemm_kernel_T_N_32_32_8_8_8.cu \
dgemm_kernel_T_T_64_16_16_16_4_v2.cu \
dgemm_kernel_ab_0.cu \
dgemm_kernel_N_N_64_16_16_16_4.cu \
dgemm_kernel_N_T_64_16_4_16_4.cu \
dgemm_kernel_T_T_64_16_16_16_4.cu \
\
sgemm_kernel_a_0.cu \
sgemm_kernel_N_N_64_16_16_16_4_special.cu \
sgemm_kernel_T_N_32_32_8_8_8.cu \
sgemm_kernel_T_T_64_16_16_16_4_v2.cu \
sgemm_kernel_ab_0.cu \
sgemm_kernel_N_N_64_16_16_16_4.cu \
sgemm_kernel_N_T_64_16_4_16_4.cu \
sgemm_kernel_T_T_64_16_16_16_4.cu

ALLOBJ = $(ALLSRC:.cu=.cu_o)

all: $(LIBMAGMABLAS)

$(LIBMAGMABLAS): $(ALLOBJ)
	$(ARCH) $(ARCHFLAGS) $@ $(ALLOBJ)
	$(RANLIB) $@

clean:
	rm -f *.cu_o *~ *.a *.linkinfo ../lib/libmagmablas.a

# Each target architecture needs its own -gencode flag.
%.cu_o: %.cu
	$(NVCC) $(NVOPTS) -gencode arch=compute_20,code=compute_20 -gencode arch=compute_13,code=compute_13 -gencode arch=compute_10,code=compute_10 $(INC) -c $< -o $@
Thanks to Dr. Tomov and Mr. Goto!
Cheers,
Allan MeneZes!!!