it seems that clmagma-1.3.0 tests don't run.

fossil
Posts: 14
Joined: Sat Nov 14, 2015 6:07 pm

Post by fossil » Sat Dec 19, 2015 5:28 pm

Dear friends,

I am running Ubuntu 15.10 with libclblas2, libclblas2-dev, libatlas3-base, libatlas-dev, and acml-6.3 installed.
Hardware: Phenom II X4 965 CPU, 2x Radeon HD 6990

I was able to build clmagma-1.3.0 with root privileges using the following make.inc:

Code: Select all

#//////////////////////////////////////////////////////////////////////////////
#   -- MAGMA (version 1.1.0) --
#      Univ. of Tennessee, Knoxville
#      Univ. of California, Berkeley
#      Univ. of	Colorado, Denver
#      @date January 2014
#//////////////////////////////////////////////////////////////////////////////

# setenv AMD_CLBLAS_STORAGE_PATH /home/tomov/cl_magma
#
# GPU_TARGET specifies for which GPU you want to compile MAGMA:
#     "Tesla" (NVIDIA compute capability 1.x cards)
#     "Fermi" (NVIDIA compute capability 2.x cards)
#     "AMD"   (clMAGMA with AMD cards)
# See http://developer.nvidia.com/cuda-gpus
GPU_TARGET = AMD

CC        = g++
FORT      = gfortran

ARCH      = ar
ARCHFLAGS = cr
RANLIB    = ranlib

OPTS      = -fPIC -O3 -DADD_ -Wall
FOPTS     = -fPIC -O3 -DADD_ -Wall -x f95-cpp-input
F77OPTS   = -fPIC -O3 -DADD_ -Wall
LDOPTS    = -fPIC

# define library directories preferably in your environment, or here.
#ACMLDIR ?= /opt/acml-4.4.0
#clBLAS  ?= /opt/clAmdBlas-1.11.314
#AMDAPP  ?= /opt/AMDAPP
-include make.check-acml
#-include make.check-clblas

#LIB       = -lacml -lacml_mv
#LIB        = -lacml_mp -lacml_mv -lcblas
LIB        = -lacml_mp -lcblas
LIB       += -lclBLAS -lOpenCL
#LIB       += -lclAmdBlas -lOpenCL

#LIBDIR    = -L$(ACMLDIR)/gfortran64/lib    \
#            -L$(clBLAS)/lib64 \
LIBDIR    = -L$(ACMLDIR)/gfortran64_mp/lib    \
            -L$(CBLASDIR)/lib

INC       = -I$(clBLAS)/include \
            -I$(AMDAPP)/include

I tried to start run_tests.py, but got a message that clmagma_kernels.co is not in the $LD_LIBRARY_PATH '.'.
I set $LD_LIBRARY_PATH to the lib directory containing clmagma_kernels.co, but that did not help.
Then I realized that the application somehow resets LD_LIBRARY_PATH to '.' and looks for the file in the current directory.
I copied clmagma_kernels.co to the testing directory, and the error message disappeared.

However, the tests still don't run.
run_tests.py just shows the message below and then appears to do nothing for a while.
How can I check whether it is doing anything?
radeontop shows no activity.

Code: Select all

victor@fossil-dt5:~/sandbox/clmagma-1.3.0/testing$ sudo ./run_tests.py 
opts {'med': True, 'qr': True, 'sygv': True, 'memcheck': None, 'blas': True, 'chol': True, 'geev': True, 'syev': True, 'large': True, 'start': None, 'lu': True, 'precisions': 'sdcz', 'tol': None, 'small': True, 'aux': True, 'svd': True}
args []

****************************************************************************************************
./testing_sgemm -l -NN -c --range 1:20:1 -N 30 -N 31 -N 32 -N 33 -N 34 -N 62 -N 63 -N 64 -N 65 -N 66 -N 94 -N 95 -N 96 -N 97 -N 98 -N 126 -N 127 -N 128 -N 129 -N 130 -N 254 -N 255 -N 256 -N 257 -N 258 -N 510 -N 511 -N 512 -N 513 -N 514 --range 100:900:100 --range 1000:4000:1000 -N 2,1 -N 3,1 -N 4,2 -N 20,19 -N 20,10 -N 20,2 -N 20,1 -N 200,199 -N 200,100 -N 200,20 -N 200,10 -N 200,1 -N 600,599 -N 600,300 -N 600,60 -N 600,30 -N 600,10 -N 600,1 -N 2000,1999 -N 2000,1000 -N 2000,200 -N 2000,100 -N 2000,10 -N 2000,1 -N 1,2 -N 1,3 -N 2,4 -N 19,20 -N 10,20 -N 2,20 -N 1,20 -N 199,200 -N 100,200 -N 20,200 -N 10,200 -N 1,200 -N 599,600 -N 300,600 -N 60,600 -N 30,600 -N 10,600 -N 1,600 -N 1999,2000 -N 1000,2000 -N 200,2000 -N 100,2000 -N 10,2000 -N 1,2000 -N 1,2,3 -N 2,1,3 -N 1,3,2 -N 2,3,1 -N 3,1,2 -N 3,2,1 -N 10,20,30 -N 20,10,30 -N 10,30,20 -N 20,30,10 -N 30,10,20 -N 30,20,10 -N 100,200,300 -N 200,100,300 -N 100,300,200 -N 200,300,100 -N 300,100,200 -N 300,200,100 -N 100,300,600 -N 300,100,600 -N 100,600,300 -N 300,600,100 -N 600,100,300 -N 600,300,100 -N 1000,2000,3000 -N 2000,1000,3000 -N 1000,3000,2000 -N 2000,3000,1000 -N 3000,1000,2000 -N 3000,2000,1000
****************************************************************************************************

I interrupted it and ran testing_benchmark with -T [0123], this time without root privileges.
radeontop showed some activity and testing_benchmark ran without any errors.

What can I try next to be sure that clmagma works?
What I ultimately need are the QR and LU decomposition solvers.
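
Since QR and LU are the routines of interest, one option is to run only the testers for those factorizations instead of the whole run_tests.py suite. The sketch below lists candidate testers in the testing directory. It assumes the executables follow the LAPACK routine names (getrf for LU, geqrf for QR) with a `testing_` prefix, as `testing_sgemm` does; the helper name `find_testers` is made up for illustration and is not part of clMAGMA.

```python
from pathlib import Path

# LAPACK routine names: getrf = LU factorization, geqrf = QR factorization.
ROUTINES = ("getrf", "geqrf")


def find_testers(testing_dir, routines=ROUTINES):
    """Return the sorted names of executable files in testing_dir whose
    name starts with 'testing_' and contains one of the routine names."""
    found = []
    for path in Path(testing_dir).iterdir():
        name = path.name
        if not name.startswith("testing_"):
            continue
        if not any(routine in name for routine in routines):
            continue
        # Keep only regular files that have at least one execute bit set,
        # which skips sources and object files with similar names.
        if path.is_file() and path.stat().st_mode & 0o111:
            found.append(name)
    return sorted(found)
```

Calling `find_testers(".")` from inside the testing directory prints nothing by itself; it only returns the names, which can then be run one at a time with a small size range.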

Best regards,
Victor

mgates3
Posts: 918
Joined: Fri Jan 06, 2012 2:13 pm

Re: it seems that clmagma-1.3.0 tests don't run.

Post by mgates3 » Sat Dec 19, 2015 7:06 pm

It's probably looking in MAGMA_CL_DIR, not in LD_LIBRARY_PATH. I think looking in LD_LIBRARY_PATH is a newer feature not yet in the release.

I would try running the tests directly, instead of via run_tests.py, as below. This run uses clBLAS 2.4; I have not tried other versions.

Code: Select all

clmagma-1.3.0/testing> ./testing_sgemm -l -NN -c --range 100:400:100
% clMAGMA 1.3.0 
% OpenCL platform OpenCL 1.2 (Jul 29 2014 21:24:39). MAGMA not compiled with OpenMP.
% Device: GeForce GT 750M, 2048.0 MiB memory, max allocation 512.0 MiB, driver  8.26.29 310.40.55f01
Usage: ./testing_sgemm [options] [-h|--help]

transA = No transpose, transB = No transpose
    M     N     K   clBLAS Gflop/s (ms)   CPU Gflop/s (ms)  clBLAS error
=========================================================================================================
  100   100   100      0.33 (   6.11)     17.26 (   0.12)    1.72e-07   ok
  200   200   200      5.81 (   2.76)     37.66 (   0.42)    2.24e-07   ok
  300   300   300     19.16 (   2.82)     90.78 (   0.59)    2.77e-07   ok
  400   400   400     27.77 (   4.61)     77.26 (   1.66)    3.05e-07   ok
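
To tell a hung tester apart from a slow one, the suggestion above can be wrapped in a small script that runs a single tester with a time limit and captures whatever it printed. This is only a sketch: it assumes, as guessed above, that the tester looks for clmagma_kernels.co in the directory named by MAGMA_CL_DIR, and the function name `run_tester` and the default timeout are illustrative choices.

```python
import os
import subprocess


def run_tester(cmd, kernel_dir, timeout_s=120):
    """Run one tester with MAGMA_CL_DIR set and return (status, output).

    status is the exit code, or None if the tester was killed after
    exceeding timeout_s seconds. output is the combined stdout/stderr
    text captured up to that point.
    """
    env = dict(os.environ)
    # Assumption: clmagma_kernels.co is searched for in MAGMA_CL_DIR.
    env["MAGMA_CL_DIR"] = kernel_dir
    try:
        done = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
            universal_newlines=True,
        )
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or ""
        # Depending on the Python version, partial output may be bytes.
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial
    return done.returncode, done.stdout
```

For example, from the testing directory: `run_tester(["./testing_sgemm", "-l", "-NN", "-c", "--range", "100:400:100"], "../lib")`, where `../lib` stands for wherever clmagma_kernels.co actually lives. A status of None with only the header lines in the output would suggest the tester stalls before the first result row.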

fossil
Posts: 14
Joined: Sat Nov 14, 2015 6:07 pm

Re: it seems that clmagma-1.3.0 tests don't run.

Post by fossil » Sun Dec 20, 2015 1:15 pm

Hi Mark,

I tried running a single test directly.
The result is below.

Code: Select all

victor@fossil-dt5:~/sandbox/clmagma-1.3.0/testing$ ./testing_sgemm -l -NN -c --range 100:400:100
% clMAGMA 1.3.0 
% OpenCL platform . MAGMA not compiled with OpenMP.
% Device: Cayman, 1803.0 MiB memory, max allocation 512.0 MiB, driver  1800.8 (VM)
% Device: Cayman, 1801.0 MiB memory, max allocation 512.0 MiB, driver  1800.8 (VM)
Usage: ./testing_sgemm [options] [-h|--help]

transA = No transpose, transB = No transpose
    M     N     K   clBLAS Gflop/s (ms)   CPU Gflop/s (ms)  clBLAS error
=========================================================================================================

What is strange is that I expected to see 4 devices: each of the 2 HD 6990 cards has 2 GPUs.

If I run clinfo, I get very strange results too:

Code: Select all

victor@fossil-dt5:~/sandbox/clmagma-1.3.0/testing$ clinfo |grep "Number of devices"
Number of devices:				 3
victor@fossil-dt5:~/sandbox/clmagma-1.3.0/testing$ sudo -E clinfo |grep "Number of devices"
[sudo] password for victor: 
Number of devices:				 3
victor@fossil-dt5:~/sandbox/clmagma-1.3.0/testing$ sudo clinfo |grep "Number of devices"
Number of devices:				 5

As a regular user, it found 3 devices: 1 CPU and 2 GPUs.
As root with the user's environment settings (sudo -E), it found the same 3 devices.
As root with a clean root environment, it found all 5 devices: 1 CPU and 4 GPUs.

Is there a manual describing the proper setup to make all GPUs available to a regular user?
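
Because the device count changes between `sudo -E clinfo` and `sudo clinfo`, some environment variable is a plausible suspect. One way to narrow it down is to save the output of `env` as the regular user and of `sudo env` as root, then compare the two. The sketch below does that comparison and also totals the "Number of devices" lines from clinfo; the function names are illustrative, and the parsing assumes the `Number of devices:` format shown above and single-line values in the `env` output.

```python
def parse_env(text):
    """Parse `env` output (one NAME=value per line) into a dict.
    Lines without '=' are skipped, so multi-line values are not handled."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            result[name] = value
    return result


def env_diff(user_text, root_text):
    """Return {name: (user_value, root_value)} for every variable whose
    value differs. A variable missing on one side is reported as None."""
    user = parse_env(user_text)
    root = parse_env(root_text)
    return {
        name: (user.get(name), root.get(name))
        for name in sorted(set(user) | set(root))
        if user.get(name) != root.get(name)
    }


def count_devices(clinfo_text):
    """Sum the values of all 'Number of devices:' lines in clinfo output."""
    total = 0
    for line in clinfo_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Number of devices":
            total += int(value.strip())
    return total
```

After `env > user.env` and `sudo env > root.env`, passing the contents of the two files to `env_diff` gives a short list of variables to test one at a time, re-running clinfo after each change and checking the total with `count_devices`.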

Then I did some tests with clBLAS-client, and it seems to work (gemm at least).

Thanks
