MAGMA 2.10.0
Matrix Algebra for GPU and Multicore Architectures
Installing MAGMA

First, create a make.inc file, using one of the examples as a template. Set environment variables for where external packages are installed, either in your .cshrc/.bashrc file, or in the make.inc file itself.

CUDA

All the make.inc files assume $CUDADIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):

export CUDADIR=/usr/local/cuda

For csh/tcsh, put in ~/.cshrc:

setenv CUDADIR /usr/local/cuda

AMD Optimizing CPU Libraries (AOCL) / BLIS & FLAME

AOCL has adopted the BLIS and libFLAME libraries. These may be installed in separate directories. Set $BLIS_DIR and $FLAME_DIR in your environment or make.inc file. For bash (sh), put in ~/.bashrc (with your system's paths):

export BLIS_DIR=/opt/blis
export FLAME_DIR=/opt/libflame

For csh/tcsh, put in ~/.cshrc:

setenv BLIS_DIR  /opt/blis
setenv FLAME_DIR /opt/libflame

Intel MKL

The MKL make.inc files assume $MKLROOT is set in your environment. To set it, for bash (sh), put in ~/.bashrc (with your system's path):

source /opt/intel/bin/compilervars.sh intel64

For csh/tcsh, put in ~/.cshrc:

source /opt/intel/bin/compilervars.csh intel64

MAGMA is tested with both LP64 and ILP64.

ATLAS

The ATLAS make.inc file assumes $ATLASDIR and $LAPACKDIR are set in your environment. If LAPACK is not installed, install it from http://www.netlib.org/lapack/. For bash (sh), put in ~/.bashrc (with your system's paths):

export ATLASDIR=/opt/atlas
export LAPACKDIR=/opt/LAPACK

For csh/tcsh, put in ~/.cshrc:

setenv ATLASDIR  /opt/atlas
setenv LAPACKDIR /opt/LAPACK

OpenBLAS

The OpenBLAS make.inc file assumes $OPENBLASDIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):

export OPENBLASDIR=/opt/openblas

For csh/tcsh, put in ~/.cshrc:

setenv OPENBLASDIR /opt/openblas
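
Before running make, it can help to confirm that the variables described in the sections above are set and point at real directories. The following is a minimal sketch for bash (sh), not part of MAGMA; the function name check_dir_var is hypothetical, and it only checks that the directory exists, not that it holds a usable installation.

```shell
# Hypothetical helper (not shipped with MAGMA): report whether the
# environment variable named by $1 is set and refers to a directory.
check_dir_var() {
    name="$1"
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}
```

For example, `check_dir_var CUDADIR` prints the value on success and returns a non-zero status otherwise. The same call should work for $BLIS_DIR, $FLAME_DIR, $MKLROOT, $ATLASDIR, $LAPACKDIR, and $OPENBLASDIR, whichever your make.inc file uses.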

Linking to BLAS

Depending on the Fortran compiler used for your BLAS and LAPACK libraries, the linking convention is one of:

  • Add underscore, so gemm() in Fortran becomes gemm_() in C.
  • Uppercase, so gemm() in Fortran becomes GEMM() in C.
  • No change, so gemm() in Fortran stays gemm() in C.

Set -DADD_, -DUPCASE, or -DNOCHANGE, respectively, in all FLAGS in your make.inc file to select the appropriate one. Use nm to examine your BLAS library:

sh methane lib> nm libopenblas.so | grep -i dsyr2k
000000000017ee50 T cblas_dsyr2k
000000000017c8b0 T dsyr2k_           # Note this line
00000000001fa690 T dsyr2k_LN
00000000001fb2e0 T dsyr2k_LT
00000000001f8f70 T dsyr2k_UN
00000000001f9b70 T dsyr2k_UT
00000000001fcab0 T dsyr2k_kernel_L
00000000001fc750 T dsyr2k_kernel_U

In this case, it shows that -DADD_ (dsyr2k_) should work. The default in all example make.inc files is -DADD_, except for IBM ESSL, which uses -DNOCHANGE.
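
The inspection above can be scripted. The sketch below is not part of MAGMA; blas_mangling_flag is a hypothetical helper. It assumes nm output in the three-column form shown above, reads it on standard input, and prints the define that matches the symbol it finds. It only considers defined text symbols (type T), so a library that exports the routine only as a weak symbol would be reported as unknown.

```shell
# Hypothetical helper (not shipped with MAGMA): read `nm` output on stdin
# and print the name-mangling define matching the routine in $1.
blas_mangling_flag() {
    routine="${1:-dsyr2k}"
    upper=$(printf '%s' "$routine" | tr '[:lower:]' '[:upper:]')
    # Keep defined text symbols only (type T); the name is the third field.
    symbols=$(awk '$2 == "T" { print $3 }')
    if printf '%s\n' "$symbols" | grep -qxF "${routine}_"; then
        echo "-DADD_"        # Fortran gemm() appears as gemm_()
    elif printf '%s\n' "$symbols" | grep -qxF "$upper"; then
        echo "-DUPCASE"      # Fortran gemm() appears as GEMM()
    elif printf '%s\n' "$symbols" | grep -qxF "$routine"; then
        echo "-DNOCHANGE"    # Fortran gemm() appears as gemm()
    else
        echo "unknown"
        return 1
    fi
}
```

A typical call would be `nm libopenblas.so | blas_mangling_flag dsyr2k`. The checks run in the order underscore, uppercase, unchanged, so a library that exports several variants reports the first one found.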

Compile-time options

The compiler defines below affect how MAGMA is compiled and can have a large performance impact. They are set in make.inc files using the -D compiler flag, e.g., -DMAGMA_WITH_MKL in CFLAGS.

  • MAGMA_WITH_MKL

    If linked with MKL, allows MAGMA to get MKL's version and set MKL's number of threads.

  • MAGMA_NOAFFINITY

    Disables thread affinity, which is available in glibc 2.6 and later.

  • BATCH_DISABLE_CHECKING

    For batched routines, disables the info_array that contains errors. For example, if you are sure your matrices are SPD and want better performance from Cholesky factorization, you can compile with this flag.

  • BATCH_DISABLE_CLEANUP

    For batched routines, disables the cleanup code. For example, with cleanup disabled, {sy|he}rk called with "lower" will also write data to the upper triangular portion of the matrix.

  • BATCHED_DISABLE_PARCPU

    In the testing directory, disables the parallel implementation of the batched computation on CPU. Can be used to compare a naive versus a parallelized CPU batched computation.

Run-time options

These variables control MAGMA, BLAS, and LAPACK run-time behavior.

  • $MAGMA_NUM_GPUS

    For multi-GPU functions, set $MAGMA_NUM_GPUS to the number of GPUs to use.

  • $OMP_NUM_THREADS
  • $MKL_NUM_THREADS

    For multi-core BLAS libraries, set $OMP_NUM_THREADS or $MKL_NUM_THREADS to the number of CPU threads, depending on your BLAS library. See the documentation for your BLAS and LAPACK libraries.
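
As an illustration of the variables above, a bash (sh) session might set them as follows before running a MAGMA program. The values 2 and 8 are placeholders chosen for this sketch, not recommendations; pick counts that match your hardware. Which thread variable takes effect depends on your BLAS library.

```shell
# Illustrative values only; adjust to the machine at hand.
export MAGMA_NUM_GPUS=2     # GPUs used by multi-GPU MAGMA functions
export OMP_NUM_THREADS=8    # CPU threads for an OpenMP-threaded BLAS
export MKL_NUM_THREADS=8    # CPU threads when the BLAS is Intel MKL
```

For csh/tcsh, the equivalent form is setenv, as in the installation sections above.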

Building without Fortran

If you do not have a Fortran compiler, comment out FORT in make.inc. MAGMA's Fortran 90 interface and Fortran testers will not be built. Also, many testers will not be able to check their results – they will print an error message, e.g.:

magma/testing> ./testing_dgehrd -N 100 -c
...
Cannot check results: dhst01_ unavailable, since there was no Fortran compiler.
  100     ---   (  ---  )      0.70 (   0.00)   0.00e+00        0.00e+00   ok

Building shared libraries

By default, all make.inc files (except ATLAS) add the -fPIC option to CFLAGS, FFLAGS, F90FLAGS, and NVCCFLAGS, required for building a shared library. Note in NVCCFLAGS that -fPIC is passed via the -Xcompiler option. Running:

make

or

make lib
make test
make sparse-lib
make sparse-test

will create shared libraries:

lib/libmagma.so
lib/libmagma_sparse.so

and static libraries:

lib/libmagma.a
lib/libmagma_sparse.a

and testing drivers in testing and sparse-iter/testing.

The current exception is for ATLAS, in make.inc.atlas, which in our install is a static library, thus requiring MAGMA to be a static library.

Building static libraries

Static libraries are always built along with the shared libraries above. Alternatively, comment out FPIC in your make.inc file to compile only a static library. Then, running:

make

will create static libraries:

lib/libmagma.a
lib/libmagma_sparse.a

and testing drivers in testing and sparse-iter/testing.

Installation

To install libraries and include files in a given prefix, run:

make install prefix=/usr/local/magma

The default prefix is /usr/local/magma. You can also set prefix in make.inc. This installs MAGMA libraries in ${prefix}/lib, MAGMA header files in ${prefix}/include, and ${prefix}/lib/pkgconfig/magma.pc for pkg-config.
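
Once installed, the magma.pc file lets pkg-config supply compile and link flags. The following is a minimal sketch for bash (sh), assuming the default prefix and that pkg-config is available on your system. The guard keeps it from failing when MAGMA is not yet installed.

```shell
# Assumes the default install prefix; change it to match your install.
prefix=/usr/local/magma

# Prepend MAGMA's pkgconfig directory, keeping any existing search path.
export PKG_CONFIG_PATH="${prefix}/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

# Print the flags only if pkg-config exists and can find magma.pc.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists magma; then
    pkg-config --cflags --libs magma
fi
```

The printed flags can then be passed to your compiler, for example through command substitution on the compile line.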

Tuning

You can modify the blocking factors for the algorithms of interest in control/get_nb.cpp.

Performance results are included in results/vA.B.C/cudaX.Y-zzz/*.txt for MAGMA version A.B.C, CUDA version X.Y, and GPU zzz.