MAGMA provides implementations for CUDA, HIP, Intel Xeon Phi, and OpenCL. The latest releases are MAGMA 2.7.0 for CUDA and HIP, MAGMA MIC 1.4.0 for Intel Xeon Phi, and clMAGMA 1.3 for OpenCL. The libraries available for download are listed below in the order of their release dates.

Please use any of the following publications to reference MAGMA.

MAGMA Bitbucket repository:


MagmaDNN 1.4

MagmaDNN 1.4 is now available. MagmaDNN provides high-performance data analytics and machine learning tools using MAGMA as its computational backend. Updates in this release include:

  • Bug fixes and performance improvements;
  • Added ResnetBlocks for Resnet 18/34 models;
  • Added Concat Operations for 3D and 4D layers;
  • Added Unet Blocks for Unet models;
  • Added LSTM functionality;
  • Added OpenCV compatibility.

MagmaDNN's repository is on Bitbucket:

MAGMA 2.7.0

MAGMA 2.7.0 is now released. This release includes:

  • Add support for builds targeting NVIDIA's Hopper architecture;
  • New routines: magma_dshposv_gpu and magma_dshposv_native solve Ax = b,
    for a symmetric positive definite matrix 'A', using FP16 during the Cholesky
    factorization. GMRES-based iterative refinement is used to recover the solution
    up to double precision accuracy. The '_gpu' suffix denotes a hybrid CPU-GPU
    factorization, while '_native' denotes a GPU-only factorization;
  • Performance improvement for the batch QR factorization routine;
  • Performance improvement for the variable size batch LU factorization routine;
  • Bug fixes, performance optimizations, benchmark additions, and maintenance
    updates to support current and new MAGMA routines, latest NVIDIA and AMD
    math libraries and GPU hardware.
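
The mixed-precision idea behind the new solver can be illustrated with a small, self-contained sketch. This is not MAGMA code and does not call the MAGMA API: the function names (to_fp16, cholesky_fp16, solve_spd_mixed) are hypothetical, FP16 storage is emulated on the CPU with Python's struct module, and classical iterative refinement stands in for the GMRES-based refinement that the release notes describe. It only shows why a low-precision Cholesky factor can still yield a solution accurate to roughly double precision.

```python
import math
import struct


def to_fp16(x):
    """Round a float to the nearest IEEE half-precision value (emulated).

    Raises OverflowError for magnitudes beyond the FP16 range.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def cholesky_fp16(a):
    """Lower Cholesky factor of SPD matrix `a`, storing each entry in FP16."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = to_fp16(a[j][j]) - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = to_fp16(math.sqrt(s))
        for i in range(j + 1, n):
            s = to_fp16(a[i][j]) - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = to_fp16(s / l[j][j])
    return l


def solve_with_factor(l, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def solve_spd_mixed(a, b, tol=1e-13, max_iter=50):
    """Solve A x = b using an FP16 factor plus double-precision refinement.

    Returns the solution and the number of refinement steps taken.
    Assumption: classical refinement is used here, not GMRES.
    """
    n = len(b)
    l = cholesky_fp16(a)            # cheap, low-accuracy factorization
    x = solve_with_factor(l, b)     # initial, low-accuracy solution
    b_norm = max(abs(v) for v in b)
    steps = 0
    while steps < max_iter:
        # Residual is computed in double precision with the original A.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = solve_with_factor(l, r)  # correction from the FP16 factor
        x = [xi + di for xi, di in zip(x, d)]
        steps += 1
    return x, steps


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]  # equals A times [1, 2, 3]
x, steps = solve_spd_mixed(A, b)
```

The factorization is the expensive step and is the only part done in low precision; each refinement step costs just a residual and two triangular solves. Plain refinement of this kind is only expected to converge for reasonably well-conditioned matrices.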

MAGMA 2.6.2

MAGMA 2.6.2 is now released. This release includes:

  • New routine: magma_{s,d,c,z}getrf_vbatched provides a variable-size batched LU
    factorization with partial pivoting. This is a reference implementation, with more
    performance optimizations planned for future releases;
  • New routine: magmablas_{s,d,c,z}trsm_vbatched now provides a variable-size batched
    TRSM that does not invert the diagonal blocks of the input triangular matrix. The
    routine can be tested by passing "--version 3" to testing_{s,d,c,z}trsm_vbatched;
  • Calling more hipBLAS functions;
  • Bug fixes (n==0 in Cholesky factorization; synchronization in LQ; installation);
  • Remove gfx803 target for AMD GPUs;
  • Add uplo argument in inertia computation routines (only upper was supported before);
  • Fix memory leak in magma_queue for hip functions;
  • Add FP16 and FP16-FP32 GEMM benchmark for HIP (testing_hgemm).
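
To make "variable-size batched LU factorization with partial pivoting" concrete, here is a minimal CPU sketch of the semantics. It is not the MAGMA interface: the names lu_partial_pivot, lu_vbatched, apply_pivots, and multiply_lu are hypothetical, the matrices are plain Python lists rather than device pointer arrays, and pivot indices are 0-based, whereas LAPACK-style routines report 1-based pivots. The point is that each matrix in the batch may have its own dimensions and is factorized independently.

```python
def lu_partial_pivot(a):
    """Factorize an m x n matrix in place as P*A = L*U; return pivot rows.

    L (unit lower triangular) is stored below the diagonal, U on and above.
    """
    m, n = len(a), len(a[0])
    piv = []
    for k in range(min(m, n)):
        # Partial pivoting: pick the largest entry in column k, rows k..m-1.
        p = max(range(k, m), key=lambda i: abs(a[i][k]))
        piv.append(p)
        if a[p][k] == 0.0:
            continue  # column is entirely zero; nothing to eliminate
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, m):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return piv


def lu_vbatched(batch):
    """Factorize every matrix in `batch`; sizes may differ per matrix."""
    return [lu_partial_pivot(a) for a in batch]


def apply_pivots(a, piv):
    """Return a copy of `a` with the recorded row swaps applied in order."""
    rows = [row[:] for row in a]
    for k, p in enumerate(piv):
        rows[k], rows[p] = rows[p], rows[k]
    return rows


def multiply_lu(lu):
    """Rebuild L*U from the packed factors, to check against P*A."""
    m, n = len(lu), len(lu[0])
    r = min(m, n)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k in range(min(i, j, r - 1) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                out[i][j] += l_ik * lu[k][j]
    return out


batch = [
    [[4.0, 3.0], [6.0, 3.0]],                           # 2 x 2
    [[2.0]],                                            # 1 x 1
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],               # 3 x 2
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]],  # 3 x 3
]
pivots = lu_vbatched(batch)  # batch now holds the packed L and U factors
```

A GPU implementation would process the matrices concurrently rather than in a Python loop; the sequential loop here is only meant to pin down what result each batch entry should contain.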

MAGMA 2.6.1

MAGMA 2.6.1 is now released. This patch release of MAGMA 2.6 includes:

  • Bug fix for installing MAGMA with spack on CUDA 9 and older;
  • Expert interface for Cholesky factorization to improve performance for small problems;
  • Definition changes for some magma_blas routines to call AMD BLAS for HIP installation
    (these routines were previously either not present or were underperforming
    in AMD BLAS, and were therefore defined through magmablas).

MAGMA 2.6.0

MAGMA 2.6.0 is now released. Updates include:

  • Added HIP support for AMD GPUs (former hipMAGMA) as part of MAGMA;
  • Added inertia computational routines for GPUs;
  • Performance improvements for AMD GPUs;
  • Performance improvement for magma_Xgesv_batched for small sizes;
  • Added Bunch-Kaufman GPU-only solver using BLAS calls (magma_zhetrs_gpu);
  • Added include/magma_config.h file storing the configuration for a particular magma installation (CUDA vs. HIP, etc.);
  • Added expert interfaces for magma_Xgetrf_gpu and magma_Xpotrf_gpu. These interfaces allow the user to specify the factorization mode, hybrid (CPU+GPU) or native (GPU only), as well as the blocking size (nb);
  • Added tuning for small size LU, QR, and Cholesky factorizations.
