
             MAGMA Release Notes

-----------------------------------------------------

MAGMA is intended for a single CUDA enabled NVIDIA GPU.  It supports
Tesla and Fermi GPUs. For more details see the MAGMA 1.0 presentation.

Included are routines for the following algorithms:

    * LU, QR, and Cholesky factorizations in both real and complex
      arithmetic (single and double);
    * Hessenberg, bidiagonal, and tridiagonal reductions in both real
      and complex arithmetic (single and double);
    * Linear solvers based on LU, QR, and Cholesky in both real and
      complex arithmetic (single and double);
    * Eigen and singular value problem solvers in both real and
      complex arithmetic (single and double);
    * Generalized Hermitian-definite eigenproblem solver in both
      real and complex arithmetic (single and double);
    * Mixed-precision iterative refinement solvers based on LU, QR,
      and Cholesky in both real and complex arithmetic;
    * MAGMA BLAS in real arithmetic (single and double), including
      gemm, gemv, symv, and trsm.

 1.x.x
    * Fix bug in [zcsd]getrf_gpu.cpp
    * Fix workspace requirement for SVD in [zcsd]gesvd.cpp
    * Fix a bug in freeing pinned memory (in interface_cuda/alloc.cpp)
    * Fix a bug in [zcsd]geqrf_mgpu.cpp 
    * Fix zdotc to use cblas for portability
    * Fix uppercase entries in blas/lapack headers
    * Use magma_int_t in blas/lapack headers, and fix sources accordingly
    * Fix magma_is_devptr error handling
    * Add magma_malloc_cpu to allocate CPU memory aligned to 32-byte boundary
      for performance and reproducibility
    * Fix memory leaks in latrd* and zcgeqrsv_gpu
    * Remove dependency on CUDA device driver
    * Add QR with pivoting in CPU interface (functions [zcsd]geqp3)
    * Add hegst/sygst Fortran interface
    * Improve performance of gesv CPU interface by 30%
    * Improve performance of ungqr/orgqr CPU and GPU interfaces by 30%;
      more for small matrices

 1.2.0
    * Fix bugs in [zcsd]hegst[_gpu].cpp
    * Fix a bug in [zcsd]latrd.cpp
    * Fix a bug in [zcsd]gelqf_gpu.cpp
    * Added application of a block reflector H or its transpose from the Right.
      Routines changed -- [zcsd]larfb_gpu.cpp, [zc]unmqr2_gpu.cpp, and
      [ds]ormqr2_gpu.cpp
    * Fix *larfb_gpu for reflector vectors stored row-wise.
    * Fix memory allocation bugs in [zc]unmqr2_gpu.cpp, [ds]ormqr2_gpu.cpp,
      [zc]unmqr.cpp, and [ds]ormqr.cpp (thanks to Azzam Haidar).
    * Fix bug in *lacpy that overwrote memory.
    * Fix residual formula in testing_*gesv* and testing_*posv*.
    * Fix sizeptr.cpp compile warning that caused make to fail.
    * Fix warning in *getrf.cpp when nb0 is zero.
    * Add reduction to band-diagonal for symmetric/Hermitian definite matrices
      in [zc]hebbd.cpp and [ds]sybbd.cpp
    * Updated eigensolvers for standard and generalized eigenproblems for
      symmetric/Hermitian definite matrices 
    * Add wrappers around CUDA and CUBLAS functions,
      for portability and error checking.
    * Add tracing functions.
    * Add two-stage reduction to tridiabonal form
    * Add matrix print functions.
    * Make info and return codes consistent.
    * Change GPU_TARGET in make.inc to descriptive name (e.g., Fermi).
    * Move magma_stream to -lmagmablas to eliminate dependency on -lmagma.

 1.1.0 - 11-11-11
    * Fix a bug in [zcsd]geqrf_gpu.cpp and [zcsd]geqrf3_gpu.cpp for n>m
    * Fix a bug in [zcsd]laset - to call the kernel only when m!=0 && n!=0
    * Fix a bug in [zcsd]gehrd for ilo > 1 or ihi < n.
    * Added missing Fortran interfaces
    * Add general matrix inverse, [zcds]getri GPU interface.
    * Add [zcds]potri in CPU and GPU interfaces 
       [Hatem Ltaief et al.]
    * Add [zcds]trtri in CPU and GPU interfaces 
       [Hatem Ltaief et al.]
    * Add [zcds]lauum in CPU and GPU interfaces 
       [Hatem Ltaief et al.]
    * Add zgemm for Fermi obtained using autotuning
    * Add non-GPU-resident versions of [zcds]geqrf, [zcds]potrf, and [zcds]getrf
    * Add multi-GPU LU, QR, and Cholesky factorizations
    * Add tile algorithms for multicore and multi-GPUs using the StarPU
      runtime system (in directory 'multi-gpu-dynamic')
    * Add [zcds]gesv and [zcds]posv in CPU interface. GPU interface was already in 1.0
    * Add LAPACK linear equation testing code (in 'testing/lin')
    * Add experimental directory ('exp') with algorithms for:
      (1) Multi-core QR, LU, Cholskey
      (2) Single GPU, all available CPU cores QR
    * Add eigenvalue solver driver routines for the standard and generalized 
      symmetric/Hermitian eigenvalue problems [Raffaele Solca et al.].

 1.0.0 - August 25th, 2011
    * Fix make.inc.mkl (Thanks to ar1309)
    * Add gpu interfaces to [zcsd]hetrd, [zcsd]heevd
    * Add all cases for [zcds]unmtr_gpu 
       [Raffaele Solca et al.]
    * Add generalized Hermitian-definite eigenproblem solver ([zcds]hegvd)
       [Raffaele Solca et al.]

 1.0.0RC5 - April 6th, 2011
    * Add fortran interface for lapack functions
    * Add new QR version on GPU ([zcsd]geqrf3_gpu) and corresponding
      LS solver ([zcds]geqrs3_gpu)
    * Add [cz]unmtr, [sd]ormtr functions
    * Add two functions in fortran to compute the offset on device pointers
          magmaf_[sdcz]off1d( NewPtr, OldPtr, inc, i)
          magmaf_[sdcz]off2d( NewPtr, OldPtr, lda, i, j)
        indices are given in Fortran (1 to N)
    * WARNING: add FOPTS variable to the make.inc to use preprocessing
      in compilation of Fortran files
    * WARNING: fix bug with fortran compilers which don;t change the name
      now fortran prefix is magmaf instead of magma
    * Small documentation fixes
    * Fix timing under windows, thanks to Evan Lazar
    * Fix problem when __func__ is not present, thanks to Evan Lazar
    * Fix bug with m==n==0 in LU, thanks to Evan Lazar
    * Fix bug on [cz]unmqr, [sd]ormqr functions
    * Fix bug in [zcsd]gebrd; fixes bug in SVD for n>m
    * Fix bug in [zcsd]geqrs_gpu for multiple RHS
    * Added functionality - zcgesv_gpu and dsgesv_gpu can now solve also 
      A' X = B using mixed-precision iterative refinement
    * Fix error code in testings.h to compile with cuda 4.0

 1.0.0RC4 - March 8th, 2011

    * Add control directory to group all non computational functions
    * Integration of the eigenvalues solvers
    * Clean some f2c code in eigenvalues solvers
    * Arithmetic consistency: cuDoubleComplex and cuFloatComplex are 
      the  only types used for complex now.
    * Consistency of the interface of some functions.
    * Clean most of the return values in lapack functions
    * Fix multiple definition of min, max,
    * Fix headers problem under windows, thanks to Willem Burger

