- Added variable-size batched Cholesky factorization, magma_[sdcz]potrf_vbatched.
- Added new fixed-size batched BLAS routines {hemm, symm, hemv, symv, trmm}_batched.
- Added new variable-size batched BLAS routines {hemm, symm, hemv, symv, trmm, trsm}_vbatched.
- Fixed memory leaks in {sy,he}evdx_2stage and getri_outofplace_batched.
- Fixed bug for small matrices in {symm, hemm}_mgpu and updated tester.
- Fixed libraries in make.inc examples for MKL with gcc.
- Added more robust error checking for batched BLAS routines.
- Added Incomplete Sparse Approximate Inverse (ISAI) Preconditioner for sparse triangular solves, including batched generation.
- Added Block-Jacobi triangular solves, including variable block size (based on supervariable amalgamation).
- Added ParILUT, a parallel threshold ILU based on OpenMP.
- Added CSR5 format and CSR5 SpMV kernel, a sparse matrix-vector product that often outperforms the cuSPARSE CSR and HYB SpMV.
http://icl.cs.utk.edu/magma/software/