PLASMA RELEASE NOTES
====================


Summary of Current Features
---------------------------


* Solution of dense systems of linear equations and least squares problems
  in real space and complex space, using single precision and double precision,
  via the Cholesky, LU, QR and LQ factorizations

* Solution of standard and generalized dense symmetric eigenvalue problems in real space and complex space,
  using single precision and double precision, via two-stage tridiagonal reduction
  followed by QR iteration (eigenvalues only, no eigenvectors yet)

* Dense Singular Value Decomposition in real space and complex space, using single
  precision and double precision, via two-stage bidiagonal reduction followed by QR iteration
  (singular values only, no singular vectors yet)

* Solution of dense linear systems of equations in real space and complex space
  using the mixed-precision algorithm based on the Cholesky, LU, QR and LQ factorizations

* Tree-based QR and LQ factorizations and Q matrix generation and application (``tall and skinny'')

* Tree-based bidiagonal reduction (``tall and skinny'')

* Explicit matrix inversion based on Cholesky factorization (symmetric positive definite)

* Parallel and cache-efficient in-place layout translations (Gustavson et al.)

* Complete set of Level 3 BLAS routines for matrices stored in tile layout

* Simple LAPACK-like interface for greater productivity and advanced (tile) interface
  for full control and maximum performance;
  routines for conversion between LAPACK matrix layout and PLASMA's tile layout

* Dynamic scheduler QUARK (QUeuing And Runtime for Kernels) and dynamically scheduled
  versions of all computational routines (alongside statically scheduled ones)

* Asynchronous interface for launching dynamically scheduled routines in a non-blocking mode.
  Sequence and request constructs for controlling progress and checking errors

* Automatic handling of workspace allocation; a set of auxiliary functions to assist the user
  with workspace allocation

* A simple set of "sanity" tests for all numerical routines including Level 3 BLAS routines
  for matrices in tile layout

* An advanced testing suite for exhaustive numerical testing of all the routines in all precisions
  (based on the testing suite of the LAPACK library)

* Basic timing suite for most of the routines in all precisions

* Thread safety

* Support for Make and CMake build systems

* LAPACK-style comments in the source code using the Doxygen system

* Native support for Microsoft Windows using WinThreads through a thin OS interaction layer

* Installer capable of downloading from Netlib and installing missing components of PLASMA's
  software stack (BLAS, CBLAS, LAPACK, LAPACKE C API)

* Extensive documentation including Installation Guide, Users' Guide, Reference Manual
  and an HTML code browser, a guide on running PLASMA with the TAU package, Contributors' Guide,
  a README and Release Notes

* A comprehensive set of usage examples
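
The conversion between LAPACK matrix layout and tile layout listed above can be
illustrated with a minimal Python sketch. This is not PLASMA code: the function
names, the tile size `nb`, and the restriction that `nb` divides both matrix
dimensions are assumptions made for illustration, and the copy is out of place,
whereas PLASMA's translations are in place and parallel. The sketch assumes
column-major storage inside each tile and column-by-column ordering of tiles.

```python
def lapack_to_tile(a, m, n, lda, nb):
    """Copy an m-by-n column-major matrix into nb-by-nb tiles.

    Hypothetical helper for illustration only; assumes nb divides m and n.
    Returns a flat list in which each tile is contiguous (column-major
    inside the tile) and tiles are ordered column by column.
    """
    if m % nb or n % nb:
        raise ValueError("this sketch assumes nb divides both m and n")
    tiled = []
    for tj in range(n // nb):          # tile column
        for ti in range(m // nb):      # tile row
            for j in range(nb):        # column inside the tile
                col = tj * nb + j
                start = col * lda + ti * nb
                tiled.extend(a[start:start + nb])
    return tiled


def tile_to_lapack(tiled, m, n, lda, nb):
    """Inverse of lapack_to_tile (same assumptions)."""
    a = [0.0] * (lda * n)
    pos = 0
    for tj in range(n // nb):
        for ti in range(m // nb):
            for j in range(nb):
                col = tj * nb + j
                start = col * lda + ti * nb
                a[start:start + nb] = tiled[pos:pos + nb]
                pos += nb
    return a
```

Storing each tile contiguously means a kernel working on one tile touches a
single compact block of memory, which is the motivation for the tile interface.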


New Features by Release
-----------------------


=== 2.4.0, June 6th, 2011 ===

* Tree-based QR and LQ factorizations: routines for application of the Q matrix support
  all combinations of input parameters: Left/Right, NoTrans/Trans/ConjTrans

* Symmetric Eigenvalue Problem using: tile reduction to band-tridiagonal form, reduction
  to ``standard'' tridiagonal form by bulge chasing, finding eigenvalues using the QR algorithm
  (eigenvectors currently not supported)

* Singular Value Decomposition using: tile reduction to band-bidiagonal form, reduction
  to ``standard'' bidiagonal form by bulge chasing, finding singular values using the QR algorithm
  (singular vectors currently not supported)

* Gaussian Elimination with partial pivoting (as opposed to the incremental pivoting in the tile
  LU factorization) and parallel panel (using QUARK extensions for nested parallelism).
  WARNING: With the integration of this feature, the interface for
  calling the LU factorization has changed. PLASMA_zgetrf now follows
  the LAPACK interface and performs partial pivoting. The old interface
  for LU factorization with incremental pivoting has been renamed
  PLASMA_zgetrf_incpiv.
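
For readers unfamiliar with the pivoting convention behind the interface change
above, the following Python sketch shows an unblocked LU factorization with
partial pivoting that records row swaps in a LAPACK-style pivot vector. It is
not PLASMA code and does not reflect PLASMA's tile or parallel-panel algorithms;
the function name `lu_partial_pivot` and the list-of-rows matrix representation
are assumptions made for illustration. LAPACK itself uses 1-based pivot indices,
while this sketch uses 0-based ones.

```python
def lu_partial_pivot(a):
    """Factor a square matrix in place as P*A = L*U (illustrative sketch).

    `a` is a list of rows. On return, U is in the upper triangle and the
    multipliers of L (unit diagonal implied) are below the diagonal.
    Returns ipiv, where row k was swapped with row ipiv[k] (0-based here).
    """
    n = len(a)
    ipiv = []
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k, rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        ipiv.append(p)
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                 # multiplier l_ik
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]   # update trailing submatrix
    return ipiv
```

Partial pivoting searches the whole column for the pivot, which is why the
panel must be factored as a unit; incremental pivoting restricts the search
to pairs of tiles.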



=== 2.3.1, November 30th, 2010 ===


* Add functions to generate random matrices (plrnt, plghe and plgsy)
  => fix the problem with time_zpotri_tile.c reported by Katayama on the forum
  (http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=59)

* Fix a deadlock in norm computations with static scheduling

* Installer: fix the LAPACK version when libtmg is the only library to be installed
  Thanks to Henc. (http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=60)


=== 2.3.0, November 15th, 2010 ===


* Parallel and cache-efficient in-place layout translations (Gustavson et al.)

* Tree-based QR factorization and Q matrix generation (``tall and skinny'')

* Explicit matrix inversion based on Cholesky factorization (symmetric positive definite)

* Replacement of LAPACK C Wrapper with LAPACKE C API by Intel


=== 2.2.0, July 9th, 2010 ===


* Dynamic scheduler QUARK (QUeuing And Runtime for Kernels) and dynamically scheduled
  versions of all computational routines (alongside statically scheduled ones)

* Asynchronous interface for launching dynamically scheduled routines in a non-blocking mode.
  Sequence and request constructs for controlling progress and checking errors

* Removal of CBLAS and pieces of LAPACK from PLASMA's source tree.
  BLAS, CBLAS, LAPACK and Netlib LAPACK C Wrapper become PLASMA's software dependencies
  required prior to the installation of PLASMA

* Installer capable of downloading from Netlib and installing missing components of PLASMA's
  software stack (BLAS, CBLAS, LAPACK, LAPACK C Wrapper)

* Complete set of Level 3 BLAS routines for matrices stored in tile layout


=== 2.1.0, November 15th, 2009 ===


* Native support for Microsoft Windows using WinThreads

* Support for Make and CMake build systems

* Performance-optimized mixed-precision routine for the solution of linear systems of equations
  using the LU factorization

* Initial timing code (PLASMA_dgesv only)

* Release notes
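
The idea behind the mixed-precision routine listed above can be sketched as
follows: factor and solve in single precision, then refine the solution using
residuals computed in double precision. This is a minimal pure-Python
illustration, not PLASMA's implementation. Single precision is simulated by
rounding through the `struct` module, and all function names, the iteration
limit and the tolerance are hypothetical choices.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU with partial pivoting, rounding every result to single precision."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve with the single-precision factors (forward, then back substitution)."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed(a, b, max_iter=10, tol=1e-14):
    """Iterative refinement: single-precision solves, double-precision residuals."""
    n = len(a)
    lu, perm = lu_factor_f32(a)          # factor once, in single precision
    x = lu_solve_f32(lu, perm, b)
    for _ in range(max_iter):
        # Residual r = b - A*x, computed in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        d = lu_solve_f32(lu, perm, r)    # cheap correction from the same factors
        x = [x[i] + d[i] for i in range(n)]
    return x
```

The expensive factorization runs only once in the faster precision; each
refinement step costs just a residual computation and two triangular solves.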


=== 2.0.0, July 4th, 2009 ===


* Support for real and complex arithmetic in single and double precision

* Generation and application of the Q matrix from the QR and LQ factorizations

* Prototype of mixed-precision routine for the solution of linear systems of equations
  using the LU factorization (not optimized for performance)

* Simple interface and native interface

* Major code cleanup and restructuring

* Redesigned workspace allocation

* LAPACK testing

* Examples

* Thread safety

* Python installer

* Documentation: Installation Guide, Users' Guide with routine reference and an HTML code browser,
  a guide on running PLASMA with the TAU package, initial draft of Contributors' Guide, a README file
  and a LICENSE file


=== 1.0.0, November 15th, 2008 ===


* Double precision routines for the solution of linear systems of equations and least squares problems
  using Cholesky, LU, QR and LQ factorizations
