                              PLASMA RELEASE NOTES
     __________________________________________________________________

Summary of Current Features

     * Solution of dense systems of linear equations and least squares
       problems in real space and complex space, using single precision
       and double precision, via the Cholesky, LU, QR and LQ
       factorizations
     * Solution of standard and generalized dense symmetric eigenvalue
       problems in real space and complex space, using single precision
       and double precision, via two-stage tridiagonal reduction followed
       by QR iteration (eigenvectors supported as of release 2.4.6)
     * Dense Singular Value Decomposition in real space and complex space,
       using   single   precision  and  double  precision,  via  two-stage
       bidiagonal  reduction  followed  by  QR  iteration (singular values
       only, no singular vectors yet)
     * Solution  of  dense  linear  systems of equations in real space and
       complex  space  using  the  mixed-precision  algorithm based on the
       Cholesky, LU, QR and LQ factorizations
     * Tree-based  QR  and  LQ  factorizations and Q matrix generation and
       application (“tall and skinny”)
     * Tree-based bidiagonal reduction (“tall and skinny”)
     * Explicit   matrix   inversion   based   on  Cholesky  factorization
       (symmetric positive definite)
     * Parallel and cache-efficient in-place layout translations
       (Gustavson et al.)
     * Complete  set  of Level 3 BLAS routines for matrices stored in tile
       layout
     * Simple LAPACK-like interface for greater productivity and advanced
       (tile) interface for full control and maximum performance. Routines
       for conversion between LAPACK matrix layout and PLASMA’s tile
       layout
     * Dynamic  scheduler  QUARK  (QUeuing  And  Runtime  for Kernels) and
       dynamically   scheduled  versions  of  all  computational  routines
       (alongside statically scheduled ones)
     * Asynchronous interface for launching dynamically scheduled routines
       in  a  non-blocking  mode.  Sequence  and  request  constructs  for
       controlling progress and checking errors
     * Automatic handling of workspace allocation, plus a set of auxiliary
       functions to assist the user with workspace allocation
     * A simple set of "sanity" tests for all numerical routines including
       Level 3 BLAS routines for matrices in tile layout
     * An  advanced  testing suite for exhaustive numerical testing of all
       the  routines  in all precisions (based on the testing suite of the
       LAPACK library)
     * Basic timing suite for most of the routines in all precisions
     * Thread safety
     * Support for Make and CMake build systems
     * LAPACK-style comments in the source code using the Doxygen system
     * Native  support  for  Microsoft  Windows using WinThreads through a
       thin OS interaction layer
     * Installer capable of downloading from Netlib and installing missing
       components of PLASMA’s software stack (BLAS, CBLAS, LAPACK, LAPACKE
       C API)
     * Extensive documentation including Installation Guide, Users' Guide,
       Reference  Manual  and  an  HTML  code  browser, a guide on running
       PLASMA  with  the  TAU  package,  Contributors' Guide, a README and
       Release Notes.
     * A comprehensive set of usage examples
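
     The conversion between LAPACK (column-major) layout and tile layout
     listed above can be illustrated with a minimal out-of-place sketch.
     It is not PLASMA's implementation (PLASMA also offers in-place
     translations): the function names lapack_to_tile, tile_to_lapack and
     tile_offset are hypothetical, and the sketch assumes that the matrix
     dimensions are exact multiples of the tile dimensions and that tiles
     are stored column-major both internally and relative to each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of element (i, j) in a tile-layout buffer.
 * Assumed layout: mb-by-nb tiles stored contiguously, column-major inside
 * each tile, tiles ordered column by column (mt tiles per tile column). */
static size_t tile_offset(int i, int j, int mt, int mb, int nb)
{
    size_t tile = (size_t)(j / nb) * mt + (size_t)(i / mb);
    return tile * mb * nb + (size_t)(j % nb) * mb + (size_t)(i % mb);
}

/* Copy an m-by-n column-major matrix A (leading dimension lda) into the
 * tile-layout buffer T of m*n elements. Requires m % mb == 0, n % nb == 0. */
void lapack_to_tile(int m, int n, const double *A, int lda,
                    double *T, int mb, int nb)
{
    assert(mb > 0 && nb > 0 && m % mb == 0 && n % nb == 0 && lda >= m);
    int mt = m / mb; /* number of tile rows */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            T[tile_offset(i, j, mt, mb, nb)] = A[(size_t)j * lda + i];
}

/* Inverse copy: tile-layout buffer T back into column-major A. */
void tile_to_lapack(int m, int n, const double *T, int mb, int nb,
                    double *A, int lda)
{
    assert(mb > 0 && nb > 0 && m % mb == 0 && n % nb == 0 && lda >= m);
    int mt = m / mb;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            A[(size_t)j * lda + i] = T[tile_offset(i, j, mt, mb, nb)];
}
```

     Keeping each tile contiguous is what lets a tile kernel work on a
     compact block of memory; an out-of-place copy like this one trades
     extra memory for simplicity.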
     __________________________________________________________________

New Features by Release

  2.4.6, August 20th, 2012

     * Add eigenvector support in eigensolvers for symmetric/Hermitian
       problems and generalized problems.
     * Add support for the Frobenius norm.
     * Release the precision generation script used to generate the
       precisions s, d and c from z, as well as ds from zc.
     * Add all Fortran90 wrappers for mixed precision routines.
     * Add  all  Fortran90  wrappers  to  tile  interface and asynchronous
       interface. Thanks to NAG for providing those wrappers.
     * Add 4 examples with Fortran90 interface.
     * Add support for all computational functions in F77 wrappers.
     * Fix  memory  leaks  related  to  fake  dependencies  in dynamically
       scheduled algorithms.
     * Fix interface issues in eigensolver routines.
     * Fix the info value returned by the PLASMA_zgetrf function.
     * Fix bug with matrices of size 0.
     * WARNING: all LAPACK interfaces having a T or L argument for QR or
       LU factorization have been changed to take a descriptor. The
       workspace allocation has been changed to match those requirements,
       all PLASMA_Alloc_Workspace_XXXXX_Tile functions are now
       deprecated, and users are encouraged to move to the
       PLASMA_Alloc_Workspace_XXXXX version.

  2.4.5, November 22nd, 2011

     * Add  LU  inversion functions: PLASMA_zgetri, PLASMA_zgetri_Tile and
       PLASMA_zgetri_Tile_Async   using   the   recursive  parallel  panel
       implementation of LU factorization.
     * The Householder reduction trees for QR and LQ factorizations now
       work in the general case, not only on matrices where M is a
       multiple of MB.
     * Matrix generation has been changed in all timing, testing and
       example files to use a parallel initialization, which gives a
       better distribution of the data on the architecture, especially
       for the Tile interface. “numactl” is no longer required.
     * Timing routines can now generate DAGs with the --dag option, and
       traces with the --trace option if EZTRACE is present.

  2.4.2, September 14th, 2011

     * New version of QUARK that removes active waiting and allows the
       user to bind tasks to a set of cores.
     * Installer:  Fix  compatibility  issues between plasma-installer and
       PGI compiler reported on Kraken by A. Bouteiller.
     * Fix one memory leak with Hwloc.
     * Introduce  a  new  kernel  for  the  recursive LU operation on tile
       layout which reduces cache misses.
     * Fix several bugs and introduce new features thanks to people from
       Fujitsu and NAG:
          + The new LU factorization with partial pivoting introduced in
            release 2.4 now works on rectangular matrices.
          + Add missing functions to the Fortran 77 interface.
          + Add a new Fortran 90 interface for the LAPACK and Tile
            interfaces. The asynchronous interface and mixed precision
            routines are not available yet.
          + Fix argument order in header files to match the
            implementation.

  2.4.1, July 8th, 2011

     * Fix   bug   with   Fujitsu   compiler   reported   on   the   forum
       ([1]http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=108)
     * Unbind threads in PLASMA_Finalize to avoid binding problems in
       OpenMP sections that follow PLASMA calls (still possible on Mac and
       AIX without hwloc). A better fix is to create the OpenMP threads in
       the user code before any call to PLASMA by means of a dummy
       parallel section.
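
     The dummy parallel section mentioned above could look roughly like
     the following sketch. The helper name warm_up_openmp is hypothetical
     and not part of PLASMA; the idea is only that the OpenMP runtime
     creates its worker threads before PLASMA is initialized. When the
     file is compiled without OpenMP support, the pragmas are ignored and
     the function simply reports one thread.

```c
#include <assert.h>

/* Hypothetical helper (not a PLASMA function): run an otherwise empty
 * parallel region so that the OpenMP runtime creates its threads now.
 * Returns the number of threads that executed the region. */
int warm_up_openmp(void)
{
    int touched = 0;
    #pragma omp parallel
    {
        /* Each thread registers itself once; atomic avoids a data race. */
        #pragma omp atomic
        touched++;
    }
    assert(touched >= 1);
    return touched;
}

/* Intended call order in user code (assumption based on the note above):
 *
 *     warm_up_openmp();      // OpenMP threads are created here
 *     PLASMA_Init(cores);    // PLASMA threads are created and bound here
 *     ...                    // PLASMA calls
 *     PLASMA_Finalize();
 *     ...                    // later OpenMP parallel sections
 */
```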

  2.4.0, June 6th, 2011

     * Tree-based  QR  and  LQ factorizations: routines for application of
       the   Q  matrix  support  all  combinations  of  input  parameters:
       Left/Right, NoTrans/Trans/ConjTrans
     * Symmetric Eigenvalue Problem using: tile reduction to
       band-tridiagonal form, reduction to “standard” tridiagonal form by
       bulge chasing, finding eigenvalues using the QR algorithm
       (eigenvectors currently not supported)
     * Singular Value Decomposition using: tile reduction to
       band-bidiagonal form, reduction to “standard” bidiagonal form by
       bulge chasing, finding singular values using the QR algorithm
       (singular vectors currently not supported)
     * Gaussian Elimination with partial pivoting (as opposed to the
       incremental pivoting in the tile LU factorization) and parallel
       panel (using QUARK extensions for nested parallelism). WARNING:
       Following the integration of this new feature, the interface to
       call LU factorization has changed. PLASMA_zgetrf now follows the
       LAPACK interface and corresponds to the new partial pivoting. The
       old interface for LU factorization with incremental pivoting has
       been renamed PLASMA_zgetrf_incpiv.
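
     For readers unfamiliar with the term, Gaussian elimination with
     partial pivoting can be sketched as a small sequential, unblocked
     routine. This is only an illustration of the technique, not PLASMA's
     parallel panel implementation: the name lu_partial_pivot is
     hypothetical, the matrix is square and stored column-major, and the
     pivot indices are 0-based (LAPACK itself uses 1-based indices). The
     return value imitates the LAPACK info convention: 0 on success, or
     k+1 if the k-th pivot is exactly zero.

```c
#include <assert.h>

/* Hypothetical sketch: in-place LU factorization with partial pivoting of
 * an n-by-n column-major matrix A (leading dimension lda). On exit A holds
 * the unit lower triangular L (below the diagonal) and U (on and above it);
 * ipiv[k] is the 0-based row swapped with row k at step k. */
int lu_partial_pivot(int n, double *A, int lda, int *ipiv)
{
    assert(n >= 0 && lda >= n);
    int info = 0;
    for (int k = 0; k < n; k++) {
        /* Pick the entry of largest magnitude in column k, rows k..n-1. */
        int p = k;
        for (int i = k + 1; i < n; i++) {
            double a = A[k * lda + i], b = A[k * lda + p];
            if ((a < 0 ? -a : a) > (b < 0 ? -b : b))
                p = i;
        }
        ipiv[k] = p;
        if (A[k * lda + p] == 0.0) {   /* singular column: record and skip */
            if (info == 0)
                info = k + 1;
            continue;
        }
        if (p != k) {                   /* swap rows k and p across all columns */
            for (int j = 0; j < n; j++) {
                double t = A[j * lda + k];
                A[j * lda + k] = A[j * lda + p];
                A[j * lda + p] = t;
            }
        }
        for (int i = k + 1; i < n; i++) /* multipliers, i.e. column k of L */
            A[k * lda + i] /= A[k * lda + k];
        for (int j = k + 1; j < n; j++) /* update the trailing submatrix */
            for (int i = k + 1; i < n; i++)
                A[j * lda + i] -= A[k * lda + i] * A[j * lda + k];
    }
    return info;
}
```

     Partial pivoting searches an entire column for each pivot, which is
     why the panel has to be handled as a whole, whereas incremental
     pivoting restricts the search to the tiles processed at each step.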

  2.3.1, November 30th, 2010

     * Add  functions to generate random matrices (plrnt, plghe and plgsy)
       ⇒ fix the problem with time_zpotri_tile.c reported by Katayama on
       the forum
       ([2]http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=59)
     * Fix a deadlock in norm computations with static scheduling
     * Installer: fix the LAPACK version when libtmg is the only library
       to be installed. Thanks to Henc.
       ([3]http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=60)

  2.3.0, November 15th, 2010

     * Parallel   and   cache-efficient   in-place   layout   translations
       (Gustavson et al.)
     * Tree-based  QR  factorization  and Q matrix generation (“tall and
       skinny”)
     * Explicit   matrix   inversion   based   on  Cholesky  factorization
       (symmetric positive definite)
     * Replacement of LAPACK C Wrapper with LAPACKE C API by Intel

  2.2.0, July 9th, 2010

     * Dynamic  scheduler  QUARK  (QUeuing  And  Runtime  for Kernels) and
       dynamically   scheduled  versions  of  all  computational  routines
       (alongside statically scheduled ones)
     * Asynchronous interface for launching dynamically scheduled routines
       in  a  non-blocking  mode.  Sequence  and  request  constructs  for
       controlling progress and checking errors
     * Removal  of CBLAS and pieces of LAPACK from PLASMA’s source tree.
       BLAS,  CBLAS,  LAPACK and Netlib LAPACK C Wrapper become PLASMA’s
       software dependencies required prior to the installation of PLASMA
     * Installer capable of downloading from Netlib and installing missing
       components of PLASMA’s software stack (BLAS, CBLAS, LAPACK, LAPACK
       C Wrapper)
     * Complete  set  of Level 3 BLAS routines for matrices stored in tile
       layout

  2.1.0, November 15th, 2009

     * Native support for Microsoft Windows using WinThreads
     * Support for Make and CMake build systems
     * Performance-optimized  mixed-precision  routine for the solution of
       linear systems of equations using the LU factorization
     * Initial timing code (PLASMA_dgesv only)
     * Release notes

  2.0.0, July 4th, 2009

     * Support  for  real  and  complex  arithmetic  in  single and double
       precision
     * Generation  and  application  of  the  Q  matrix from the QR and LQ
       factorizations
     * Prototype  of  mixed-precision  routine  for the solution of linear
       systems  of equations using the LU factorization (not optimized for
       performance)
     * Simple interface and native interface
     * Major code cleanup and restructuring
     * Redesigned workspace allocation
     * LAPACK testing
     * Examples
     * Thread safety
     * Python installer
     * Documentation:   Installation  Guide,  Users'  Guide  with  routine
       reference  and an HTML code browser, a guide on running PLASMA with
       the  TAU  package,  initial  draft of Contributors' Guide, a README
       file and a LICENSE file

  1.0.0, November 15th, 2008

     * Double precision routines for the solution of linear systems of
       equations and least squares problems using Cholesky, LU, QR and LQ
       factorizations
     __________________________________________________________________

   Last updated 2012-08-20 19:58:58 CEST

References

   1. http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=108
   2. http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=59
   3. http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=60
