                                 PLASMA README

      Univ. of Tennessee, Univ. of California Berkeley and Univ. of Colorado
      Denver
   __________ ____       _____    _________   _____      _____
   \______   \    |     /  _  \  /   _____/  /     \    /  _  \
    |     ___/    |    /  /_\  \ \_____  \  /  \ /  \  /  /_\  \
    |    |   |    |___/    |    \/        \/    Y    \/    |    \
    |____|   |_______ \____|__  /_______  /\____|__  /\____|__  /
                     \/       \/        \/         \/         \/

      Parallel Linear Algebra Software for Multicore Architectures

      [1]http://icl.cs.utk.edu/plasma/
     __________________________________________________________________

1. Purpose of PLASMA

   The  main  purpose of PLASMA is to address the performance shortcomings
   of the [2]LAPACK and [3]ScaLAPACK libraries on multicore processors and
   multi-socket  systems  of  multicore  processors and their inability to
   efficiently  utilize  accelerators  such  as  Graphics Processing Units
   (GPUs).  PLASMA  provides  routines  to  solve dense general systems of
   linear   equations,  symmetric  positive  definite  systems  of  linear
   equations and linear least squares problems, using LU, Cholesky, QR and
   LQ factorizations. Real arithmetic and complex arithmetic are supported
   in both single precision and double precision.

   PLASMA has been designed to supercede LAPACK and ScaLAPACK, principally
   by restructuring the software to achieve much greater efficiency, where
   possible,  on  modern  computers  based on multicore processors. PLASMA
   also  relies  on new or improved algorithms. Currently, however, PLASMA
   does  not  serve  as  a  complete  replacement of LAPACK due to limited
   functionality.  Specifically, PLASMA does not support band matrices and
   does  not  solve  eigenvalue  and singular value problems. Also, PLASMA
   does   not   replace  ScaLAPACK  as  software  for  distributed  memory
   computers, since it only supports shared-memory machines.
     __________________________________________________________________

2. Where to Find More Information

   The main repository for PLASMA documentation is the distribution ./docs
   directory.  The  directory  contains  important  documents  such as the
   Users'  Guide  and  the  Reference Manual. PLASMA documentation is also
   available online on the PLASMA website:
   [4]http://icl.cs.utk.edu/plasma/.  For installation instructions please
   refer  to  the  [5]Installation  Guide. In addition, the [6]PLASMA User
   Forum  can be used to post general questions and comments as well as to
   report technical problems.
     __________________________________________________________________

3. Important Information about BLAS and LAPACK

  3.1. Optimized BLAS are Critical for Performance

   It  is absolutely critical for performance to use PLASMA in conjunction
   with   an   optimized   implementation  of  the  Basic  Linear  Algebra
   Subroutines  (BLAS)  library. Such implementations are usually provided
   by  the processor manufacturer and are usually available free of charge
   for non-profit use, such as academic research. Examples include:
     * The VecLib from Apple,
     * The AMD Core Math Library (ACML),
     * The Math Kernel Library (MKL) from Intel,
     * The Engineering and Scientific Software Library (ESSL) from IBM.

   Open-source alternatives also exist, such as:
     * [7]Goto BLAS,
     * [8]Automatically Tuned Linear Algebra Software (ATLAS).

   As  the  last resort, the FORTRAN implementation of BLAS from [9]Netlib
   can  be  used  (often  referred  to  as reference BLAS). However, since
   Netlib  BLAS  are  completely unoptimized, PLASMA with Netlib BLAS will
   deliver correct numerical results, but no performance whatsoever.

   For   comprehensive  installation  instructions  please  refer  to  the
   [10]Installation  Guide. However, if you decide to install manually and
   edit the installation scripts then there is one important issue to keep
   in  mind.  Modern  optimized  BLAS  are  not  stand-alone libraries but
   instead  are  bundled  with additional software, primarily LAPACK. This
   makes  it  necessary to use proper linking flags. Commonly these are -L
   (for path to library) and -l (for library name):
-L/usr/lib -lblas

   This will only pull in symbols from the BLAS library that are needed by
   PLASMA.  The  commonly made mistake is to just specify the full path to
   the library:
/usr/lib/libblas.a

   This  method  will  work  for  simple cases but it forces the linker to
   include the whole contents of the BLAS library rather than just pull in
   the  missing  symbols.  Aside  from  making the binary executable files
   larger,  this  method will easily cause name clashes as the same symbol
   name might be included multiple times.

  3.2. Multithreading within BLAS Must be Disabled

   Many  Basic  Linear  Algebra Subroutines (BLAS) implementations exploit
   parallelism   within  BLAS  through  multithreading.  PLASMA,  however,
   utilizes  BLAS  for  high  performance  implementations  of single-core
   operations  (often  referred to as kernels) and exploits parallelism at
   the  algorithmic level above the level of BLAS. For that reason, PLASMA
   must  not  be used in conjunction with a multithreaded BLAS, as this is
   likely  to  create  more  execution  threads  than  actual  cores.  The
   phenomenon,   known   as  oversubscribing  of  cores,  will  completely
   annihilate  PLASMA’s  performance  due  to  devastating impact on the
   operation of cache memories for dense linear algebra workloads.

   PLASMA  needs  to  be  linked  with  a  sequential  BLAS  library  or a
   multithreaded  BLAS  library with multithreading disabled. To learn how
   to  link  with sequential MKL, please consult the [11]Intel Math Kernel
   Library  Link Line Advisor; To learn how to link with sequential ATLAS,
   please  consult  the  [12]ATLAS  errata.  Alternatively,  PLASMA can be
   linked  with a multithreaded BLAS library with multithreading disabled.
   Typically,  disabling  of  multithreading  can  be  done by setting the
   appropriate environment variable from the command prompt, for instance:
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export GOTO_NUM_THREADS=1

   Currently, this disables the simultaneous use of PLASMA for some of the
   user’s  functions  and  vendor BLAS for others. This cannot be easily
   remedied   without   standardization   of   interoperability  rules  of
   multithreaded   libraries.  One  consolation  is  that  PLASMA  already
   delivers fast parallel implementations of all Level 3 BLAS routines and
   there  is  virtually  no  benefit from parallelization of Level 1 and 2
   BLAS  routines  on  current  generation  of  multicore platforms due to
   memory contention.

  3.3. PLASMA Software Stack

   PLASMA  requires the following software packages to be installed in the
   system prior to PLASMA’s installation: BLAS, CBLAS, LAPACK and Netlib
   LAPACK C Wrapper. CBLAS and the components of LAPACK required by PLASMA
   are  commonly  bundled  with  BLAS.  If  this  is  not  the case Netlib
   implementation  of CBLAS and Netlib LAPACK can be used. The C interface
   to  LAPACK  has not been standardized yet and the LAPACK C Wrapper from
   Netlib has to be used for the time being.

   BLAS  is  the  set of Basic Linear Algebra Subprograms described in the
   previous  section.  An  unoptimized reference implementation of BLAS is
   available  fron Netlib at [13]http://www.netlib.org/blas/. As mentioned
   before, it is critical for performance that optimized implementation of
   BLAS is used instead of Netlib BLAS.

   CBLAS  is  the  C  language  interface  to  BLAS  available  with  most
   implementations of BLAS. A reference implementation from Netlib is also
   available    at   [14]http://www.netlib.org/blas/blast-forum/cblas.tgz.
   Since  CBLAS  is  only  a set of wrappers to the actual BLAS, the CBLAS
   from Netlib can be used without any adverse effects on performance.

   LAPACK  is  a large package of linear algebra routines for a wide range
   of  problems.  PLASMA uses only a tiny portion of LAPACK, which is also
   commonly  bundled with BLAS distributions. If this is not the case, the
   complete  LAPACK  distribution  from  Netlib  can  be  used,  which  is
   available  at [15]http://www.netlib.org/lapack/. Although vendor LAPACK
   routines  can be more optimized than those from Netlib, there should be
   no  adverse  performance  effects  of using Netlib LAPACK, since PLASMA
   only relies on LAPACK for implementing some of its sequential kernels.

   The  user  can point PLASMA’s installer to all the components already
   installed  in  the  system.  For  all  the  missing  components  Netlib
   equivalents  will  be  installed.  The  installer can also be forced to
   disregard  any  software  already  installed  in the system and use the
   Netlib packages instead.

  3.4. Thou Shalt Not Mix Compilers

   For  a  given  processor,  the user can have different compilers at his
   disposal.  For instance, GNU, PGI and Intel compilers are available for
   Intel  processors.  Different  compilers  can  have  slightly different
   Application  Binary Interfaces (ABIs) and mixing compilers is generally
   a bad idea. User’s code and the PLASMA library should be compiled with
   the  same  compiler,  and so should be BLAS, CBLAS, LAPACK and LAPACK C
   Wrapper,  if a source distribution is used. If a binary distribution of
   the  BLAS  is  used,  the  correct  version  has  to be chosen (the one
   providing  the  right  ABI).  For  Intel processors, the [16]Intel Math
   Kernel Library Link Line Advisor can be used to assist with the choice.

  3.5. Linking FORTRAN code with C Code

   Currently  PLASMA  library  does not contain any FORTRAN code any more.
   FORTRAN is only used in PLASMA’s testing suite derived from the one of
   LAPACK (/testing/lin/). Because of that, neither a FORTRAN compiler nor
   the FORTRAN standard library has to be involved in compiling PLASMA and
   linking  it  with  C code. The following paragraph is preserved in this
   README only because it is a very useful piece of information for novice
   users who are forced to mix FORTRAN and C.

   If  FORTRAN code is mixed with C code, the FORTRAN standard library has
   to  be incluced. Sometimes it can be accomplished by simply putting the
   standard   FORTRAN   library  at  the  and  of  the  link  line,  e.g.,
   "-lgfortran"  when  using  GCC.  Alternatively, FORTRAN compiler can be
   used  for  linking. This will accomplish the same effect automatically.
   However,  the Intel IFORT compiler, when used for linking, assumes that
   the   main  program  is  in  FORTRAN  and  links  for_main.o  into  the
   application.  This  provides  the linker with two main() functions (one
   created  by  the  user  and one inserted by the build system), which is
   cause a linker error. To prevent this from happening, the "-nofor_main"
   link option has to be given.
     __________________________________________________________________

4. A Note on Running on NUMA Systems

   PLASMA  is a software package for shared memory systems, both Symmetric
   Multi-Processors  (SMP)  and  Non-Uniform Memory Access (NUMA) systems.
   PLASMA  does  not  detect  the type of the system and does not take any
   specific actions in that respect. PLASMA’s performance may be poor on
   NUMA  systems  if  matrices  are  not distributed among multiple memory
   nodes.  The  current  wisdom  is to use "numactl --interleave=all" when
   running an application that uses PLASMA.
     __________________________________________________________________

5. License Information

   PLASMA  is  a  software  package  provided  by University of Tennessee,
   University  of California, Berkeley and University of Colorado, Denver.
   PLASMA’s  license  is  a  BSD-style  permissive free software license
   (properly  called  modified BSD). It allows proprietary commercial use,
   and for the software released under the license to be incorporated into
   proprietary  commercial  products.  Works  based on the material may be
   released  under  a  proprietary  license  as long as PLASMA’s license
   requirements  are maintained, as stated in the LICENSE file, located in
   the  main directory of the PLASMA distribution. In contrast to copyleft
   licenses, like the GNU General Public License, PLASMA’s license allows
   for  copies  and derivatives of the source code to be made available on
   terms more restrictive than those of the original license.
     __________________________________________________________________

6. Publications

   A  number  of  technical reports were written during the development of
   PLASMA  and  published as [17]LAPACK Working Notes by the University of
   Tennessee.  Almost  all  of  these  reports  later  appeared as journal
   articles.

   One  convenient way to make a reference to PLASMA is to cite [18]PLASMA
   Users'  Guide, Version 2.3, also available as a [19]LAPACK Working Note
   XXX and a [20]Technical Report UT-CS-XX-XXX. The following BibTeX entry
   can be used for LaTeX documents:

   @TECHREPORT{plasma_users_guide,
       author       = {{Dongarra}, J. and {Kurzak}, J. and {Langou}, J. and
                       {Langou}, J. and {Ltaief), H. and {Luszczek}, P. and
                       {YarKhan}, A. and {Alvaro}, W. and {Faverge}, M. and
                       {Haidar}, A. and {Hoffman}, J. and {Agullo}, E. and
                       {Buttari}, A. and {Hadri}, B.},
       title        = {{PLASMA} Users' Guide, Version 2.3},
       institution  = {Electrical Engineering and Computer Science Department,
                       Univesity of Tennessee},
       year         = {2010},
       type         = {technical report},
       number       = {UT-CS-XX-XXX},
       address      = {Knoxville, Tennessee 37996},
       month        = {September},
       note         = {},
       key          = {},
   }
     __________________________________________________________________

7. Funding

   The   PLASMA  project  is  funded  in  part  by  the  National  Science
   Foundation, Department of Energy, Microsoft, and the MathWorks.
     __________________________________________________________________

   Last updated 2010-07-21 23:25:36 EDT

Références

   1. http://icl.cs.utk.edu/plasma/
   2. http://www.netlib.org/lapack/
   3. http://www.netlib.org/scalapack/
   4. http://icl.cs.utk.edu/plasma/
   5. http://icl.cs.utk.edu/projectsfiles/plasma/html/InstallationGuide.html
   6. http://icl.cs.utk.edu/plasma/forum/
   7. http://web.tacc.utexas.edu/~kgoto/
   8. http://math-atlas.sourceforge.net/
   9. http://www.netlib.org/blas/
  10. http://icl.cs.utk.edu/projectsfiles/plasma/html/InstallationGuide.html
  11. http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
  12. http://math-atlas.sourceforge.net/errata.html#LINK
  13. http://www.netlib.org/blas/
  14. http://www.netlib.org/blas/blast-forum/cblas.tgz
  15. http://www.netlib.org/lapack/
  16. http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
  17. http://www.netlib.org/lapack/lawns/downloads/
  18. http://icl.cs.utk.edu/projectsfiles/plasma/pdf/users_guide.pdf
  19. http://www.netlib.org/lapack/lawnspdf/lawnXXX.pdf
  20. http://www.cs.utk.edu/~library/TechReports/2010/ut-cs-XX-XXX.pdf
