1. Purpose of PLASMA

The main purpose of PLASMA is to address the performance shortcomings of the LAPACK and ScaLAPACK libraries on multicore processors and multi-socket systems of multicore processors. PLASMA provides routines to solve dense general systems of linear equations, symmetric positive definite systems of linear equations and linear least squares problems, using LU, Cholesky, QR and LQ factorizations. Real arithmetic and complex arithmetic are supported in both single precision and double precision.

PLASMA has been designed to supersede LAPACK and ScaLAPACK, principally by restructuring the software to achieve much greater efficiency, where possible, on modern computers based on multicore processors. PLASMA also relies on new or improved algorithms. Currently, however, PLASMA does not serve as a complete replacement for LAPACK due to limited functionality. Specifically, PLASMA does not support band matrices and does not solve eigenvalue and singular value problems. Also, PLASMA does not replace ScaLAPACK as software for distributed memory computers, since it only supports shared-memory machines.

2. Where to Find More Information

The main repository for the PLASMA documentation created to date is the ./docs directory of the distribution. The directory contains important documents such as the reference manual.

PLASMA documentation is also available online on the PLASMA website: http://icl.cs.utk.edu/plasma/

For installation instructions, please refer to the installation guide.

In addition, the PLASMA User Forum can be used to post general questions and comments as well as to report technical problems.

3. Important Information about BLAS and LAPACK

3.1. Optimized BLAS are Critical for Performance

It is absolutely critical for performance to use PLASMA in conjunction with an optimized implementation of the Basic Linear Algebra Subroutines (BLAS) library. Such implementations are usually provided by the processor manufacturer and are usually available free of charge for non-profit use, such as academic research. One example is the Intel Math Kernel Library (MKL).

Open-source alternatives also exist, such as ATLAS and GotoBLAS.

As a last resort, the FORTRAN implementation of BLAS from Netlib (often referred to as the reference BLAS) can be used. However, since the Netlib BLAS are completely unoptimized, PLASMA with Netlib BLAS will deliver correct numerical results but very poor performance.

3.2. Multithreading within BLAS Must be Disabled

Many BLAS implementations exploit parallelism within BLAS through multithreading. PLASMA, however, uses BLAS for high performance implementations of single-core operations (often referred to as kernels) and exploits parallelism at the algorithmic level, above the level of BLAS. For that reason, PLASMA must not be used in conjunction with a multithreaded BLAS, as this is likely to create more execution threads than there are cores, which will severely degrade PLASMA’s performance. PLASMA needs to be linked with a sequential BLAS library or with a multithreaded BLAS library that has multithreading disabled. Typically, multithreading can be disabled by setting an appropriate environment variable from the command prompt, for instance:

> export OMP_NUM_THREADS=1
> export MKL_NUM_THREADS=1
> export GOTO_NUM_THREADS=1

If you are using the ATLAS library, please link against the serial interface rather than the parallel one. See http://math-atlas.sourceforge.net/errata.html#LINK for more information.
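
The environment variables listed above can also be set for a single run instead of for the whole session. The following is a minimal sketch, not part of PLASMA: the function name run_with_sequential_blas is hypothetical, and which of the three variables actually takes effect depends on the BLAS library that was linked.

```shell
# Hypothetical helper (not shipped with PLASMA): run one command with
# BLAS-level multithreading disabled. The settings are made inside a
# subshell, so the caller's environment is left unchanged.
run_with_sequential_blas() {
    (
        export OMP_NUM_THREADS=1    # OpenMP-threaded BLAS libraries
        export MKL_NUM_THREADS=1    # Intel MKL
        export GOTO_NUM_THREADS=1   # GotoBLAS
        "$@"                        # run the given command with its arguments
    )
}
```

A call then has the form "run_with_sequential_blas ./my_program", where ./my_program stands for any executable linked against PLASMA.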

3.3. CBLAS Is Not Required

Although many BLAS implementations provide a C language API (CBLAS), some deviate from the standard. Internally, PLASMA calls BLAS through both the FORTRAN 77 interface and the C interface. However, PLASMA does not require the user to provide CBLAS. Instead, PLASMA includes the Netlib CBLAS and only requires BLAS with the FORTRAN 77 interface (sometimes referred to as the legacy BLAS).

3.4. LAPACK Is Not Required

PLASMA does not require the user to provide the LAPACK library. Internally, PLASMA calls a small subset of LAPACK routines. However, all necessary LAPACK routines are included in PLASMA.

3.5. Thou Shalt Not Mix Compilers

For a given processor, the user may have several compilers available. For instance, GNU, PGI and Intel compilers are available for Intel processors. Different compilers can have slightly different Application Binary Interfaces (ABIs), and mixing compilers is generally a bad idea. The user’s code and the PLASMA library should be compiled with the same compiler, and so should the BLAS library if a source distribution is used. If a binary distribution of the BLAS is used, the correct version has to be chosen (the one providing the right ABI). For Intel processors, the Intel Math Kernel Library Link Line Advisor can be used to assist with the choice.
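
As a rough pre-build sanity check, the C and Fortran compiler commands can be compared by name before anything is compiled. The sketch below is only a heuristic and is not part of PLASMA: the function names compiler_family and same_compiler_family are hypothetical, the list of command names is an assumption covering common GNU, Intel and PGI installations, and nothing here inspects the ABI of a compiler or library.

```shell
# Hypothetical helper: map a compiler command name to a vendor family.
# Name-based heuristic only; the binaries themselves are not examined.
compiler_family() {
    case "$(basename "$1")" in
        gcc|g++|gfortran)            echo gnu ;;
        icc|icpc|ifort)              echo intel ;;
        pgcc|pgf77|pgf90|pgfortran)  echo pgi ;;
        *)                           echo unknown ;;
    esac
}

# Succeeds only when both commands map to the same known family.
same_compiler_family() {
    a=$(compiler_family "$1")
    b=$(compiler_family "$2")
    [ "$a" != unknown ] && [ "$a" = "$b" ]
}
```

For example, "same_compiler_family gcc gfortran" succeeds, while "same_compiler_family gcc ifort" fails and signals a mixed toolchain.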

4. License Information

PLASMA is a software package provided by the University of Tennessee, the University of California, Berkeley, and the University of Colorado, Denver. PLASMA’s license is a BSD-style permissive free software license (properly called modified BSD). It allows proprietary commercial use and allows software released under the license to be incorporated into proprietary commercial products. Works based on the material may be released under a proprietary license as long as PLASMA’s license requirements are maintained, as stated in the LICENSE file located in the main directory of the PLASMA distribution. In contrast to copyleft licenses, like the GNU General Public License, PLASMA’s license allows copies and derivatives of the source code to be made available on terms more restrictive than those of the original license.

5. Publications

A number of technical reports were written during the development of PLASMA and published as LAPACK Working Notes by the University of Tennessee. Many of these reports later appeared as journal articles.

6. Funding

The PLASMA project is funded by the National Science Foundation (NSF Grant No. CCF-0811642, NSF Grant No. CCF-0811520) and the Microsoft Corporation.