Coding Style

The following coding style should be followed in all MAGMA code. Consistency is important, and clarity of code is paramount. There are rare occasions to ignore a particular guideline, when following it would make the code less readable. The overriding goal is to make code easy-to-read and maintainable. This guide is based on the PLASMA coding style guide, though slightly different.

ISO C++ standard: CPU code should conform to ISO C++ (C++11) or ISO C (C99). GPU code should conform to the CUDA or OpenCL specs. One way to check for compliance is to use the following commands:
```
g++ -std=c++11 -Wall -pedantic -c magma.cpp
gcc -std=c99   -Wall -pedantic -c magma.c
```
This can be done by setting CFLAGS and CXXFLAGS appropriately in make.inc. MAGMA code needs to be portable and work on Unix, MacOS, and Windows. Most MAGMA code is compiled as C++, but uses essentially C style conventions. For example, all externally visible functions are declared extern "C" and C-style malloc routines are used, instead of the C++ new operator.
No Trailing Whitespace: There should be no trailing whitespace characters at the end of lines and no whitespace characters at the end of files (the last closing curly brace should be followed by a single newline).
Whitespace Separators: There should be a space between a C language keyword (if, for, while) and the left parenthesis, and a space before the left curly brace. There should be a space after the start of a comment (// foo). There should not be space between a function name and the left parenthesis.

There should be a space before and after boolean operators (<, <=, >, >=, ==, !=) and assignment operators (=, +=, -=, *=, /=), except when declaring a variable or initializing a loop variable. There should not be space between increment operators (++, --) and the variable. Space may be used around other operators (+, -, etc.) to aid readability.

Semi-colons are not preceded by a space. Inside for loop statement, semi-colons are followed by a space, e.g., for (i=0; i < n; ++i). Similarly, commas are not preceded by a space, and are followed by a space. Occasionally, additional spaces can be used after the comma to line up items on consecutive lines for better readability.
Blank Lines: Two blank lines go between functions. Single blank lines may be used, sparingly, inside functions to separate major sections. There should not be excessive blank lines. In particular, there should not be blank lines after an opening curly brace, or before a closing curly brace. There should not be multiple consecutive blank lines inside a function.
Indentation and Braces: Every function, if block, and loop should be indented by four spaces. For functions, the opening and closing braces are on lines by themselves, in the first column. For control flow (if, else, for, while), the left curly brace follows the control flow statement on the same line. There is no newline between the control flow statement and the block enclosed by curly braces. The closing curly brace is on a new line right after the end of the enclosed block.

Even when the body of a block is only one line, it should be on a separate line from the control flow statement, and it is preferred to enclose it with curly braces to avoid errors when adding code (goto fail; goto fail;). Especially, loops and if statements containing more than one line of text should have curly braces, regardless of the number of C statements.
Examples:

void func( magma_int_t n, double *A, magma_int_t lda )
{
    if (m <= 0 || n <= 0 || k <= 0) return;  // Avoid
    code_not_inside_if();
    if (m <= 0 || n <= 0 || k <= 0)          // Okay
        return;
    if (m <= 0 || n <= 0 || k <= 0) {        // Better
        return;
    }
    if (condition)      // Avoid single statement, multi-line if and loops!
        magma_zherk( uplo, trans, n, k,
                     alpha, dA(0,0), ldda,
                     beta,  dB(0,0), lddb );
    if (condition) {    // Better
        magma_zherk( uplo, trans, n, k,
                     alpha, dA(0,0), ldda,
                     beta,  dB(0,0), lddb );
    }
    if (condition)      // Avoid single block, multi-statement if and loops!
        for (i=0; i < n; ++i) {
            A(i,i) = 0;
            cnt += 1;
        }
    if (condition) {    // Avoid inconsistent indentation! Always use 4 spaces.
      for (i=0; i < n; ++i) {
            A(i,i) = 0;
             cnt += 1;
       }
    }
    if (condition) {    // Better
        for (i=0; i < n; ++i) {
            A(i,i) = 0;
            cnt += 1;
        }
    }
    if (condition) {    // Avoid excessive newlines
        norm = magma_dnrm2( n, dx(0), incx );
    }
}

Matrix macros: C does not understand 2D matrices, other than some limited, fixed size constructs (int foo[4][4]). We use macros to implement 2D matrices. The macro returns a pointer to the element, not the element itself, unlike A[i][j]. For CUDA and Xeon Phi, dA(i,j) yields the pointer to element i,j. In OpenCL, dA(i,j) yields two values: the cl_mem object dA, and the offset to element i,j. This significantly improves the ease of writing code, readability, and porting to platforms such as OpenCL.

// host pointers are the same for clBLAS, CUDA, and Xeon Phi.
#define  A(i_,j_)  (A + (i_) + (j_)*lda)
// for clBLAS, return cl_mem object and offset (2 values);
// for others, return (pointer + offset) (1 value).
#ifdef MAGMA_HAVE_OPENCL
    #define dA(i_,j_)  dA , (i_) + ((size_t) j_)*lda
#else
    #define dA(i_,j_) (dA + (i_) + ((size_t) j_)*lda)
#endif
// for host pointers, A == A(0,0).
blasf77_zherk( lapack_uplo_const(uplo), lapack_trans_const(trans), &n, &k,
               &alpha, A,      &lda,
               &beta,  B(i,j), &ldb );
// for device pointers, use dA(0,0) instead of dA, to aid porting to OpenCL.
magma_zherk( uplo, trans, n, k,
             alpha, dA(0,0), ldda,
             beta,  dB(i,j), lddb );

Documentation: Each function should be documented using Doxygen, using a C-style comment block immediately before the function. Generally follow the traditional LAPACK documentation style, with added Doxygen markup.

A Purpose section gives an overall description of the routine.

An Arguments section specifies each argument — spelled and capitalized exactly as in the C code — with its type, dimensions, description, and valid values (e.g., n >= 0).

Each function should be placed in a group using @ingroup, which groups routines by the related driver routine (e.g., zgetrf, zgetri, zgetf2 are all in one of the magma_zgesv groups), and whether it is a driver, computational, or auxiliary routine. There are also groups for BLAS and auxiliary routines. See docs/doxygen-modules.h for a list of groups (modules).

After compiling documentation (cd docs ; make), check docs/output_err for missing parameters or other issues.
Kernel documentation: For computational kernels, the CPU driver function (e.g., magmablas_zlacpy) should be documented with Doxygen, as above. The low-level, internal CUDA/OpenCL kernel itself (e.g., zlacpy_device_full) should have a short paragraph describing how the matrix or computation is divided into blocks, and what each thread does. This tremendously increases the readability and maintainability of GPU device code, which is inherently difficult to understand. For instance:

/*
    Divides matrix into ceil( m/BLK_X ) x ceil( n/BLK_Y ) blocks.
    Each block has BLK_X threads.
    Each thread loops across one row, updating BLK_Y entries.
    Code similar to zlaset.
*/
static __device__ void
zlacpy_device_full(

Internal comments: Comments inside functions should be used to provide additional insight into what each section of code is doing, sufficient that another person can quickly follow the code.
Line length: There is no specific limit on the length of lines. Up to 90 columns is fine. Clarity is paramount. For multi-line function calls it is recommended that new lines start either after the left paranthesis or indented 4 spaces, always aligned with the first argument.

magma_zherk(                           // Okay; preferred style for prototypes
    uplo, trans, n, k,
    alpha, dA(0,0), ldda,
    beta,  dB(i,j), lddb );
magma_zherk( uplo, trans, n, k,        // Okay
             alpha, dA(0,0), ldda,
             beta,  dB(i,j), lddb );
magma_zherk( uplo, trans, n, k,        // Avoid
    alpha, dA(0,0), ldda,
    beta,  dB(i,j), lddb );

Newlines: Only Unix newlines (\n, ASCII 10) should be used; Windows newlines (\r\n) and returns (\r) should not be used. Every file should have exactly one end-of-line character at the end, unless it's a zero-length file.

Tabs: Tab characters should not be used. Tabs should always be emulated by four spaces, a feature available in almost any text editor.

For vim, sample ~/.vimrc file:

" see http://stackoverflow.com/questions/19835905
set shiftwidth=4
set softtabstop=4
set expandtab
set shiftround

For emacs, sample ~/.emacs file:

; see http://www.emacswiki.org/emacs/NoTabs
; see http://www.emacswiki.org/emacs/IndentingC
(setq-default indent-tabs-mode nil)
(setq c-default-style "k&r"
      c-basic-offset 4)

Variable Declarations: For the most part, variables should be declared at the beginning of each function, unless doing otherwise significantly improves code clarity in a specific case. Loop variables may be declared inside the for statement itself.
Constants: Constants should have appropriate types. In most cases, floating point constants need not have a decimal point — e.g., 2 will get converted to 2.0 or 2.0f as appropriate, while 2.0 sometimes gives a warning if double is not supported (e.g., on some OpenCL devices). If a constant is a bit mask, it is recommended that it is given in hexadecimal notation.
printf Strings: C concatenates strings separated by whitespace. There is no need for multiple printf calls to print a multi-line message. One printf can be used with multiple strings:

printf( "Usage: %s\n"
        "  --flag1  description\n"
        "  --flag2  description\n"
        "  --flag3  description\n",
        argv[0] );

F77 Trailing Underscore: When calling a FORTRAN function the trailing underscore should never be used. If the underscore is needed it should be added by an appropriate conditional preprocessor definition in an appropriate header file (e.g., magma_zlapack.h).
Special Characters: No special characters should be used used in the code. The ASCII codes allowed in files are between 32 and 127, plus code 10 for new line.

Naming Conventions

Functions: follow the names used by LAPACK. Externally visible C functions are prefixed with magma_ or magmablas_, and are all lowercase (e.g., magma_zgetrf).
Constants: are prefixed with Magma (e.g., MagmaNoTrans), and are CamelCase.
Macros: are prefixed with MAGMA_ (e.g., MAGMA_Z_ADD), and are all uppercase.
Filenames: C files are named after the function each contains, e.g., zgetrf.cpp contains magma_zgetrf(). External header filenames should be prefixed with magma_ or magmablas_; they are in the include directory.
Variables: follow general math typesetting rules: matrices are uppercase (A, B, C), vectors are lowercase (x, y, z), matrix sizes are lowercase (m, n, k), loop indices are lowercase (i, j, k), scalars are often Greek (alpha, beta).

Variables on the CPU should follow LAPACK names.

Variables on the GPU are similar but have "d" prepended. The corresponding leading dimension likewise has an added "d": dA and ldda, dB and lddb, etc.

Coding Practices

Prototypes: All function prototypes should be placed in appropriate header files. A prototype should not be placed in any C file (except perhaps the one its function is defined in). This avoids bugs by ensuring that all code uses the same prototype for each function.
Preprocessor Macros: Conditional compilation, through the #define directive, should be used only for portability reasons, and never for making choices that can be decided at runtime. Excessive use of #define macros leads to frequent recompilations and obscure code.
Dead Code: There should be no dead code: no code that is never executed, no including of header files that are not necessary, no unused variables. Dead code can be justified if it serves as a comment, e.g., canonical form of optimized code. In such case the code should be in comments.
OS Interactions: Error checks have to follow each interaction with the OS. The code should never be terminated by the OS. In particular each memory allocation should be checked. The code cannot produce a segmentation fault.
User Interactions: User input needs to be checked for correctness. The user should not be able to cause undefined behavior (except by lying about an allocation size). In particular the user should not be able to cause termination of the code by the OS.