![]() |
MAGMA
2.0.0
Matrix Algebra for GPU and Multicore Architectures
|
The sparse-iter package recently added to the MAGMA software stack contains sparse BLAS routines as well as functions to handle the complete iterative solution process of a sparse linear system of equations.
A typical application includes:
For each of these steps, multiple options are offered by the MAGMA software stack.
For a more generic programming approach, the sparse data structures (matrices and vectors) are stored in data structures containing all information necessary to access the sparse-BLAS via wrappers:
struct magma_z_matrix { magma_storage_t storage_type; // matrix format - CSR, ELL, SELL-P magma_location_t memory_location; // CPU or DEV magma_symmetry_t sym; // opt: indicate symmetry magma_diagorder_t diagorder_type; // opt: only needed for factorization matrices magma_uplo_t fill_mode; // fill mode full/lower/upper magma_int_t num_rows; // number of rows magma_int_t num_cols; // number of columns magma_int_t nnz; // opt: number of nonzeros magma_int_t max_nnz_row; // opt: max number of nonzeros in one row magma_int_t diameter; // opt: max distance of entry from main diagonal union { magmaDoubleComplex *val; // array containing values in CPU case magmaDoubleComplex_ptr dval; // array containing values in DEV case }; union { magmaDoubleComplex *diag; // opt: diagonal entries in CPU case magmaDoubleComplex_ptr ddiag; // opt: diagonal entries in DEV case }; union { magma_index_t *row; // opt: row pointer CPU case magmaIndex_ptr drow; // opt: row pointer DEV case }; union { magma_index_t *rowidx; // opt: array containing row indices CPU case magmaIndex_ptr drowidx; // opt: array containing row indices DEV case }; union { magma_index_t *col; // opt: array containing col indices CPU case magmaIndex_ptr dcol; // opt: array containing col indices DEV case }; union { magma_index_t *list; // opt: linked list pointing to next element magmaIndex_ptr dlist; // opt: linked list pointing to next element }; magma_index_t *blockinfo; // opt: for BCSR format CPU case magma_int_t blocksize; // opt: info for SELL-P/BCSR magma_int_t numblocks; // opt: info for SELL-P/BCSR magma_int_t alignment; // opt: info for SELL-P/BCSR magma_order_t major; // opt: row/col major for dense matrices magma_int_t ld; // opt: leading dimension for dense } magma_z_matrix;
The purpose of the unions (e.g. for the val array) is to support different hardware platforms, where the magmaXXX_ptr is adapted to the respective device characteristics.
For sparse matrices, the main formats are CSR, ELL and the MAGMA-specific SELL-P. Generally, the sparse-BLAS routines provide for these the best performance.
Without specifying the storage type, the memory location or the dimension of the matrix, the sparse matrix vector product can then be used via the wrapper:
magma_z_spmv( magmaDoubleComplex alpha, magma_z_matrix A, magma_z_matrix x, magmaDoubleComplex beta, magma_z_matrix y, magma_queue_t queue );
A sparse matrix stored in mtx format can be read from disk via the function:
magma_z_csr_mtx( magma_z_matrix *A, const char *filename, magma_queue_t queue );
A sparse linear system of dimension mxn present in main memory can be passed to MAGMA via the function:
magma_zcsrset( magma_int_t m, magma_int_t n, magma_index_t *row, magma_index_t *col, magmaDoubleComplex *val, magma_z_matrix *A, magma_queue_t queue );
where row, col, val contain the matrix in CSR format.
If the matrix is already present in DEV memory, the corresponding function is
magma_zcsrset_gpu( magma_int_t m, magma_int_t n, magmaIndex_ptr row, magmaIndex_ptr col, magmaDoubleComplex_ptr val, magma_z_matrix *A, magma_queue_t queue );
Similarly, matrices handled in MAGMA can be returned via the functions
magma_zcsrget( magma_z_matrix A, magma_int_t *m, magma_int_t *n, magma_index_t **row, magma_index_t **col, magmaDoubleComplex **val, magma_queue_t queue ); magma_zcsrget_gpu( magma_z_matrix A, magma_int_t *m, magma_int_t *n, magmaIndex_ptr *row, magmaIndex_ptr *col, magmaDoubleComplex_ptr *val, magma_queue_t queue ); respectively write_z_csrtomtx( magma_z_matrix A, const char *filename, magma_queue_t queue );
Additionally, MAGMA contains routines to generate stencil discretizations of different kind.
Vectors are handled as dense matrices (Magma_DENSE) and can be initialized inside MAGMA via
magma_z_vinit( magma_z_matrix *x, magma_location_t memory_location, magma_int_t num_rows, magmaDoubleComplex values, magma_queue_t queue );
where ''memory_location'' sets the location (Magma_CPU or Magma_DEV). Also, vectors can be read from file via
magma_z_vread( magma_z_matrix *x, magma_int_t length, char * filename, magma_queue_t queue );
or - in case of a block of sparse vectors stored as CSR matrix - via
magma_z_vspread( magma_z_matrix *x, const char * filename, magma_queue_t queue );
or passed from/to main memory:
magma_zvset( magma_int_t m, magma_int_t n, magmaDoubleComplex *val, magma_z_matrix *b, magma_queue_t queue ); magma_zvget( magma_z_matrix b magma_int_t *m, magma_int_t *n, magmaDoubleComplex **val, magma_queue_t queue );
To convert a matrix from one into another format, the CPU-based routine
magma_z_mconvert( magma_z_matrix A, magma_z_matrix *B, magma_storage_t old_format, magma_storage_t new_format, magma_queue_t queue );
can be used where old_format and new_format determine the specific conversion.
All iterative solvers and eigensolvers included in the sparse-iter package work on the device. Hence, it is required to send the respective data structures to the device for solving, and back to access the solution. The functions
magma_z_mtransfer( magma_z_matrix A, magma_z_matrix *B, magma_location_t src, magma_location_t dst, magma_queue_t queue ); magma_z_vtransfer( magma_z_matrix x, magma_z_matrix *y, magma_location_t src, magma_location_t dst, magma_queue_t queue );
allow any data copy operation - from host to device, device to device, device to host or host to host.
Linear algebra objects can be deallocated via
magma_z_mfree( magma_z_matrix *A, magma_queue_t queue );
The sparse-iter package contains a variety of linear solvers, eigensolvers, and preconditioners. The standard procedure to call a solver is to pass the linear algebra objects (located on the device) and a structure called magma_z_solver_par (respectively and magma_z_precond_par) controlling the iterative solver and collecting information during the execution:
struct magma_z_solver_par { magma_solver_type solver; // solver type magma_int_t version; // sometimes there are different versions double atol; // absolute residual stopping criterion double rtol; // relative residual stopping criterion magma_int_t maxiter; // upper iteration limit magma_int_t restart; // for GMRES magma_ortho_t ortho; // for GMRES magma_int_t numiter; // feedback: number of needed iterations double init_res; // feedback: initial residual double final_res; // feedback: final residual double iter_res; // feedback: iteratively computed residual real_Double_t runtime; // feedback: runtime needed real_Double_t *res_vec; // feedback: array containing residuals real_Double_t *timing; // feedback: detailed timing magma_int_t verbose; // print residual ever 'verbose' iterations magma_int_t num_eigenvalues; // number of EV for eigensolvers magma_int_t ev_length; // needed for framework double *eigenvalues; // feedback: array containing eigenvalues magmaDoubleComplex_ptr eigenvectors; // feedback: array containing eigenvectors on DEV magma_int_t info; // feedback: did the solver converge etc.
the input for verbose is: 0 = production mode k>0 = convergence and timing is monitored in *res_vec and *timeing every k-th iteration
the output of info is: 0 = convergence (stopping criterion met) -1 = no convergence
} magma_z_solver_par;
These entities can either be initialized manually, or via the function
magma_zsolverinfo_init( magma_z_solver_par *solver_par, magma_z_preconditioner *precond, magma_queue_t queue );
setting them to some default values. For eigensolvers, the workspace needed for the eigenvectors has to be allocated consistent with the matrix dimension, which requires additionally calling
magma_zeigensolverinfo_init( magma_z_solver_par *solver_par, magma_queue_t queue );
after setting solver_par.ev_length and solver_par.num_eigenvalues to the correct numbers.
An easy way to access the data collected during a solver execution is given by the function
magma_zsolverinfo( magma_z_solver_par *solver_par, magma_z_preconditioner *precond_par, magma_queue_t queue );
After completion,
magma_zsolverinfo_free( magma_z_solver_par *solver_par, magma_z_preconditioner *precond, magma_queue_t queue );
deallocates all memory used within the solver and preconditioner structure.
A solver can then be called via
magma_z*solvername*( magma_z_matrix A, magma_z_matrix b, magma_z_matrix *x, magma_z_solver_par *solver_par, magma_queue_t queue );
respectively
magma_zp*solvername*( magma_z_matrix A, magma_z_matrix b, magma_z_matrix *x, magma_z_solver_par *solver_par, magma_z_preconditioner *precond_par, magma_queue_t queue );
for preconditioned solvers. More conveniant is the use the wrapper
magma_z_solver( magma_z_matrix A, magma_z_matrix b, magma_z_matrix *x, magma_zopts *zopts, magma_queue_t queue );
where zopts is a structure containing both, the solver and the preconditioner information: struct magma_zopts{ magma_z_solver_par solver_par; magma_z_preconditioner precond_par; magma_storage_t input_format; int blocksize; int alignment; magma_storage_t output_format; magma_location_t input_location; magma_location_t output_location; magma_scale_t scaling; }magma_zopts;
All entities of this structure can be initialized from command line by calling
magma_zparse_opts( int argc, char** argv, magma_zopts *opts, int *matrices, magma_queue_t queue );
Especially when using sparse-iter for the first time, the easiest way to get familiar with the package is to use and modify one of the predefined testers.
In the following example we assume to have an application coded in C and running in double precision that at some instance requires solving a linear system of the form Ax=b, where A and b are generated within the application. Furthermore, we assume that the matrix A is in CSR format together with a RHS vector b in present in main memory. We do not know which solver and preconditioner combination works best for this problem, and which data format gives the best sparse-BLAS performance. For this purpose we extract parts of the testing_dsolver.cpp located in the sparse-iter/testing folder.
First, we copy the includes, initialization and finalization into our application program. Then we focus on the linear system we want to solve. We extract line 34-43 and copy it into the right location. This includes the setup of the solver parameters via command line. As we do not want to read the linear system from file, but solve the system present in memory, we use
magma_dcsrset( m, n, row, col, val, &A, queue );
to pass the matrix (present in row, col val) and
magma_dvset( m, n, valb, &b, queue );
to pass the RHS vector b (entries in valb) to MAGMA. For the solution vector, we allocate an initial guess on the device via:
magma_d_vector x_d; magma_d_vinit( &x_d, Magma_DEV, A.num_cols, one, queue );
We then copy line 93-95 into the application to convert the matrix into the preferred format and copy it to the device. The RHS is transferred via
magma_d_vector b_d; magma_d_vtransfer( b, &b_d, Magma_CPU, Magma_DEV, queue );
We solve with
magma_d_solver( B_d, b_d, &x_d, &zopts, queue );
and return the solution vector to the application via
magma_int_t m, n; double * valx; magma_dvget( x_d, &m, &n, &valx, queue );
Finally, we clean up the used memory via
magma_z_mfree(&B_d, queue ); magma_z_mfree(&B, queue ); magma_z_vfree(&x_d, queue ); magma_z_vfree(&b_d, queue );
and magma_zsolverinfo_free( &zopts.solver_par, &zopts.precond_par, queue );
Finally, we clean up the used memory via
magma_z_mfree(&B_d, queue ); magma_z_mfree(&B, queue ); magma_z_vfree(&x_d, queue ); magma_z_vfree(&b_d, queue );
and
magma_dsolverinfo_free( &dopts.solver_par, &dopts.precond_par, queue );