# MAGMA MIC

MAGMA on Intel Xeon Phi (MIC)

Innovative Computing Laboratory, Electrical Engineering and Computer Science, University of Tennessee

Piotr Luszczek (presenter)



### Intel Xeon Phi specific considerations

- Unlike GPUs, Intel Xeon Phi coprocessors are less dependent on the host
  - Users can log in to the coprocessor, develop, and run programs in native mode
- There is no high-level API similar to CUDA/OpenCL for using the Intel Xeon Phi from the host
  - OpenCL 1.2 support for Intel Xeon Phi is available as of Intel SDK XE 2013 Beta
- There is Phi-specific pragma syntax for offload programming
  - Syntax: `#pragma offload target(mic:0) in(a,b) out(c) wait(x)`
  - It may be too high-level for some HPC applications and numerical libraries
- Coprocessor Offload Infrastructure (COI)
- We used Intel Xeon Phi's Low Level API (LLAPI) to develop the MAGMA API
  - It allows us to handle hybrid systems uniformly
  - The messaging API, SCIF (Symmetric Communications Interface), is platform-specific
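For comparison with the LLAPI approach, the pragma-based model can be illustrated with a minimal sketch. The function name `vec_add_offload` is hypothetical; `length(n)` is the Intel offload modifier that tells the runtime how many elements a pointer refers to. The `wait(x)` clause from the syntax line is omitted because it pairs with an asynchronous `signal` clause. Compilers without Intel offload support ignore the unknown pragma, so the loop then simply runs on the host.

```c
/* Element-wise c = a + b. With an Intel compiler that supports offload, the
   block runs on coprocessor 0; other compilers ignore the pragma (usually with
   a warning) and run the loop on the host. */
void vec_add_offload(const double *a, const double *b, double *c, int n)
{
    /* in(): copy a and b to the card before the region runs.
       out(): copy c back to the host after the region finishes. */
#pragma offload target(mic:0) in(a, b : length(n)) out(c : length(n))
    {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }
}
```

All data movement is implied by the clauses, which is convenient for simple kernels but gives little control over overlapping transfers with computation, one reason the pragma model can be too high-level for a library like MAGMA.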

### MAGMA MIC programming model



- Intel Xeon Phi acts as a coprocessor
- On the Intel Xeon Phi, MAGMA runs a "server"
- Communications are implemented using LLAPI and SCIF
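The "server" idea can be sketched as a dispatch loop on the coprocessor that reads a control message and runs the matching routine. This is a hypothetical illustration: the opcodes, `struct mic_request`, and `mic_server_run` are invented names, the "channel" is an in-memory array rather than a SCIF endpoint, and the kernels are simple vector operations standing in for MAGMA's BLAS.

```c
#include <stddef.h>

/* Hypothetical control messages; the real MAGMA MIC ones (e.g. magma_mic_ZGEMM)
   are defined by the library and not reproduced here. */
enum mic_op { MIC_OP_SHUTDOWN = 0, MIC_OP_SCALE = 1, MIC_OP_AXPY = 2 };

/* A request as the server sees it after receiving it from the host. */
struct mic_request {
    int op;        /* control message selecting the routine */
    int n;         /* vector length */
    double alpha;  /* scalar argument */
    double *x;     /* operands assumed to be resident in coprocessor memory */
    double *y;
};

/* Process requests in arrival order until a shutdown request arrives.
   In this sketch the inbox array stands in for SCIF receives.
   Returns the number of requests executed. */
int mic_server_run(const struct mic_request *inbox, int count)
{
    int executed = 0;
    for (int r = 0; r < count; ++r) {
        const struct mic_request *q = &inbox[r];
        switch (q->op) {
        case MIC_OP_SHUTDOWN:
            return executed;
        case MIC_OP_SCALE:                      /* x = alpha * x */
            for (int i = 0; i < q->n; ++i)
                q->x[i] *= q->alpha;
            break;
        case MIC_OP_AXPY:                       /* y = alpha * x + y */
            for (int i = 0; i < q->n; ++i)
                q->y[i] += q->alpha * q->x[i];
            break;
        default:                                /* unknown message: skip it */
            continue;
        }
        ++executed;
    }
    return executed;
}
```

Requests after the shutdown message are never executed, which mirrors a server that exits its receive loop on a termination message.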

### MAGMA MIC Cholesky Code

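```c
// Blocked Cholesky factorization A = U^H U (upper triangle), matrix on the MIC.
// Arguments cut off at the slide margin (queue/event arguments and parts of
// the zgemm/ztrsm argument lists) are reconstructed from the surrounding calls.
for (j = 0; j < n; j += nb) {
    jb = min(nb, n - j);
    // Update the diagonal block with the already factored rows above it
    magma_zherk( MagmaUpper, MagmaConjTrans,
                 jb, j, m_one, dA(0, j), ldda, one, dA(j, j), ldda, queue );
    // Start copying the diagonal block to the host
    magma_zgetmatrix_async( jb, jb, dA(j, j), ldda, work, 0, jb, queue, &event );
    if (j+jb < n) {
        // Meanwhile, update the block row right of the diagonal block on the MIC
        magma_zgemm( MagmaConjTrans, MagmaNoTrans, jb, n-j-jb, j,
                     mz_one, dA(0, j), ldda, dA(0, j+jb), ldda, z_one,
                     dA(j, j+jb), ldda, queue );
    }
    magma_event_sync( event );
    // Factor the diagonal block on the host CPU
    lapackf77_zpotrf( MagmaUpperStr, &jb, work, &jb, info );
    if ( *info != 0 ) {
        *info += j;    // report the failing column in global numbering
        break;
    }
    // Send the factored block back to the MIC
    magma_zsetmatrix_async( jb, jb, work, 0, jb, dA(j, j), ldda, queue, &event );
    if (j+jb < n) {
        magma_event_sync( event );
        // Triangular solve with the new diagonal block for the block row
        magma_ztrsm( MagmaLeft, MagmaUpper, MagmaConjTrans, MagmaNonUnit,
                     jb, n-j-jb, z_one, dA(j, j), ldda, dA(j, j+jb), ldda, queue );
    }
}
```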

```c
// BLAS functions
magma_err_t
magma_zgemm(
    magma_trans_t transA, magma_trans_t transB,
    magma_int_t m, magma_int_t n, magma_int_t k,
    magmaDoubleComplex alpha, magmaDoubleComplex_const_ptr dA, size_t dA_offset, magma_int_t lda,
                              magmaDoubleComplex_const_ptr dB, size_t dB_offset, magma_int_t ldb,
    magmaDoubleComplex beta,  magmaDoubleComplex_ptr       dC, size_t dC_offset, magma_int_t ldc,
    magma_queue_t handle )
{
    // Pack the GEMM arguments for the MAGMA server running on the MIC
    magma_mic_zgemm_param gemm_param;
    gemm_param.transa = transA;
    gemm_param.transb = transB;
    gemm_param.m      = m;
    gemm_param.n      = n;
    gemm_param.k      = k;
    gemm_param.alpha  = alpha;
    gemm_param.a      = dA + dA_offset;   // device pointer plus element offset
    gemm_param.lda    = lda;
    gemm_param.b      = dB + dB_offset;
    gemm_param.ldb    = ldb;
    gemm_param.beta   = beta;
    gemm_param.c      = dC + dC_offset;
    gemm_param.ldc    = ldc;

    // Tell the server which routine to run (flags = 1 requests a blocking send)
    int err;
    int control_msg = magma_mic_ZGEMM;
    if ((err = scif_send(gEpd, &control_msg, sizeof(control_msg), 1)) <= 0) {
        // ... (error handling not shown on the slide)
    }
    // ... (remainder of the function not shown on the slide)
}
```

The host sends asynchronous requests to the MIC, where they are queued and executed.
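The queueing behavior can be sketched as a fixed-size FIFO ring buffer: the host enqueues a request and returns at once, and the coprocessor side takes requests in submission order. This is a single-threaded illustration under stated assumptions: `struct request`, `QUEUE_CAP`, `queue_submit`, and `queue_next` are hypothetical names loosely modeled on `magma_mic_zgemm_param`, and a real host/coprocessor queue would also need synchronization across the SCIF link, which is not shown.

```c
#include <stdbool.h>

#define QUEUE_CAP 4   /* hypothetical capacity */

/* Hypothetical request record: a control message plus a few arguments. */
struct request {
    int control_msg;
    int m, n, k;
};

/* Fixed-size FIFO ring buffer of pending requests. */
struct request_queue {
    struct request slots[QUEUE_CAP];
    int head;   /* oldest pending request */
    int tail;   /* next free slot */
    int size;   /* number of pending requests */
};

/* Host side: enqueue and return immediately, without waiting for execution.
   Returns false when the queue is full. */
bool queue_submit(struct request_queue *q, struct request r)
{
    if (q->size == QUEUE_CAP)
        return false;
    q->slots[q->tail] = r;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->size++;
    return true;
}

/* Coprocessor side: take the oldest request, so requests execute in
   submission order. Returns false when nothing is pending. */
bool queue_next(struct request_queue *q, struct request *out)
{
    if (q->size == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->size--;
    return true;
}
```

In-order execution is what lets a sequence of calls on one queue, such as the herk, gemm, and trsm in the Cholesky loop, be issued back to back without waiting for each to finish.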

### MAGMA MIC Performance (QR)



### MAGMA MIC Performance (Cholesky)



### MAGMA MIC Performance (LU)



### MAGMA MIC Performance (LU, QR, Chol.)



### MAGMA MIC Performance (Hessenberg)



### From Single to Multi-MIC Support

- Data distribution
  - 1-D block-cyclic distribution
- Algorithm
  - The MIC holding the current panel sends it to the CPU
  - All updates are done in parallel on the MICs
  - Look-ahead is done by the MIC holding the next panel
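A 1-D block-cyclic column distribution deals blocks of `nb` columns to the MICs in round-robin order. The sketch below, with hypothetical names `block_cyclic_owner` and `block_cyclic_local_cols`, computes which MIC owns a given column and how many columns each MIC stores. Under this mapping, at step `k` the panel is on MIC `k % ndev` and the look-ahead panel is on MIC `(k + 1) % ndev`.

```c
/* Where a block column lives under a 1-D block-cyclic distribution. */
struct block_home {
    int device;       /* MIC that stores the block column */
    int local_block;  /* index of that block among the device's local blocks */
};

/* Owner of global column `col` when columns are grouped in blocks of nb
   and dealt round-robin over ndev devices. */
struct block_home block_cyclic_owner(int col, int nb, int ndev)
{
    int b = col / nb;                       /* global block-column index */
    struct block_home h = { b % ndev, b / ndev };
    return h;
}

/* Number of columns device d stores for an n-column matrix. */
int block_cyclic_local_cols(int n, int nb, int ndev, int d)
{
    int nblocks = (n + nb - 1) / nb;        /* the last block may be partial */
    int cols = 0;
    for (int b = d; b < nblocks; b += ndev) {
        int start = b * nb;
        cols += (n - start < nb) ? n - start : nb;
    }
    return cols;
}
```

Dealing blocks cyclically keeps the trailing-matrix updates balanced across the MICs as the factorization proceeds, since each device keeps roughly the same share of the remaining columns.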



### MAGMA MIC Scalability for LU (real64)



### MAGMA MIC Scalability for QR (real64)

Figure: MAGMA DGEQRF performance on multiple cards



### MAGMA MIC Scalability for Cholesky (real64)

Figure: MAGMA DPOTRF performance on multiple cards (curves for 1, 2, 3, and 4 MICs)

- Host: Sandy Bridge (2 x 8 cores @ 2.6 GHz), DP peak 332 GFlop/s
- Coprocessor: Intel Xeon Phi (60 cores @ 1.09 GHz), DP peak 1046 GFlop/s
- System DP peak 1378 GFlop/s; MPSS 2.1.4346-16; `compiler_xe_2013.1.117`
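The peak figures follow from sockets x cores x clock x DP flops per cycle. The sketch below assumes 8 DP flops per cycle per Sandy Bridge core (4-wide AVX add plus 4-wide multiply) and 16 per Xeon Phi core (8-wide fused multiply-add); these per-cycle rates are not stated on the slide. The function name `dp_peak_gflops` is hypothetical.

```c
/* Theoretical double-precision peak in GFlop/s. */
double dp_peak_gflops(int sockets, int cores_per_socket, double ghz,
                      int dp_flops_per_cycle)
{
    return sockets * cores_per_socket * ghz * dp_flops_per_cycle;
}
```

With these assumptions, `dp_peak_gflops(2, 8, 2.6, 8)` gives 332.8 and `dp_peak_gflops(1, 60, 1.09, 16)` gives 1046.4, which the slide rounds to 332 and 1046 GFlop/s for a combined 1378 GFlop/s.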

### Contact Information and Generous Sponsors

### Stan Tomov

tomov@eecs.utk.edu

#### MAGMA team

http://icl.cs.utk.edu/magma/

#### PLASMA team

http://icl.cs.utk.edu/plasma/

### Collaborating partners

- University of Tennessee, Knoxville
- University of California, Berkeley
- University of Colorado, Denver
- INRIA, France (StarPU team)
- KAUST, Saudi Arabia
