Implementing Opencl kernel using clmagma arrays

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Post Reply
jmrnavarro
Posts: 3
Joined: Thu May 12, 2016 5:22 am

Implementing Opencl kernel using clmagma arrays

Post by jmrnavarro » Thu May 12, 2016 5:31 am

Hello everyone!
i'm begginer with Opencl using clMagma library, my problem look like this little example:

Code: Select all

int row = 3, col = 3;
double *A_h, *B_h;
cl_mem A_k, B_k;
magma_malloc(&A_k, row*col*sizeof(double));
magma_malloc(&B_k, row*col*sizeof(double));
magma_malloc_cpu((void**) &A_h, row*col*sizeof(double));
magma_malloc_cpu((void**) &B_h, row*col*sizeof(double));
for(int z = 0; z < row*col; z++) A_h[z] = z+1;
for(int z = 0; z < row*col; z++) printf("%f ",A_h[z]); printf("\n");

magma_dsetmatrix(row, col, A_h, row, A_k, size, row, queue);
magma_dgemm(MagmaNoTrans, MagmaTrans, row, row, col, alpha, A_k, size, row, A_k, size, row, beta, B_k, size, row, queue);
magma_dgetmatrix(row,col, B_k, size, row, B_h, row, queue);

for(int z = 0; z < row*col; z++) printf("%f ",B_h[z]); printf("\n");//working 


cl_kernel k_prueba;
k_prueba = clCreateKernel(program, "prueba", &status);
	
status = clSetKernelArg(k_prueba, 0, sizeof(cl_mem), &A_k);
exitOnFail(status, "Unable to set kernel prueba arguments.");//fail error -38

size_t global_prueba = row*col;
status = clEnqueueNDRangeKernel(command_queue, k_prueba, 1, NULL, &global_prueba, NULL, 0, NULL, NULL);
exitOnFail(status, "Launch OpenCL prueba kernel");

magma_dgetmatrix(row, col, A_k, size, row, A_h, row, queue);
for(int z = 0; z < row*col; z++) printf("%f ",A_h[z]); printf("\n");
As you can see what i'm trying to achieve is: i have some arrays, that are allocated using magma_malloc, then do some work whit them like magma_dgemm, and finaly use that array inside a normal OpenCl Kernel, but it seems that cant be use like that because i get error -38 (CL_INVALID_MEM_OBJECT)

So my question is how can i do some work using both Clmagma and Opencl kernel?

i have tried already other options but didnt fully work.
  • clEnqueueCopyBuffer. i'm guessing that magma pointer is not a buffer?
  • clCreateBuffer, declaring arrays whit clCreateBuffer don't work in magma_dgemm
Any help will be much apreciated, thx for your time reading this and sorry if there is any misspelling error.
Best regards, Miguel

mgates3
Posts: 918
Joined: Fri Jan 06, 2012 2:13 pm

Re: Implementing Opencl kernel using clmagma arrays

Post by mgates3 » Fri May 13, 2016 8:46 pm

At the moment, in clMAGMA, a magma ptr type is a cl_mem object. (We may change this at some point, but if so would provide a mechanism to go convert back-and-forth.)

I think the problem is you need to use the same context as clMAGMA. At the moment, this means initializing clMAGMA, then getting the context from it. This can be done by creating a queue and querying it for its context. (A clMAGMA queue is currently the same as an OpenCL queue.) Below is a working example.

This is a bit more obscure than I would like, so we'll add some functions to make integration with outside OpenCL contexts easier.

Code: Select all

// swap.cpp
#include <stdio.h>
#include <string.h>

#include "magma.h"


// ------------------------------------------------------------
// internal MAGMA routine, in interface_opencl/error.cpp
// This interface may change in the future!
extern "C"
const char* magma_clGetErrorString( cl_int error );


// ------------------------------------------------------------
void exitOnFail_( int err, const char* msg,
                  const char* func, const char* file, int line )
{
    if ( err != 0 ) {
        fprintf( stderr, "Error: %s; %d: %s in %s at %s:%d\n",
                 msg, err, magma_clGetErrorString(err),
                 func, file, line );
        exit(1);
    }
}

#define exitOnFail( err, msg ) \
    exitOnFail_( err, msg, __func__, __FILE__, __LINE__ )


// ------------------------------------------------------------
int main( int argc, char** argv )
{
    // ----- added
    magma_init();
    
    int err = 0;
    
    // I don't really recommend embedding OpenCL code this way, but for brevity...
    const char* src = 
        "__kernel void dswap( int n, __global double* x, __global double *y )"
        "{"
        "    int i = get_local_id(0);"
        "    double tmp = x[i];"
        "    x[i] = y[i];"
        "    y[i] = tmp;"
        "}";
    
    double alpha = 3.0;
    double beta  = 2.0;
    
    int ndevices = 0;
    magma_device_t device;
    magma_getdevices( &device, 1, &ndevices );
    
    magma_queue_t queue;
    magma_queue_create( device, &queue );
    
    // get OpenCL context from MAGMA queue
    // (probably MAGMA will provide a function to do this nicely in the future.)
    cl_context context;
    err = clGetCommandQueueInfo( queue, CL_QUEUE_CONTEXT, sizeof(cl_context), &context, NULL );
    exitOnFail( err, "clGetCommandQueueInfo" );
    
    cl_program program = clCreateProgramWithSource( context, 1, &src, NULL, &err );
    exitOnFail( err, "clCreateProgramWithSource" );
    
    err = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
    exitOnFail( err, "clBuildProgram" );
    
    char text[ 8192 ];
    err = clGetProgramBuildInfo( program, device, CL_PROGRAM_BUILD_LOG, sizeof(text), text, NULL );
    exitOnFail( err, "clGetProgramBuildInfo" );
    int len = strlen( text );
    if ( len > 0 ) {
        printf( "warning:\n%s\n\n", text );
    }
    // ----- done added
    
    
    
    int row = 3, col = 3;
    double *A_h, *B_h;
    cl_mem A_k, B_k;

    magma_malloc( &A_k, row*col*sizeof(double) );
    magma_malloc( &B_k, row*col*sizeof(double) );

    magma_malloc_cpu( (void**) &A_h, row*col*sizeof(double) );
    magma_malloc_cpu( (void**) &B_h, row*col*sizeof(double) );

    for(int z = 0; z < row*col; z++) {
        A_h[z] = z+1;
    }
    printf( "A = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", A_h[z] );
    }
    printf( "]\n" );

    int offset = 0; // changed "size" to "offset"
    magma_dsetmatrix( row, col, A_h, row, A_k, offset, row, queue );
    magma_dsetmatrix( row, col, A_h, row, B_k, offset, row, queue );  // added (else B_k is unset)
    magma_dgemm( MagmaNoTrans, MagmaTrans, row, row, col,
                 alpha, A_k, offset, row,
                        A_k, offset, row,
                 beta,  B_k, offset, row, queue );
    magma_dgetmatrix( row, col, B_k, offset, row, B_h, row, queue );

    printf( "B = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", B_h[z] );
    }
    printf( "]\n" );

    cl_kernel k_prueba;
    k_prueba = clCreateKernel( program, "dswap", &err );
    exitOnFail( err, "clCreateKernel" );

    // added couple arguments for dswap kernel
    int size = row*col;
    int arg = 0;
    err = clSetKernelArg( k_prueba, arg++, sizeof(size), &size );  exitOnFail( err, "clSetKernelArg size" );
    err = clSetKernelArg( k_prueba, arg++, sizeof(A_k),  &A_k  );  exitOnFail( err, "clSetKernelArg A_k"  );
    err = clSetKernelArg( k_prueba, arg++, sizeof(B_k),  &B_k  );  exitOnFail( err, "clSetKernelArg B_k"  );

    printf( "swap\n" );
    size_t global_prueba = row*col;
    err = clEnqueueNDRangeKernel( queue, k_prueba, 1, NULL, &global_prueba, NULL, 0, NULL, NULL );
    exitOnFail( err, "clEnqueueNDRangeKernel" );

    magma_dgetmatrix( row, col, A_k, offset, row, A_h, row, queue );
    printf( "A = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", A_h[z]);
    }
    printf( "]\n" );

    magma_dgetmatrix( row, col, B_k, offset, row, B_h, row, queue );
    printf( "B = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", B_h[z]);
    }
    printf( "]\n" );
    
    
    
    // ----- added
    magma_finalize();
    return 0;
}
Compile and link with clMAGMA. Here's an example on MacOS using the OpenCL framework. Adjust the libraries as needed to link with your OpenCL library.

Code: Select all

prompt>  g++ -Wall -std=c++11 -I include -o swap swap.cpp -L/path/to/clmagma/lib -lclmagma -L/usr/local/clBLAS-2.4/lib -lclBLAS -framework OpenCL

prompt> ./swap
A = [   1.00   2.00   3.00   4.00   5.00   6.00   7.00   8.00   9.00 ]
B = [ 200.00 238.00 276.00 242.00 289.00 336.00 284.00 340.00 396.00 ]
swap
A = [ 200.00 238.00 276.00 242.00 289.00 336.00 284.00 340.00 396.00 ]
B = [   1.00   2.00   3.00   4.00   5.00   6.00   7.00   8.00   9.00 ]
-mark

jmrnavarro
Posts: 3
Joined: Thu May 12, 2016 5:22 am

Re: Implementing Opencl kernel using clmagma arrays

Post by jmrnavarro » Sat May 14, 2016 8:16 am

Hello
Awesome, really appreciated, thanks! this is working now :D
best regards, Miguel

Post Reply