Fortran Subarrays on GPU in RC4
Posted: Sun Mar 13, 2011 6:50 am
by fletchjp
I have some FORTRAN code which I am porting to work with MAGMA. I am using magma_dgetrf_gpu and magma_dgetrs_gpu which work fine. I am now working on speeding up the building of the matrix.
I am calculating some values on the GPU which form one row of the matrix. At the moment I copy them back a row at a time to the matrix on the CPU and then copy the whole matrix back to the GPU. This is clearly wasteful.
The device pointers are defined as in testing_dgetrf_gpu_f.f in RC4:
Code: Select all
real, dimension(4) :: devptrA, devptrB
My code to transfer one row looks like this (I am storing the transpose as the elements are then adjacent):
Code: Select all
call cublas_get_matrix(n, 1, size_of_elt, devptrD, n,
$ G(1,jrow),n)
G is an array on the CPU. This is followed later by the following:
Code: Select all
!---- devPtrA = G
call cublas_set_matrix(n, n, size_of_elt, G, ldda, devptrA, ldda)
What I would like to do is something like this:
Code: Select all
call cublas_dcopy(n,devptrD,1,devptrXXX,1)
where devptrXXX needs to point to the correct location in devptrA. I have been looking around for an example of this and cannot find one.
If I can crack this I can save two complete matrix transfers and the memory of the array on the CPU.
It would also help to have some explanation of the design decision to change the type of these pointers between RC3 and RC4.
Please help if you can.
Thanks
John
Re: Fortran Subarrays on GPU in RC4
Posted: Mon Mar 14, 2011 4:49 am
by fletchjp
I have been working on a solution to my own problem and here is one which works with gfortran (4.4.3) on Ubuntu Linux 10.04 (64 bit).
I could not find a way to add anything to the pointers as defined in the MAGMA examples. So I looked at the code in cuda/src/fortran.c and added a new routine of my own. I spotted that all the interface routines did was to cast the FORTRAN pointer to a C pointer, so I have added the offset at that point. The function name has a trailing underscore for FORTRAN linkage.
Code: Select all
/* dcopy_offset.c
This is a special version of the cublas_dcopy routine with an extra
argument for the offset.
*/
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__GNUC__)
#include <stdint.h>
#endif /* __GNUC__ */
#include "cublas.h" /* CUBLAS public header file */
#include "fortran_common.h"
#include "fortran.h"
/* Note _ at end of name for linking with gfortran */
void cublas_dcopy_offset_ (const int *n, const devptr_t *devPtrx, const int *incx,
const devptr_t *devPtry, const int *incy, const int *offset)
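{
/* The offset is added before the cast to double *, so it is a byte offset, not an element count */
double *x = (double *)(*devPtrx);
double *y = (double *)(*devPtry + *offset);
/* Device-to-device copy of n doubles from x into the offset location in y */
cublasDcopy (*n, x, *incx, y, *incy);
}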
Compiling requires access to the directories cuda/include and cuda/src:
Code: Select all
gcc -O3 -DADD_ -DGPUSHMEM=200 -I$(CUDA_INCLUDE) -I$(CUDA_SRC) -c dcopy_offset.c
I found this reference:
http://www.gsic.titech.ac.jp/~ccwww/teb ... la5_e.html
which helped with the calculation of the offset. The calculation is implemented as a FORTRAN function.
Code: Select all
INTEGER FUNCTION IDX2F(i,j,ld)
IDX2F = ((((j)-1)*(ld))+((i)-1))
end
The usage is like this:
Code: Select all
call cublas_dcopy_offset(n,devptrD,1,devptrA,1,
& IDX2F(1,jrow,ldda)*size_of_elt)
I have tested this and it works in my environment. The function name on the C function will not need the underscore in some other environments.
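For anyone who wants to check the offset arithmetic on the host, here is a minimal C sketch of the same column-major calculation as IDX2F. The names idx2f and column_byte_offset are made up for illustration and are not part of CUBLAS or MAGMA. The sketch uses size_t rather than int, on the assumption that larger matrices may be needed later: a byte offset held in a 32-bit integer overflows once the matrix passes 2 GiB, which is roughly n > 16384 for a square double precision matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based element offset of the 1-based entry (i, j) in a
   column-major array with leading dimension ld.
   Same formula as the Fortran IDX2F function. */
static size_t idx2f(int i, int j, int ld)
{
    assert(i >= 1 && j >= 1 && ld >= i);
    return (size_t)(j - 1) * (size_t)ld + (size_t)(i - 1);
}

/* Byte offset of the first element of column j.
   elt_size is the size of one element in bytes. */
static size_t column_byte_offset(int j, int ld, size_t elt_size)
{
    return idx2f(1, j, ld) * elt_size;
}
```

For the case in this thread, column_byte_offset(jrow, ldda, sizeof(double)) is the same quantity as IDX2F(1,jrow,ldda)*size_of_elt.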
I plan to extend this to allow the offset to be on the first parameter, or both.
I hope this helps in the developments.
Please let me have any comments on whether there is a neater way.
John
Re: Fortran Subarrays on GPU in RC4
Posted: Tue Mar 15, 2011 5:08 pm
by mateo70
John,
Thanks for this; we were also looking for a way to handle that. We were thinking about providing a function that lets the user compute the correct pointer on the device, but it will probably work for MAGMA and not for cublas.
Mathieu
Re: Fortran Subarrays on GPU in RC4
Posted: Wed Mar 16, 2011 8:57 am
by fletchjp
Transfers have a big setup time. I have an application where I transfer 6 blocks 1 by N (N=8500 in this case).
I would like to transfer one block 6 by N and set up pointers to 6 contiguous blocks of N. That would save 5 setups per occurrence. I have to do this 4250 times, so it is a big saving.
One way to solve this would be to have a FORTRAN subroutine which took a pointer and an offset and gave back a new pointer which was the old pointer plus the offset.
John
Re: Fortran Subarrays on GPU in RC4
Posted: Sat Mar 19, 2011 7:57 pm
by fletchjp
I have now written the following C function to handle pointer arithmetic from MAGMA FORTRAN. It uses the same headers as in my previous posting.
Code: Select all
void devptr_offset_(devptr_t *devPtr2,const devptr_t *devPtr1,const int *offset, const int *size)
{
*devPtr2 = (*devPtr1 + (*offset)*(*size));
}
Usage is like this from a FORTRAN program:
Code: Select all
stat = cublas_alloc(n*2, size_of_elt, devPtrD)
call devptr_offset(devPtrD1,devPtrD,n,size_of_elt)
Then devPtrD points to the start of the array and devPtrD1 to the second block of n elements.
Note the need to give the size of each element.
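The same idea extends to the six-block case discussed earlier. The following is only a host-side C sketch of the arithmetic, with no CUDA calls. It assumes the device pointer is held in an integer wide enough for an address, so devptr_sketch_t is a hypothetical stand-in for devptr_t, and devptr_offset_sketch and split_blocks are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for devptr_t: an integer wide enough to
   hold an address (an assumption made for this sketch). */
typedef uintptr_t devptr_sketch_t;

/* new_ptr = old_ptr + offset * size, with offset counted in
   elements and size the element size in bytes. */
static void devptr_offset_sketch(devptr_sketch_t *new_ptr,
                                 const devptr_sketch_t *old_ptr,
                                 const int *offset, const int *size)
{
    *new_ptr = *old_ptr
             + (devptr_sketch_t)(*offset) * (devptr_sketch_t)(*size);
}

/* Fill blocks[0..nblocks-1] with pointers to consecutive blocks of
   n elements inside one allocation that starts at base. */
static void split_blocks(devptr_sketch_t base, int n, int elt_size,
                         int nblocks, devptr_sketch_t *blocks)
{
    for (int k = 0; k < nblocks; ++k) {
        int offset = k * n;   /* element offset of block k */
        devptr_offset_sketch(&blocks[k], &base, &offset, &elt_size);
    }
}
```

With nblocks = 6 this gives one pointer per vector after a single 6 by N transfer, so blocks[0] is the base pointer and blocks[1] plays the role of devPtrD1 above.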
I have applied this to my problem with the six vectors which I discussed above and find that the gain in time is smaller than I expected. I needed to put the arrays in a common block on the CPU so that I could control their relative location. Each vector is about 6000 double precision variables.
John
Re: Fortran Subarrays on GPU in RC4
Posted: Mon Mar 21, 2011 12:07 pm
by mateo70
Thanks John,
that's close to what we planned to include in the next release with the fortran interface. I'm just busy with other projects right now, so I don't have a date for this final release.
The prototype we were thinking about is:
Code: Select all
magma_[zcds]offset( NewPtr, OldPtr, LDA, I, J)
Mathieu
Re: Fortran Subarrays on GPU in RC4
Posted: Mon Mar 21, 2011 1:11 pm
by fletchjp
I guess you will use the type letter to deduce the size of the elements.
Could you do a one dimensional version as well?
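To make the question concrete, here is a rough C sketch of what such a routine might compute. It is a guess and not the MAGMA implementation: the function names are hypothetical, I and J are assumed to be 1-based as in FORTRAN, and the type letters are assumed to follow the usual BLAS convention (s and d real, c and z complex).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element size in bytes for a BLAS-style type letter, 0 if unknown. */
static size_t elt_size_for(char type)
{
    switch (type) {
    case 's': return sizeof(float);       /* single precision real */
    case 'd': return sizeof(double);      /* double precision real */
    case 'c': return 2 * sizeof(float);   /* single precision complex */
    case 'z': return 2 * sizeof(double);  /* double precision complex */
    default:  return 0;
    }
}

/* Hypothetical 2D version: address of the 1-based entry (i, j) of a
   column-major array with leading dimension lda starting at old_ptr. */
static uintptr_t offset_2d_sketch(uintptr_t old_ptr, char type,
                                  int lda, int i, int j)
{
    size_t elt = elt_size_for(type);
    assert(elt != 0 && i >= 1 && j >= 1);
    return old_ptr
         + ((size_t)(j - 1) * (size_t)lda + (size_t)(i - 1)) * elt;
}

/* Hypothetical 1D version: address of the 1-based element i. */
static uintptr_t offset_1d_sketch(uintptr_t old_ptr, char type, int i)
{
    return offset_2d_sketch(old_ptr, type, 1, i, 1);
}
```

Under these assumptions the one dimensional version is just the two dimensional one with J = 1, so it would cost very little to provide both.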
John
Re: Fortran Subarrays on GPU in RC4
Posted: Wed Apr 06, 2011 2:46 pm
by mateo70
Yes, I will do that.
Fortunately, I looked at this post before doing it.
Mathieu