why not do all the work on GPU?
Posted: Mon Jan 17, 2011 4:13 am
by rych
I'm looking at magma_dgeqrf_gpu (dgeqrf_gpu.cpp), for example, and I see host allocations and data exchange between the CPU and the GPU. Can't we allocate all the memory on the GPU (input, output, and workspace) and call a sequence of GPU kernels to operate on it, without calling any CPU LAPACK functions?
Igor
Re: why not do all the work on GPU?
Posted: Fri Jan 21, 2011 2:37 am
by Stan Tomov
This is possible, and people have tried it, but it is in general slower. The computations/tasks that are offloaded to the CPU are small and cannot be executed efficiently in parallel on the GPU. These small tasks can instead run on the CPU and be overlapped with more efficient work (e.g., Level 3 BLAS) on the GPU. Asymptotically, for large matrices, the execution of the small tasks on the CPU is completely hidden behind the work on the GPU, and as a result the overall algorithm runs at the speed at which the GPU can execute the Level 3 BLAS.
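To make the overlap argument concrete, here is a rough cost model, not MAGMA code. It assumes the small tasks are the panel factorizations of a blocked QR, uses approximate flop counts, ignores CPU-GPU transfer time, and plugs in made-up rates (cpu_panel, gpu_panel, gpu_gemm) that are purely hypothetical. The function and parameter names are mine, chosen for illustration.

```python
def panel_flops(m, nb):
    # Approximate flops to factor an m x nb panel (small, hard to parallelize).
    return 2.0 * m * nb * nb


def update_flops(m, nb, ncols):
    # Approximate flops to update an m x ncols trailing matrix (Level 3 BLAS).
    return 4.0 * m * nb * ncols


def model_times(n, nb=64, cpu_panel=20e9, gpu_panel=5e9, gpu_gemm=300e9):
    """Return (gemm_only, all_gpu, hybrid) modeled times in seconds.

    All rates are hypothetical flop/s figures, not measurements.
    """
    steps = []
    for j in range(0, n, nb):
        m = n - j
        b = min(nb, m)
        steps.append((panel_flops(m, b), update_flops(m, b, n - j - b)))

    # Lower bound: only the Level 3 BLAS updates, at full GPU speed.
    gemm_only = sum(u for _, u in steps) / gpu_gemm

    # Everything on the GPU: panels and updates run one after the other.
    all_gpu = sum(p / gpu_panel + u / gpu_gemm for p, u in steps)

    # Hybrid: the first panel is exposed; after that, the CPU factors
    # panel k+1 while the GPU performs update k, so each step costs
    # whichever of the two takes longer.
    hybrid = steps[0][0] / cpu_panel
    for k, (_, u) in enumerate(steps):
        next_panel = steps[k + 1][0] if k + 1 < len(steps) else 0.0
        hybrid += max(u / gpu_gemm, next_panel / cpu_panel)

    return gemm_only, all_gpu, hybrid
```

Under these assumed rates, comparing hybrid / gemm_only for model_times(512) and model_times(8192) shows the ratio dropping toward 1 as n grows: the update work grows like n^3 while the panel work grows like n^2, so the CPU part ends up hidden behind the GPU part.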
Re: why not do all the work on GPU?
Posted: Thu Feb 17, 2011 1:31 am
by lucky0002
Hello, everyone.
I am a student and new to GPGPU.
I have one question.
MAGMA is built on CUDA, and I have also heard of another library, ViennaCL.
Which one is better in terms of performance/speed?