sgetrf_gpu calculate partial on cpu! why?

Posted: Wed Aug 03, 2011 7:01 am
by tomac
I've noticed that the factorization (sgetrf_gpu) always does some small computation on the CPU. Why is that? The synchronization, the get/setMatrix transfers, and the restart of the kernel take a lot of time.
Is there a possibility to turn it off, or to do this part on the GPU as well?

Edit: Ahh, I finally found your paper "Dense Linear Algebra Solvers for..."
Now I think I understand: it's necessary for pivoting?! Is it possible to deactivate pivoting? I don't need it for a Laplace problem.
And why is the CPU doing the factorization of the panels?

Thanks tomac

Re: sgetrf_gpu calculate partial on cpu! why?

Posted: Thu Aug 11, 2011 4:45 pm
by Stan Tomov
Hi Tomac,
It is possible to deactivate the pivoting, and the code would be faster. We will add it to the release.
In general, the panels are difficult to parallelize and would not run on the GPU as efficiently as Level 3 BLAS. Therefore we schedule and execute them on the CPU. We manage to overlap (for N big enough) the CPU work with the updates on the GPU. As a result, the algorithm runs as fast as we can do the Level 3 BLAS (needed for the algorithm) on the GPU.
Stan
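
For readers following the thread, the panel/update split described above can be sketched in a few lines. This is only an illustrative pure-Python sketch of a blocked right-looking LU, not the MAGMA source: the names blocked_lu and lu_product, the block size nb, and the pivot flag are made up for this example, and everything runs sequentially, so the CPU/GPU overlap is not modeled. The comments mark which step corresponds to the panel (the part said to run on the CPU) and which to the Level 3 BLAS-style update (the part said to run on the GPU).

```python
def blocked_lu(a, nb=2, pivot=True):
    """In-place blocked right-looking LU of a square list-of-lists matrix.

    L (unit lower) and U overwrite a. Returns the row permutation as a list:
    row i of the factored matrix is original row perm[i].
    """
    n = len(a)
    perm = list(range(n))
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # Panel factorization (columns j0..j1-1): pivot search, row swaps,
        # column scaling. Mostly Level 1/2 work with little parallelism.
        for k in range(j0, j1):
            p = max(range(k, n), key=lambda i: abs(a[i][k])) if pivot else k
            if a[p][k] == 0.0:
                raise ZeroDivisionError("zero pivot in column %d" % k)
            if p != k:
                a[k], a[p] = a[p], a[k]            # swap whole rows
                perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for c in range(k + 1, j1):         # update inside the panel only
                    a[i][c] -= a[i][k] * a[k][c]
        # Block row of U: forward substitution with the unit lower panel block
        # (a triangular solve, i.e. Level 3 BLAS work in a real implementation).
        for k in range(j0, j1):
            for i in range(k + 1, j1):
                for c in range(j1, n):
                    a[i][c] -= a[i][k] * a[k][c]
        # Trailing update A22 -= L21 * U12 (a matrix multiply; this is where
        # almost all of the flops are for large n).
        for i in range(j1, n):
            for k in range(j0, j1):
                lik = a[i][k]
                for c in range(j1, n):
                    a[i][c] -= lik * a[k][c]
    return perm


def lu_product(lu):
    """Multiply the packed factors back together and return L * U."""
    n = len(lu)
    return [[sum((1.0 if i == k else lu[i][k]) * lu[k][j]
                 for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]


# Small 1D Laplacian (tridiagonal -1, 2, -1), factored without pivoting.
n = 5
T = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
lu = [row[:] for row in T]
perm = blocked_lu(lu, nb=2, pivot=False)
```

On this small Laplacian the diagonal entry is always the largest in its column, so the pivoting and non-pivoting runs pick the same rows; that is the situation the original question has in mind.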

Re: sgetrf_gpu calculate partial on cpu! why?

Posted: Tue Aug 16, 2011 11:15 am
by tomac
Hi Stan, first, thanks for the reply; it's really helpful for understanding.
When is the next release date?
Is it possible to get a pre-release patch, a trunk version, or something like that?
Thanks again
Tomac

Re: sgetrf_gpu calculate partial on cpu! why?

Posted: Fri Sep 09, 2011 7:32 am
by tomac
Thanks for the nopiv versions.

Nevertheless, Fortran beats the GPU.

I have done some tests with sgetrf on the CPU and the GPU.
At first I thought the GPU results were bad, but then I calculated the flops and realized I got impossible results for the CPU variant.
My question is: how is that possible? The result matrices are correct and I take the time at the right place.
For a 5046x5046 matrix, the factorization on the CPU with Fortran sgetrf took 0.25 s ~ 514 Gflops.
OMG, I have a little supercomputer under my desk.
What is going on? Is the dense matrix transformed into a sparse one? My matrix has mainly zero entries, because I converted a sparse matrix into a dense one.
When I subtract 1 from every entry in my matrix, the results become plausible:
factorization on the CPU with Fortran sgetrf = 18.79 s ~ 6.8 Gflops
That means Fortran really recognizes zeros, but the GPU version doesn't.
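
As a cross-check on the arithmetic, here is a minimal sketch of the rate calculation. It assumes the usual leading-order count of 2n^3/3 floating-point operations for an n x n LU factorization; the helper name lu_gflops and its flops parameter are made up for this example. The figures quoted in the post appear to correspond to a count of n^3 instead, so the same timings give somewhat lower rates under the 2n^3/3 convention (roughly 343 and 4.6 Gflop/s rather than 514 and 6.8). The first number is still implausible for a desktop CPU of that time, so the observation stands either way.

```python
def lu_gflops(n, seconds, flops=None):
    """Rate in Gflop/s for an n x n LU factorization.

    By default uses the leading-order count 2*n**3/3; pass flops explicitly
    to reproduce a rate computed with a different operation count.
    """
    if flops is None:
        flops = 2.0 * n ** 3 / 3.0
    return flops / seconds / 1e9


n = 5046
print(lu_gflops(n, 0.25))                  # run on the mostly-zero matrix
print(lu_gflops(n, 18.79))                 # run after subtracting 1 everywhere
print(lu_gflops(n, 0.25, flops=n ** 3))    # n**3 count, as the post seems to use
print(lu_gflops(n, 18.79, flops=n ** 3))
```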