Hi Stan,
Do you have the 1000 RHSs at once, or do you get them (and solve) one by one in some iterative process?
It's an iterative process. In other words, for Imax = 1000 and N = 10112, the total time is 1000 * 184.706 ms = 184.7 s.
But in practice, to increase the accuracy of my calculations, I have to maximize the size of A. And actually, I don't know the maximum size of A, and therefore of the system (with NRHS = 1), that I can allocate on a GTX 295 with its 1792 MB of GDDR3.
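A rough upper bound on N can be estimated from the memory needed for the single-precision matrix plus the right-hand side. The sketch below is only an estimate under stated assumptions: it counts N*N + N*NRHS floats and ignores any workspace, padding, or memory reserved by the driver. `max_n_for_memory` is a hypothetical helper, not a MAGMA or CUBLAS function, and the byte count is an input rather than something measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest N such that an N x N float matrix plus an
   N x nrhs float right-hand side fit in `bytes` bytes.
   Assumption: no workspace, padding or driver overhead is counted, so the
   result is an optimistic upper bound. */
static size_t max_n_for_memory(size_t bytes, size_t nrhs)
{
    size_t floats = bytes / sizeof(float);
    size_t lo = 0, hi = 1;

    /* Grow hi until a system of that size no longer fits. */
    while (hi * hi + hi * nrhs <= floats)
        hi *= 2;

    /* Binary search; invariant: size lo fits, size hi does not. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (mid * mid + mid * nrhs <= floats)
            lo = mid;
        else
            hi = mid;
    }
    assert(lo * lo + lo * nrhs <= floats);
    return lo;
}
```

Calling `max_n_for_memory((size_t)1792 * 1024 * 1024, 1)` gives the bound for the full 1792 MB. As far as I know the GTX 295 is a dual-GPU board, so a single device may only see about half of that memory; querying the free memory at run time (for example with `cudaMemGetInfo`) would be more reliable than assuming a figure.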
If you have your RHSs on the CPU you can do the slaswp at once on the CPU and send the data only once to the GPU for the triangular solves.
As I said, I have to compute the LU factorization just once and then iterate the linear system solve N times. So I can do slaswp just once, after the LU factorization.
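The factor-once, solve-many pattern can be sketched on the CPU as follows. This is only an illustrative, unblocked implementation in plain C, not MAGMA's code; `lu_factor`, `apply_row_swaps` and `lu_solve` are hypothetical names. `apply_row_swaps` is meant to mirror what slaswp does with k1 = 1, k2 = N and an increment of 1, for a single column. In this sketch the row interchanges are applied to each right-hand side as it arrives, before the two triangular solves.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major access, as in LAPACK/CUBLAS; uses `a` and `n` from scope. */
#define A_(i, j) a[(size_t)(j) * (size_t)n + (size_t)(i)]

static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Hypothetical helper: unblocked LU with partial pivoting, in place.
   ipiv is 1-based, like LAPACK's. Returns 0 on success, -1 if singular. */
static int lu_factor(int n, float *a, int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        int p = k;                       /* row of the largest pivot */
        for (int i = k + 1; i < n; ++i)
            if (abs_f(A_(i, k)) > abs_f(A_(p, k))) p = i;
        if (A_(p, k) == 0.0f) return -1;
        ipiv[k] = p + 1;
        if (p != k)                      /* swap full rows k and p */
            for (int j = 0; j < n; ++j) {
                float t = A_(k, j); A_(k, j) = A_(p, j); A_(p, j) = t;
            }
        for (int i = k + 1; i < n; ++i) {
            A_(i, k) /= A_(k, k);        /* multiplier, stored in L */
            for (int j = k + 1; j < n; ++j)
                A_(i, j) -= A_(i, k) * A_(k, j);
        }
    }
    return 0;
}

/* Apply the recorded row interchanges to one right-hand side. */
static void apply_row_swaps(int n, float *b, const int *ipiv)
{
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;
        if (p != i) { float t = b[i]; b[i] = b[p]; b[p] = t; }
    }
}

/* Solve A*x = b using the stored factors; b is overwritten with x. */
static void lu_solve(int n, const float *a, const int *ipiv, float *b)
{
    apply_row_swaps(n, b, ipiv);
    for (int i = 0; i < n; ++i)          /* forward solve, unit lower L */
        for (int j = 0; j < i; ++j)
            b[i] -= A_(i, j) * b[j];
    for (int i = n - 1; i >= 0; --i) {   /* backward solve, upper U */
        for (int j = i + 1; j < n; ++j)
            b[i] -= A_(i, j) * b[j];
        b[i] /= A_(i, i);
    }
}
```

A caller would run `lu_factor` once and then call `lu_solve` inside the iteration loop, once per new right-hand side, reusing the same `a` and `ipiv`.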
I benchmarked the CPU slaswp routine together with the data copies:
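start = get_current_time();
/* copy B (N x NRHS) from the GPU into the host buffer */
cublasGetMatrix(N, NRHS, sizeof(float), B, N, h_work_M_S, N);
/* apply the row interchanges IPIV(k1..k2) to the host copy */
int k1 = 1;
int k2 = N;
int k3 = 1;  /* increment between successive entries of IPIV */
slaswp_(&NRHS, h_work_M_S, &LDB, &k1, &k2, IPIV, &k3);
/* copy the permuted B back to the GPU */
cublasSetMatrix(N, NRHS, sizeof(float), h_work_M_S, N, B, N);
end = get_current_time();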
    N    GPU GFlop/s    Time (ms)
========================================================
 1024       16695.93     0.043000
 2048       95583.53     0.060000
 3072      268697.59     0.072000
 4032      546642.44     0.080000
 5184      978208.38     0.095000
 6016     1344698.75     0.108000
 7040     1972103.62     0.118000
 8064     2649402.25     0.132000
 9088     3405177.00     0.147000
10112     1318399.62     0.523000
So this has a negligible effect on the overall runtime of the routine: for N = 1024, it takes 0.043 ms out of 4.059 ms.
And my last question: can you explain what the hwork array is?
I only know this, from the MAGMA guide:
HWORK (workspace) REAL array, dimension N*NRHS
PS:
Performance of SGETRS_GPU with NRHS = 1:
    N    GPU GFlop/s    || b-Ax || / ||A||     Time (ms)
========================================================
 1024         176.87          2.513783e-07      4.059000
 2048         517.23          2.111665e-06     11.088000
 3072         911.57          5.364181e-06     21.223000
 4032        1282.67          6.626522e-07     34.094000
 5184        1748.31          6.161365e-07     53.154000
 6016        2092.16          2.017927e-06     69.415000
 7040        2494.92          1.475847e-06     93.273000
 8064        2919.91          3.561859e-06    119.771000
 9088        3324.24          1.063919e-06    150.579000
10112        3733.08          8.097373e-07    184.706000