Multiple queues and sgemv_batched
Posted: Wed Jul 05, 2017 3:52 pm
Hello
I have a problem where I have to make 9 different sgemv_batched calls on completely different data, except for the batch of A arrays, which is really the same matrix over and over. So I thought I could parallelize them by creating 9 different queues and assigning one batched sgemv to each queue. However, the total time is still the sum of the times of the individual batches. I'm using magma_v2, and I declare each queue like so:
int device = 0;
magma_queue_t queue;
magma_queue_create(device, &queue);
So my question is: is it impossible to launch all those batched sgemvs simultaneously, because of the function itself or something else I am unaware of, or am I making a mistake in my execution (in which case I will post my full code)?
(By the way, I built MAGMA with sequential MKL; not sure if that has anything to do with it.)
A is the same 128x128 matrix in every batch entry, x and y are vectors of 128 components, and batchCount is around 16000.
Also, a slightly different question: do 3-4 milliseconds sound OK for each batch on a GTX 970?
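To put the 3-4 ms figure in context, here is a rough sketch of the arithmetic in plain C. It is not MAGMA code, and the helper names (sgemv_batch_flops, sgemv_batch_bytes, giga_per_second) are made up for illustration. It assumes the usual count of 2*m*n floating-point operations per sgemv, and a worst-case memory model in which every problem in the batch re-reads its own copy of A.

```c
#include <stdio.h>

/* Floating-point operations for a batch of y = alpha*A*x + beta*y,
   counting one multiply and one add per matrix entry. */
double sgemv_batch_flops(double m, double n, double batch_count) {
    return 2.0 * m * n * batch_count;
}

/* Worst-case bytes moved, ASSUMING each problem reads its own m x n A,
   reads x (n entries), and reads and writes y (m entries each way),
   all as 4-byte floats. */
double sgemv_batch_bytes(double m, double n, double batch_count) {
    return 4.0 * (m * n + n + 2.0 * m) * batch_count;
}

/* Convert an amount per batch and a time in milliseconds
   into units of 1e9 per second (GFLOP/s or GB/s). */
double giga_per_second(double amount, double milliseconds) {
    return amount / (milliseconds * 1e-3) / 1e9;
}

/* Print the implied rates for one measured batch time
   (sizes taken from the post: 128x128, batchCount 16000). */
void print_estimate(double milliseconds) {
    double flops = sgemv_batch_flops(128, 128, 16000);
    double bytes = sgemv_batch_bytes(128, 128, 16000);
    printf("%.1f ms: %.1f GFLOP/s, %.1f GB/s (worst case)\n",
           milliseconds,
           giga_per_second(flops, milliseconds),
           giga_per_second(bytes, milliseconds));
}
```

Calling print_estimate(3.0) and print_estimate(4.0) gives roughly 175 and 131 GFLOP/s, and roughly 358 and 268 GB/s under the worst-case traffic assumption. Because A is the same matrix in every batch entry, the real memory traffic may be well below that worst case, so the GB/s figures are an upper bound rather than a measurement.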
Any help would be greatly appreciated.
Cheers