cblas_gemv / cblas_gemm much faster with vector of 0s

Posted:
Mon Jun 27, 2016 1:11 pm
by kangshiyin
I saw a question on stackoverflow.
http://stackoverflow.com/questions/3804 ... rix-valuesIt shows that in a vector-matrix multiplication, gemv/gemm runs much faster when the vector contains a lot of 0s, and the result is correct.
Until now I can only reproduce this with the package libblas-dev on Ubuntu 14.04.
Could you help explain why?
Re: cblas_gemv / cblas_gemm much faster with vector of 0s

Posted:
Tue Jun 28, 2016 1:29 am
by Julien Langou
Hi Usman,
I have no idea. You multiply a 1-by-m vector A by the m-by-m identity B and you want to compute C = AB. (So you will get C=A. Fine.)
Then when A is a sparse-vector, you observe that the code runs (significantly) faster than when A is a dense vector.
OK. No idea why on my end.
Cheers,
Julien.
Re: cblas_gemv / cblas_gemm much faster with vector of 0s

Posted:
Tue Jun 28, 2016 2:54 am
by kangshiyin
Thank you for the reply. An answer to the original question confirms that such optimization exists in that particular version of BLAS.
Re: cblas_gemv / cblas_gemm much faster with vector of 0s

Posted:
Thu Jul 14, 2016 1:23 am
by ChrisBragg
When you optimize gemv and gemm different techniques apply this
* For the matrix-matrix operation you are using blocked algorithms. Block sizes depend on cache sizes.
* For optimising the matrix-vector product you use so called fused Level 1 operations (e.g. fused dot-products or fused axpy).