Hi all,
I was wondering if anyone can comment on the efficiency of using dgemm rather than the traditional dgemv routine to perform a matrix-vector multiply for large matrices and vectors (dimensions > 2000), i.e. treating the vector as a matrix B with a single column and calling dgemm.
The reason I ask is that I use Intel's MKL, which has a multithreaded Level 3 BLAS, so I can use multiple threads for the matrix-vector multiply if I call a coerced dgemm instead of the more specific dgemv.
More generally, does anyone know what performance penalty dgemm carries relative to dgemv for a matrix-vector multiply (B having a single column) when MKL is not available, e.g. after porting to another architecture?
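To be concrete about what I mean by a coerced dgemm, here is a minimal sketch of the argument mapping. It is pure Python with hypothetical helper names (ref_gemv, ref_gemm) that only mirror the reference semantics of the two routines on column-major arrays. It is not the BLAS or MKL API, it assumes no transposes and unit strides, and it says nothing about speed.

```python
# Reference semantics only: these helpers are illustrative stand-ins,
# NOT the real dgemv/dgemm. Matrices are column-major flat lists,
# following the Fortran BLAS storage convention.

def ref_gemv(m, n, alpha, a, lda, x, beta, y):
    """Return alpha*A*x + beta*y for an m-by-n matrix A (no transpose)."""
    out = [beta * y[i] for i in range(m)]
    for j in range(n):
        for i in range(m):
            out[i] += alpha * a[i + j * lda] * x[j]
    return out

def ref_gemm(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Return alpha*A*B + beta*C for A (m-by-k), B (k-by-n), C (m-by-n)."""
    out = list(c)
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            out[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return out

m, k = 3, 2
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # columns (1, 2, 3) and (4, 5, 6)
x = [1.0, -1.0]
y0 = [0.0, 0.0, 0.0]

# Ordinary matrix-vector product.
y_gemv = ref_gemv(m, k, 1.0, a, m, x, 0.0, y0)

# Same product as a matrix-matrix multiply: the vector x becomes a
# k-by-1 matrix B (n = 1, ldb = k) and y an m-by-1 matrix C (ldc = m).
y_gemm = ref_gemm(m, 1, k, 1.0, a, m, x, k, 0.0, y0, m)

print(y_gemv, y_gemm)
```

With the real routines the mapping would be the same: n = 1, with the leading dimensions of B and C set to the vector lengths.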
Comments gratefully received.
Thanks.

