Hi Adrian.
(Your question applies to LAPACK, LINPACK, BLAS, and anything else that uses matrices with a leading-dimension argument.)
If the question is `why should LDA > N?`, then the first answer is `because we want to work on submatrices.` For example, if A is a 20-by-20 matrix and we want to work on the submatrix A(3:5,5:10), we use M=3, N=6, PTR=&(A(3,5)), LDA=20 to describe the submatrix. Since the matrix is stored column by column, element (i,j) of the submatrix sits at PTR + (i-1) + (j-1)*LDA, so LDA is the distance in memory between two consecutive columns, and it is the leading dimension of the full matrix, not of the submatrix. If this is not clear, please let me know and I can explain more. In short, LDA was originally introduced to handle submatrices.
Now, `why should LDA sometimes be > N for performance reasons?` Yes, this has to do with the cache. I am not much of an expert on all this, but, for example, if N=64, it can sometimes make sense for performance reasons to allocate A with LDA=65. The goal is to spread the columns of A over as many cache sets (the groups of cache lines that a given address is allowed to occupy) as possible, as opposed to having them all map onto just a few. If the columns map onto only a few sets, then each time you load elements (for example, moving along a row, which touches one element in every column), you are likely to evict the same cache lines over and over again. If someone wants to explain more, please go ahead.
See the graph below for ZGGEV. I forgot the architecture, but it was a while back (around 10 to 15 years ago). We used LDA=N. You can clearly see that when N is a power of 2, the time is much larger than otherwise. A remedy, when N is a power of 2, is to take LDA=N+1. (This remedy is not shown on the curve.) This curve is one example among many.
Cheers,
Julien

- time for ZGGEV (I forgot the machine architecture)