Matrix-Matrix Multiplication - Simple Optimization by Cache Reuse
Purpose: This exercise shows how reusing data that an earlier instruction has already loaded into cache saves time and thus improves the performance of your code.
Information: Perform the matrix multiplication A = A + B * C using the code segment below as a template, arranging the i, j, and k loops in each of the following orders: ijk, jki, kij, and kji. In the file matmul.f, one ordering (ijk) has been provided for you, as well as the high-performance BLAS routine dgemm, which performs double-precision general matrix multiplication. dgemm and other BLAS routines can be obtained from Netlib.
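As an illustration of what reordering the loops looks like, here is a minimal sketch of the jki ordering. The routine name mm_jki and the driver program demo_jki are hypothetical, and the argument list is assumed from the variable descriptions in this handout; the actual routine in matmul.f may differ. Fortran stores arrays by column, so with i as the innermost loop the code walks down columns of A and B with stride 1 while C(k,j) stays fixed.

```fortran
      program demo_jki
      implicit none
      integer, parameter :: n = 3
      double precision a(n,n), b(n,n), c(n,n), ref(n,n)
      integer i, j
!     Fill B and C with small integer values and start with A = 0.
      do j = 1, n
         do i = 1, n
            b(i,j) = dble(i + j)
            c(i,j) = dble(i - j)
            a(i,j) = 0.0d0
         end do
      end do
!     Hypothetical routine: same update as the template, jki order.
      call mm_jki(n, n, n, a, n, b, n, c, n)
!     Compare against the Fortran intrinsic matmul as a reference.
      ref = matmul(b, c)
      print *, 'max abs difference:', maxval(abs(a - ref))
      end program demo_jki

      subroutine mm_jki(ii, jj, kk, a, lda, b, ldb, c, ldc)
      implicit none
      integer ii, jj, kk, lda, ldb, ldc
      double precision a(lda,*), b(ldb,*), c(ldc,*)
      integer i, j, k
!     A = A + B*C with j outermost, k in the middle, i innermost.
      do j = 1, jj
         do k = 1, kk
            do i = 1, ii
               a(i,j) = a(i,j) + b(i,k)*c(k,j)
            end do
         end do
      end do
      end subroutine mm_jki
```

The other orderings are obtained by permuting the three do statements; the loop body stays the same. Because the test values are small integers, the reported difference should be zero.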
The variables in the matmul routine (reproduced on the next page) are chosen for compatibility with the BLAS routines and have the following meanings. The variables ii, jj, and kk give the sizes of the matrices: A is ii by jj, B is ii by kk, and C is kk by jj. The variables lda, ldb, and ldc are the leading dimensions of A, B, and C, respectively; each is the first (row) dimension of the array as allocated, which may be larger than the part of the matrix actually used.
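To make the distinction between matrix size and leading dimension concrete, the following sketch allocates 4 by 4 arrays but multiplies only their top-left blocks. The names demo_ld and mm_ijk are hypothetical, and the argument order is an assumption based on the description above, not the actual interface in matmul.f. Inside the routine, element A(i,j) sits at offset (i-1) + (j-1)*lda from the start of the array, which is why lda must be the allocated row count and not ii.

```fortran
      program demo_ld
      implicit none
      integer, parameter :: ld = 4
      double precision a(ld,ld), b(ld,ld), c(ld,ld)
!     Storage is 4 by 4; mark every element as unused with -1.
      a = -1.0d0
      b = -1.0d0
      c = -1.0d0
!     Use only top-left blocks: A is 2 by 3, B 2 by 2, C 2 by 3.
      a(1:2,1:3) = 0.0d0
      b(1:2,1:2) = 1.0d0
      c(1:2,1:3) = 2.0d0
!     Sizes are ii=2, jj=3, kk=2; every leading dimension is 4.
      call mm_ijk(2, 3, 2, a, ld, b, ld, c, ld)
      print *, 'used element   a(1,1) =', a(1,1)
      print *, 'unused element a(3,1) =', a(3,1)
      end program demo_ld

      subroutine mm_ijk(ii, jj, kk, a, lda, b, ldb, c, ldc)
      implicit none
      integer ii, jj, kk, lda, ldb, ldc
      double precision a(lda,*), b(ldb,*), c(ldc,*)
      integer i, j, k
!     A = A + B*C on the ii by jj block only (ijk order).
      do i = 1, ii
         do j = 1, jj
            do k = 1, kk
               a(i,j) = a(i,j) + b(i,k)*c(k,j)
            end do
         end do
      end do
      end subroutine mm_ijk
```

After the call, a(1,1) holds 4 (two products of 1 and 2), while a(3,1) keeps its marker value of -1 because it lies outside the 2 by 3 block. With the reference BLAS, the same update corresponds to call dgemm('N','N',ii,jj,kk,1.0d0,b,ldb,c,ldc,1.0d0,a,lda), since dgemm writes its result into its last matrix argument.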