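      subroutine kji ( A, ii, jj, lda, B, kk, ldb, C, ldc )
      ! Computes A = A + B*C with the loops nested in k, j, i order.
      ! A is ii x jj, B is ii x kk, C is kk x jj (column-major storage).
      integer ii, jj, kk, lda, ldb, ldc
      double precision A( lda, * ), B( ldb, * ), C( ldc, * )
      integer i, j, k
      do k = 1, kk
        do j = 1, jj
          ! The innermost loop runs down a column of A and of B.
          do i = 1, ii
            A(i,j) = A(i,j) + B(i,k) * C(k,j)
          enddo
        enddo
      enddo
      return
      end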
However, this is not the best that can be done. Performance can be improved further by blocking and unrolling the loops. The first optimization demonstrates the effect of loop unrolling. In the instructions, you will be asked to unroll the j, k, and i loops by two, so that each loop steps by two (for example, do j = 1, jj, 2), and to add code that covers the iterations each loop now skips. For the k loop, for example, the update becomes A(i,j) = A(i,j) + B(i,k) * C(k,j) + B(i,k+1) * C(k+1,j). Working through the product of two 2x2 matrices by hand is a good way to figure out the unrolled statements.
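One possible shape for the routine with all three loops unrolled by two is sketched below. It is an illustration rather than the expected solution: the name kji_unroll2 and the temporaries c00, c10, c01, and c11 are not from the original and were introduced here to keep the lines short, and the code assumes that ii, jj, and kk are all even (odd sizes would need extra cleanup code).

```fortran
      subroutine kji_unroll2 ( A, ii, jj, lda, B, kk, ldb, C, ldc )
      ! Hypothetical name: A = A + B*C with k, j and i unrolled by two.
      ! ASSUMPTION: ii, jj and kk are even, so no cleanup loops appear.
      integer ii, jj, kk, lda, ldb, ldc
      double precision A( lda, * ), B( ldb, * ), C( ldc, * )
      double precision c00, c10, c01, c11
      integer i, j, k
      do k = 1, kk, 2
        do j = 1, jj, 2
          ! 2x2 tile of C shared by every i in the inner loop
          c00 = C(k,j)
          c10 = C(k+1,j)
          c01 = C(k,j+1)
          c11 = C(k+1,j+1)
          do i = 1, ii, 2
            ! Column j of the 2x2 tile of A
            A(i,j) = A(i,j) + B(i,k)*c00 + B(i,k+1)*c10
            A(i+1,j) = A(i+1,j) + B(i+1,k)*c00 + B(i+1,k+1)*c10
            ! Column j+1 of the 2x2 tile of A
            A(i,j+1) = A(i,j+1) + B(i,k)*c01 + B(i,k+1)*c11
            A(i+1,j+1) = A(i+1,j+1) + B(i+1,k)*c01 + B(i+1,k+1)*c11
          enddo
        enddo
      enddo
      return
      end
```

Each pass through the inner body updates a 2x2 tile of A from a 2x2 tile of B and a 2x2 tile of C, which is exactly the eight multiply-adds of a 2x2 matrix product. The argument list is the same as for kji, so the call site does not change.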