Outer Loop Unrolling
Tuned
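Counting the operands of the multiply, each multiply requires 5/4 loads (four elements of B plus one C(j) shared by the four multiplies) and one store. D(j) is likewise loaded once and reused by all four statements.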
do i = 1, lda, 4
  do j = 1, ldb
    A(i,j)   = B(i,j)   * C(j) + D(j)
    A(i+1,j) = B(i+1,j) * C(j) + D(j)
    A(i+2,j) = B(i+2,j) * C(j) + D(j)
    A(i+3,j) = B(i+3,j) * C(j) + D(j)
  enddo
enddo
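As written, the unrolled loop assumes that lda is a multiple of 4. A minimal sketch of one way to handle other sizes is to run the unrolled loop over the largest multiple of 4 and finish the remaining rows with a cleanup loop. The array sizes and the names imain and Aref are hypothetical, chosen only for illustration; the program checks the unrolled result against the loop without unrolling.

```fortran
program unroll_demo
  implicit none
  ! Hypothetical sizes: lda is deliberately not a multiple of 4.
  integer, parameter :: lda = 10, ldb = 7
  real :: A(lda,ldb), Aref(lda,ldb), B(lda,ldb), C(ldb), D(ldb)
  integer :: i, j, imain

  call random_number(B)
  call random_number(C)
  call random_number(D)

  ! Reference result: the same computation without unrolling.
  do i = 1, lda
    do j = 1, ldb
      Aref(i,j) = B(i,j) * C(j) + D(j)
    enddo
  enddo

  ! Main part: outer loop unrolled by 4, covering rows 1..imain,
  ! where imain is the largest multiple of 4 not exceeding lda.
  imain = lda - mod(lda, 4)
  do i = 1, imain, 4
    do j = 1, ldb
      A(i,j)   = B(i,j)   * C(j) + D(j)
      A(i+1,j) = B(i+1,j) * C(j) + D(j)
      A(i+2,j) = B(i+2,j) * C(j) + D(j)
      A(i+3,j) = B(i+3,j) * C(j) + D(j)
    enddo
  enddo

  ! Cleanup: the remaining 0 to 3 rows, one row per outer iteration.
  do i = imain + 1, lda
    do j = 1, ldb
      A(i,j) = B(i,j) * C(j) + D(j)
    enddo
  enddo

  ! Compare with a small tolerance, since the compiler may round
  ! the two loop forms slightly differently.
  if (maxval(abs(A - Aref)) > 1.0e-6) then
    print *, 'mismatch between unrolled and reference results'
    error stop 1
  else
    print *, 'unrolled result matches reference'
  end if
end program unroll_demo
```

When lda is already a multiple of 4, the cleanup loop executes zero times, so the main loop is the same as the one on this slide.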