dgemm improves on the simple nested-loop approach to matrix multiplication in several ways. Two of them are loop unrolling (the bodies of some of the innermost loops are written out several times, so that fewer branch instructions are executed) and blocking (the matrices are processed in sub-blocks, so that data is reused as much as possible while it is still in cache). These methods will be explored later in Exercise 2.
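To make the two ideas concrete, here is a minimal sketch in C. It is not how dgemm itself is written; it only illustrates the techniques. The function names, the row-major storage, the unroll factor of 4, and the caller-supplied block size `bs` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: three simple nested loops computing C = A * B.
   Assumption: all matrices are n x n, stored row-major in flat arrays. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Loop unrolling: the innermost loop body is written out four times,
   so the loop test and branch run about n/4 times instead of n times. */
void matmul_unrolled(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            size_t k = 0;
            for (; k + 4 <= n; k += 4)
                sum += a[i * n + k]     * b[k * n + j]
                     + a[i * n + k + 1] * b[(k + 1) * n + j]
                     + a[i * n + k + 2] * b[(k + 2) * n + j]
                     + a[i * n + k + 3] * b[(k + 3) * n + j];
            for (; k < n; k++)          /* leftover iterations when n % 4 != 0 */
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Blocking: the work is split into bs x bs tiles, so each tile of A, B
   and C is reused many times while it is still in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(n > 0 && bs > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = min_size(ii + bs, n);
                size_t k_end = min_size(kk + bs, n);
                size_t j_end = min_size(jj + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

/* Helper for comparing results: largest absolute element-wise difference. */
double max_abs_diff(size_t len, const double *x, const double *y)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; i++) {
        double d = x[i] - y[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

All three functions compute the same product, so `max_abs_diff` can be used to check one against another. In practice the block size would be chosen so that the tiles being worked on fit in cache together; a suitable value depends on the machine and is best found by measurement.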