Experiences in autotuning matrix multiplication for energy minimization on GPUs