From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming