Hi All,
I'm going to use MAGMA in my research and have a question that neither I nor Google can answer.
Suppose I have no external GPU, but a good CPU (Intel Skylake).
Which has better performance: MAGMA/LAPACK on the CPU, or clMAGMA with OpenCL on the CPU?
For instance, for a matrix-matrix multiplication task.
I suppose the answer is clMAGMA, but I cannot find any information on how to compile clMAGMA against the Intel OpenCL SDK for CPU.
Am I the first to try clMAGMA on an Intel OpenCL CPU device? :)
Thanks, Sergey
CPU: clMagma vs Magma/Lapack
Last edited by fsergeyal on Tue Aug 23, 2016 12:50 am, edited 1 time in total.
Re: CPU: clMagma vs Magma/Lapack
With no GPU, you can just use LAPACK. Vendor and open source libraries (such as MKL, ACML, OpenBLAS) include both LAPACK and BLAS, and are optimized for CPUs.
MAGMA and clMAGMA are both designed for use with an added GPU, where the CPU BLAS routines do not operate. Theoretically, it should be possible to treat the CPU as an OpenCL device and run clBLAS on the CPU, but it wouldn't be as optimized as the vendor BLAS, and would incur extra data copies (between the "host" CPU and the OpenCL "device" CPU).
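As a rough illustration of the first point (a sketch, assuming a NumPy build linked against an optimized BLAS such as MKL or OpenBLAS, which is not part of the thread): a plain CPU matrix multiply already runs the vectorized BLAS dgemm kernels, with no host/device copies at all.

```python
import numpy as np

# Reports which BLAS/LAPACK libraries NumPy was linked against
# (e.g. MKL or OpenBLAS); the exact output depends on your build.
np.show_config()

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# a @ b dispatches to the optimized BLAS dgemm routine on the CPU;
# the data never leaves host memory.
c = a @ b
print(c.shape)  # (1000, 1000)
```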
-mark
Re: CPU: clMagma vs Magma/Lapack
Thanks, Mark.

mgates3 wrote: With no GPU, you can just use LAPACK. Vendor and open source libraries (such as MKL, ACML, OpenBLAS) include both LAPACK and BLAS, and are optimized for CPUs.
As far as I know, LAPACK does not use CPU features such as SSE, AVX, AVX2, or FMA.
Also, the data copied between the "host" CPU and the OpenCL "device" goes from DDR4 to the same DDR4, in bulk rather than byte by byte, so it should be very fast.
So we have two scalepans here, and I suppose clMAGMA on an OpenCL CPU device would be faster when multiplying thousands of 4000x4000 matrices.
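One way to weigh those scalepans empirically (a hypothetical benchmark sketch, not from the thread; assumes NumPy linked against an optimized BLAS) is to time a GEMM and compute the achieved GFLOP/s. An AVX2-enabled BLAS on Skylake typically gets close to peak, leaving little headroom for an OpenCL-on-CPU path to win back its copy overhead.

```python
import time
import numpy as np

def gemm_gflops(n, reps=3):
    """Time an n x n matrix multiply; return the best achieved GFLOP/s."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    a @ b  # warm-up run so timing excludes one-time setup
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    flops = 2.0 * n ** 3  # multiply-add count for a dense n x n GEMM
    return flops / best / 1e9

# A modest size keeps the run quick; use n=4000 to match the post.
print(f"{gemm_gflops(512):.1f} GFLOP/s")
```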
Sergey
Re: CPU: clMagma vs Magma/Lapack
Well, it seems I was wrong: Intel MKL supports AVX2.
https://software.intel.com/en-us/articl ... intel-avx2
So, I have no questions now. :)
Re: CPU: clMagma vs Magma/Lapack
Yes, all the modern BLAS libraries (MKL, ACML, OpenBLAS, ATLAS) will use as much SSE/AVX/etc. as they can. LAPACK itself doesn't have explicit SSE/AVX/etc. calls, but relies on the optimized BLAS for the bulk of its computation.
All the above libraries are freely available. MKL now has a community license.
https://software.intel.com/en-us/articles/free-mkl
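To illustrate that division of labor (a sketch assuming NumPy's standard LAPACK bindings, not something from the thread): `numpy.linalg.solve` calls LAPACK's `dgesv`, and the LU factorization inside `dgesv` spends almost all of its time in BLAS kernels, which is where the SSE/AVX code actually lives.

```python
import numpy as np

n = 500
# Adding n*I makes the matrix diagonally dominant and well-conditioned.
a = np.random.rand(n, n) + n * np.eye(n)
b = np.random.rand(n)

# numpy.linalg.solve wraps LAPACK's dgesv (LU factorization + solve);
# the factorization's inner loops run in the optimized BLAS.
x = np.linalg.solve(a, b)

# Check the residual: the solution should reproduce b.
print(np.allclose(a @ x, b))  # True
```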
-mark