MAGMA GEMM Sources for Fermi Released
The MAGMA BLAS SGEMM and DGEMM sources for Fermi GPUs are now released.
These improved GEMMs, developed by Rajib Nath and Stan Tomov, will be
part of the upcoming MAGMA 0.3 library release and will be included in
CUBLAS 3.2 as well.
The basic algorithm is described in:
Nath, R., Tomov, S., Dongarra, J. "An Improved MAGMA GEMM for Fermi GPUs,"
University of Tennessee Computer Science Technical Report, UT-CS-10-655
(also LAPACK working note 227), July 29, 2010.
http://icl.cs.utk.edu/projectsfiles/mag ... i_gemm.pdf
On a C2050 GPU the new DGEMM reaches up to 300 GFlop/s (58% of peak) and
the SGEMM up to 645 GFlop/s (63% of peak). On a GTX480, DGEMM reaches up to
166 GFlop/s and SGEMM up to 844 GFlop/s.
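For readers who want to reproduce the percentages quoted above, here is a minimal sketch of the arithmetic. It assumes the usual GEMM flop count of 2*m*n*k and takes the C2050 peak rates as 515.2 GFlop/s (double) and 1030.4 GFlop/s (single), derived from the card's published 448 cores at 1.15 GHz; those peak figures and the function names are assumptions for illustration, not taken from the post.

```c
/* GEMM does one multiply and one add per (i, j, l) triple,
   so an m x n x k product costs 2*m*n*k floating-point operations. */
double gemm_gflops(double m, double n, double k, double seconds) {
    return 2.0 * m * n * k / seconds / 1e9;
}

/* Achieved rate as a percentage of a device's theoretical peak. */
double percent_of_peak(double gflops, double peak_gflops) {
    return 100.0 * gflops / peak_gflops;
}
```

With those assumed peaks, percent_of_peak(300.0, 515.2) and percent_of_peak(645.0, 1030.4) round to the 58% and 63% given in the announcement.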
Attachments:
- magmablas_gemm_fermi.tar.gz (9.95 KiB)
Re: MAGMA GEMM Sources for Fermi Released
When will we see the cgemm and zgemm equivalents?
Malcolm
-
Stan Tomov
- Posts: 283
- Joined: Fri Aug 21, 2009 10:39 pm
Re: MAGMA GEMM Sources for Fermi Released
I am not sure if we would personally write the equivalents. NVIDIA is preparing CUBLAS 3.2
that will have improved c/z gemms using ideas from the s/d gemms.
Stan
-
Boxed Cylon
- Posts: 36
- Joined: Sat Nov 21, 2009 6:03 pm
Re: MAGMA GEMM Sources for Fermi Released
I preface this post with the declaration that I know just about nothing about details of these routines...
I was looking through the fermi_sgemm.cu routine to get some sense of how the code was engineered. I noticed the __mul24 function, and wondered what it did. A Google search turned up the Fermi Tuning Guide with:
32-Bit Integer Multiplication
On devices of compute capability 1.x, 32-bit integer multiplication is implemented using multiple instructions as it is not natively supported. 24-bit integer multiplication is natively supported via the __[u]mul24 intrinsic.
On devices of compute capability 2.0, however, 32-bit integer multiplication is natively supported, but 24-bit integer multiplication is not. __[u]mul24 is therefore implemented using multiple instructions and should not be used (Section 5.4.1).
Should the fermi_sgemm.cu routine be using __mul24? (Or perhaps there are reasons 24-bit integers are employed?)
-
Stan Tomov
- Posts: 283
- Joined: Fri Aug 21, 2009 10:39 pm
Re: MAGMA GEMM Sources for Fermi Released
There is no reason to use __mul24. We will remove it. Thanks for pointing this out.
-
Allan Menezes
- Posts: 14
- Joined: Wed Aug 05, 2009 10:01 pm
Re: MAGMA GEMM Sources for Fermi Released
Dear Stan,
Since this is just pointer arithmetic used in only a few places, it does not change the performance much, as my experiment below shows.
Just for fun, I changed fermi_dgemm.cu and fermi_sgemm.cu by adding a single line at the top, #define __mul24(a,b) ((a)*(b)), and there was no significant difference in GFlop/s; err was still 0.00 on a GTX-480.
Device memory on currently available Fermi devices is still under 4 GB; that is going to change with the Tesla C2070 and CUDA 3.2, which move to 64-bit addresses.
Thank you,
Allan
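A small host-side sketch of why that substitution leaves the results unchanged for index arithmetic. It models the unsigned intrinsic as the CUDA documentation describes it (multiply the low 24 bits of each operand, keep the low 32 bits of the product) and compares it with a plain 32-bit multiply. The names umul24_model and mul32 are hypothetical, and this is plain C for illustration, not code from the MAGMA sources.

```c
#include <stdint.h>

/* Host-side model of __umul24 as documented: the high 8 bits of each
   operand are ignored and the low 32 bits of the product are returned. */
uint32_t umul24_model(uint32_t x, uint32_t y) {
    uint64_t p = (uint64_t)(x & 0xFFFFFFu) * (uint64_t)(y & 0xFFFFFFu);
    return (uint32_t)p;
}

/* Plain 32-bit multiply, the natively supported operation on
   compute capability 2.0 according to the tuning guide quoted earlier. */
uint32_t mul32(uint32_t x, uint32_t y) {
    return x * y;
}
```

The two functions agree whenever both operands fit in 24 bits, which covers typical thread and block index products. They diverge once an operand needs more than 24 bits, and in that case the plain multiply is the one that gives the intended result.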
-
rramachand21
- Posts: 2
- Joined: Tue Nov 30, 2010 5:02 pm
Re: MAGMA GEMM Sources for Fermi Released
Hello,
I am new to CUDA and this API. Could I please get the source code for a generic matrix-vector multiplication (sgemv and dgemv)?
Thanks,
Ranjith
Re: MAGMA GEMM Sources for Fermi Released
Hello,
Why does MAGMA BLAS only work when m, n, k are multiples of 96?
Can it work if m, n, k are not multiples of 96?
Thanks and Regards
Abhishek Nikam
Re: MAGMA GEMM Sources for Fermi Released
It should work for any m, n, k, not just multiples of 96. If you are having a problem with other sizes, please post specifics, e.g., the output of magma/testing/testing_dgemm.
(There may be problems for very large matrices, due to exceeding GPU texture memory. As I recall, in these cases we just call cublas.)
-mark
Re: MAGMA GEMM Sources for Fermi Released
Hello,
Thanks for the reply. For my work I can only use the open-source MAGMA BLAS GEMM.
However, it calls cublasgemm if the dimensions are not multiples of 96.
It does not give any particular warning about large sizes (large sizes whose dimensions are multiples of 96 must work).
Would the MAGMA BLAS GEMM work for small sizes whose dimensions are not multiples of 96?
Are there any specific changes needed for that?
Also, it does not work with the latest CUDA versions. Is there any way I can make it run with them?
Thanks and Regards
Abhishek Nikam
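One generic workaround for a kernel that only accepts dimensions that are multiples of a block size is to zero-pad the operands up to the next multiple: the padded rows and columns contribute only zeros, so the top-left m-by-n block of the padded result equals the original product. The sketch below shows the host-side bookkeeping only. It is an assumption about how one might proceed, not a change described by the MAGMA developers, and round_up and pad_colmajor are hypothetical helper names.

```c
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of block (block > 0). */
int round_up(int n, int block) {
    return ((n + block - 1) / block) * block;
}

/* Copy a column-major m-by-n matrix (leading dimension lda) into a newly
   allocated, zero-filled mp-by-np buffer with leading dimension mp.
   Requires mp >= m and np >= n. Returns NULL on allocation failure;
   the caller frees the result. */
double *pad_colmajor(const double *a, int m, int n, int lda, int mp, int np) {
    double *p = calloc((size_t)mp * (size_t)np, sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    for (int j = 0; j < n; ++j) {
        /* Copy one column; rows m..mp-1 of it stay zero. */
        memcpy(p + (size_t)j * mp, a + (size_t)j * lda, (size_t)m * sizeof *p);
    }
    return p;
}
```

For example, a 100-by-100 matrix would be padded to round_up(100, 96) = 192 rows and columns before the call. The extra memory and flops make this worthwhile only when the dimensions are not far below a multiple of the block size.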