MAGMA GEMM Sources for Fermi Released
Posted: Wed Aug 04, 2010 12:54 pm
by admin
The MAGMA BLAS SGEMM and DGEMM sources for Fermi GPUs are now released.
These improved GEMMs, developed by Rajib Nath and Stan Tomov, will be
part of the upcoming MAGMA 0.3 library release and will be included in
CUBLAS 3.2 as well.
The basic algorithm is described in:
Nath, R., Tomov, S., Dongarra, J. "An Improved MAGMA GEMM for Fermi GPUs,"
University of Tennessee Computer Science Technical Report, UT-CS-10-655
(also LAPACK Working Note 227), July 29, 2010.
http://icl.cs.utk.edu/projectsfiles/mag ... i_gemm.pdf
On a C2050 GPU the new DGEMM gets up to 300 GFlop/s (58% of peak) and
the SGEMM up to 645 GFlop/s (63% of peak). On a GTX 480 the DGEMM gets up to 166 GFlop/s
and the SGEMM up to 844 GFlop/s.
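The percentages above are the achieved rate divided by the theoretical peak. A minimal C sketch of that arithmetic follows, assuming the Tesla C2050's published configuration (448 CUDA cores at 1.15 GHz, one fused multiply-add, i.e. 2 flops, per core per cycle in single precision, and double precision at half that rate). The function names peak_gflops and percent_of_peak are hypothetical.

```c
#include <assert.h>

/* Theoretical peak in GFlop/s: cores x clock (GHz) x 2 flops per FMA,
   scaled by the double/single precision throughput ratio (assumed 0.5
   for double precision on the C2050, 1.0 for single precision). */
double peak_gflops(int cores, double clock_ghz, double dp_ratio)
{
    assert(cores > 0 && clock_ghz > 0.0 && dp_ratio > 0.0);
    return cores * clock_ghz * 2.0 * dp_ratio;
}

/* Achieved rate as a percentage of the theoretical peak. */
double percent_of_peak(double achieved_gflops, double peak)
{
    assert(peak > 0.0);
    return 100.0 * achieved_gflops / peak;
}
```

Under these assumptions the C2050 peaks come out at about 1030 GFlop/s (single) and 515 GFlop/s (double), so 645 GFlop/s is about 62.6% and 300 GFlop/s about 58.2%, consistent with the quoted 63% and 58%.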
Re: MAGMA GEMM Sources for Fermi Released
Posted: Thu Aug 05, 2010 10:04 am
by mbibby
When will we see the cgemm and zgemm equivalents?
Malcolm
Re: MAGMA GEMM Sources for Fermi Released
Posted: Thu Aug 05, 2010 12:32 pm
by Stan Tomov
I am not sure whether we will write the equivalents ourselves. NVIDIA is preparing CUBLAS 3.2,
which will have improved CGEMM and ZGEMM routines based on ideas from the SGEMM and DGEMM.
Stan
Re: MAGMA GEMM Sources for Fermi Released
Posted: Fri Aug 13, 2010 2:26 am
by Boxed Cylon
I preface this post by declaring that I know next to nothing about the details of these routines...
I was looking through the fermi_sgemm.cu routine to get some sense of how the code was engineered. I noticed the __mul24 function and wondered what it did. A Google search turned up this passage in the Fermi Tuning Guide:
32-Bit Integer Multiplication
On devices of compute capability 1.x, 32-bit integer multiplication is implemented using multiple instructions as it is not natively supported. 24-bit integer multiplication is natively supported via the __[u]mul24 intrinsic.
On devices of compute capability 2.0, however, 32-bit integer multiplication is natively supported, but 24-bit integer multiplication is not. __[u]mul24 is therefore implemented using multiple instructions and should not be used (Section 5.4.1).
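To make the quoted passage concrete, here is a host-side C emulation of what the unsigned intrinsic __umul24 computes according to the CUDA documentation: the low 32 bits of the product of the low 24 bits of each operand. The names umul24_emulated and umul24_matches_plain are hypothetical, since the real intrinsic is available only in device code. The sketch also shows why replacing it with an ordinary 32-bit multiply is safe whenever both operands are below 2^24, as index variables in a GEMM kernel typically are.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side emulation of __umul24 (hypothetical helper, not a CUDA API):
   the high 8 bits of each operand are ignored, and the low 32 bits of the
   up-to-48-bit product are returned. */
uint32_t umul24_emulated(uint32_t x, uint32_t y)
{
    uint64_t a = x & 0xFFFFFFu;   /* keep the low 24 bits */
    uint64_t b = y & 0xFFFFFFu;
    return (uint32_t)(a * b);     /* truncate to the low 32 bits */
}

/* Nonzero when __umul24 and a plain 32-bit multiply agree; this always
   holds when both operands are below 2^24. */
int umul24_matches_plain(uint32_t x, uint32_t y)
{
    return umul24_emulated(x, y) == x * y;
}
```

For example, umul24_emulated((1u << 24) + 5u, 2u) returns 10, because bit 24 of the first operand is discarded, while a plain multiply returns 33554442.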
Should the fermi_sgemm.cu routine be using __mul24? (Or perhaps there are reasons 24-bit integers are employed?)
Re: MAGMA GEMM Sources for Fermi Released
Posted: Tue Sep 07, 2010 1:15 pm
by Stan Tomov
There is no reason to use __mul24. We will remove it. Thanks for pointing this out.
Re: MAGMA GEMM Sources for Fermi Released
Posted: Sun Sep 12, 2010 11:51 pm
by Allan Menezes
Dear Stan,
Since __mul24 is used only for pointer arithmetic, and in only a few places, replacing it barely changes performance, as my experiment below shows.
Just for fun, I added a single #define at the top of fermi_dgemm.cu and fermi_sgemm.cu, #define __mul24(a,b) ((a)*(b)), and on a GTX 480 there was no significant difference in GFlop/s, and the error was still 0.00.
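Allan's replacement is a plain preprocessor substitution, so every existing __mul24(a, b) call becomes an ordinary 32-bit multiply without further edits. The parentheses around each argument and around the whole body matter, as this host-side C sketch shows. It spells the macro MUL24 (a hypothetical name) because identifiers beginning with a double underscore are reserved in standard C; MUL24_BAD and col_major_offset are likewise hypothetical, shown only for illustration.

```c
#include <assert.h>

/* Same body as the replacement used in the experiment above. */
#define MUL24(a, b) ((a) * (b))

/* Hypothetical unparenthesized version, shown only to illustrate the bug:
   MUL24_BAD(tx + 1, 96) expands to tx + 1 * 96. */
#define MUL24_BAD(a, b) a * b

/* Example of the kind of index arithmetic such macros are used for:
   column-major offset of element (row, col) with leading dimension lda. */
int col_major_offset(int row, int col, int lda)
{
    assert(row >= 0 && col >= 0 && row < lda);
    return MUL24(col, lda) + row;
}
```

With tx = 3, MUL24(tx + 1, 96) evaluates to 384, while MUL24_BAD(tx + 1, 96) evaluates to 99.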
Device memory on currently available Fermi devices is still under 4 GB; this will change in the future with the Tesla C2070 and with CUDA 3.2, which moves to 64-bit addresses.
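The point about 4 GB separates two limits: whether an element index fits in a 32-bit integer, and whether the resulting byte address does. The sketch below checks both with 64-bit arithmetic, assuming column-major storage with leading dimension lda as in BLAS; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Element offset of (row, col) in a column-major matrix, computed in 64 bits. */
int64_t element_offset(int64_t row, int64_t col, int64_t lda)
{
    assert(row >= 0 && col >= 0 && row < lda);
    return col * lda + row;
}

/* Nonzero if the element offset still fits in a signed 32-bit int. */
int offset_fits_int32(int64_t row, int64_t col, int64_t lda)
{
    return element_offset(row, col, lda) <= INT32_MAX;
}

/* Nonzero if the byte offset is beyond what a 32-bit address can reach. */
int needs_64bit_address(int64_t row, int64_t col, int64_t lda, int64_t elem_size)
{
    return element_offset(row, col, lda) * elem_size > (int64_t)UINT32_MAX;
}
```

For the last element of a 40000 x 40000 single-precision matrix (about 6.4 GB), element_offset returns 1,599,999,999, which still fits in an int, but the byte offset of 6,399,999,996 does not fit in 32 bits.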
Thank you,
Allan
Re: MAGMA GEMM Sources for Fermi Released
Posted: Tue Nov 30, 2010 5:05 pm
by rramachand21
Hello,
I am new to CUDA and this API. Could I please get the source code for generic matrix-vector multiplication (SGEMV and DGEMV)?
Thanks,
Ranjith
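For reference, a generic matrix-vector product follows the BLAS GEMV definition y := alpha*A*x + beta*y for an m x n column-major matrix A with leading dimension lda. The plain C sketch below is not the MAGMA or CUBLAS GPU kernel; it is a CPU reference of the kind useful for checking GPU results. The name sgemv_ref is hypothetical, and it covers only the non-transposed case with unit vector strides. DGEMV is the same code with double in place of float.

```c
#include <assert.h>

/* Reference SGEMV, non-transposed, unit strides:
   y := alpha * A * x + beta * y, with A m x n, column-major, lda >= max(1, m). */
void sgemv_ref(int m, int n, float alpha, const float *A, int lda,
               const float *x, float beta, float *y)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    /* Scale y first; as in BLAS, beta == 0 means y is not read. */
    for (int i = 0; i < m; ++i)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    /* Walk A one column at a time, matching column-major storage. */
    for (int j = 0; j < n; ++j) {
        float t = alpha * x[j];
        for (int i = 0; i < m; ++i)
            y[i] += t * A[i + (long)j * lda];
    }
}
```

With A = [1 2; 3 4] stored as {1, 3, 2, 4}, x = y = {1, 1}, alpha = 1 and beta = 2, the result is y = {5, 9}.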
Re: MAGMA GEMM Sources for Fermi Released
Posted: Fri Mar 16, 2018 5:13 pm
by anikam
Hello,
Why does MAGMA BLAS only work when m, n, k are multiples of 96?
Can it work if m, n, k are not multiples of 96?
Thanks and Regards
Abhishek Nikam
Re: MAGMA GEMM Sources for Fermi Released
Posted: Fri Mar 16, 2018 5:17 pm
by mgates3
It should work for any m, n, k, not just multiples of 96. If you are having a problem with other sizes, please post specifics, e.g., the output of magma/testing/testing_dgemm.
(There may be problems for very large matrices, due to exceeding GPU texture memory limits. As I recall, in these cases we just call CUBLAS.)
-mark
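One way a blocked kernel can accept any m, n, k is to launch enough tiles to cover the whole matrix and have the threads in the edge tiles check bounds. The sketch below computes that tile count with ceiling division; the tile size of 96 comes from this thread, and num_tiles and last_tile_width are hypothetical helpers, not MAGMA's actual code.

```c
#include <assert.h>

/* Number of tiles of width `tile` needed to cover n rows or columns. */
int num_tiles(int n, int tile)
{
    assert(n >= 0 && tile > 0);
    return (n + tile - 1) / tile;
}

/* Width of the last, possibly partial, tile; 0 for an empty dimension. */
int last_tile_width(int n, int tile)
{
    assert(n >= 0 && tile > 0);
    if (n == 0)
        return 0;
    int r = n % tile;
    return r == 0 ? tile : r;
}
```

For m = 1000 this gives 11 tiles along that dimension, the last only 40 wide, so threads mapped past row 999 must skip their loads and stores.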
Re: MAGMA GEMM Sources for Fermi Released
Posted: Fri Mar 16, 2018 5:36 pm
by anikam
Hello,
Thanks for the reply. For my work I need to use only the open-source MAGMA BLAS GEMM.
Also, it calls the CUBLAS GEMM if the dimensions are not multiples of 96.
It does not give any particular warning about large sizes (large sizes whose dimensions are multiples of 96 should work).
Also, would the MAGMA BLAS GEMM work for dimensions that are not multiples of 96 but are fairly small?
Are there any specific changes that need to be made for that?
Also, it does not work with the latest CUDA versions. Is there any way I can make it run with them?
Thanks and Regards
Abhishek Nikam
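On the question of small sizes that are not multiples of 96: one generic workaround, independent of MAGMA and not its documented method, is zero-padding. Copy the operands into zero-filled buffers whose dimensions are rounded up to the next multiple of 96, run a multiple-of-96 GEMM on those, and read back the top-left m x n block. The extra zero rows and columns contribute nothing to the product, so the result is unchanged; the cost is extra memory and wasted flops when m, n, k are far below 96. In this CPU sketch a naive gemm_ref stands in for the GPU kernel, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of tile. */
int round_up(int n, int tile)
{
    assert(n >= 0 && tile > 0);
    return (n + tile - 1) / tile * tile;
}

/* Naive column-major C := A * B, A m x k, B k x n (stand-in for the GPU kernel). */
void gemm_ref(int m, int n, int k, const double *A, int lda,
              const double *B, int ldb, double *C, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = s;
        }
}

/* Copy an r x c column-major matrix into a zero-filled R x C buffer (ld = R). */
static double *pad_copy(const double *X, int ldx, int r, int c, int R, int C)
{
    double *P = calloc((size_t)R * C, sizeof *P);
    assert(P != NULL);
    for (int j = 0; j < c; ++j)
        memcpy(P + (size_t)j * R, X + (size_t)j * ldx, (size_t)r * sizeof *P);
    return P;
}

/* C := A * B, computed on operands padded to multiples of tile. */
void gemm_padded(int m, int n, int k, const double *A, int lda,
                 const double *B, int ldb, double *C, int ldc, int tile)
{
    assert(m > 0 && n > 0 && k > 0 && tile > 0);
    int M = round_up(m, tile), N = round_up(n, tile), K = round_up(k, tile);
    double *Ap = pad_copy(A, lda, m, k, M, K);
    double *Bp = pad_copy(B, ldb, k, n, K, N);
    double *Cp = calloc((size_t)M * N, sizeof *Cp);
    assert(Cp != NULL);
    gemm_ref(M, N, K, Ap, M, Bp, K, Cp, M);
    /* Keep only the top-left m x n block of the padded result. */
    for (int j = 0; j < n; ++j)
        memcpy(C + (size_t)j * ldc, Cp + (size_t)j * M, (size_t)m * sizeof *C);
    free(Ap);
    free(Bp);
    free(Cp);
}
```

Calling gemm_padded with tile = 96 on a 2 x 3 by 3 x 1 product gives the same entries as calling gemm_ref directly on the unpadded operands.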