MAGMA version 0.1 Released
MAGMA version 0.1 for 32- and 64-bit Linux is now available.
See Software section for download link:
http://icl.cs.utk.edu/magma/software/
For more information visit the MAGMA web site:
http://icl.cs.utk.edu/magma/
Please use this forum for questions and comments regarding MAGMA.
Best regards,
Re: MAGMA version 0.1 Released
As you undoubtedly know, much scientific and technical work is done with complex quantities. Hence, complex versions of the codes that you have just released will be very welcome.
Thanks for the good work.
Malcolm
Re: MAGMA version 0.1 Released
Thanks for bringing this up. Complex versions are high on our priority list. We actually have them implemented at the "high" level, like the other versions (we generate the different precisions almost automatically), but we do not yet have the complex CUDA BLAS that is needed, e.g. complex versions of syrk, trmm, and trsm. We are checking with NVIDIA on this, and are considering a MAGMA implementation as well.
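For readers unfamiliar with these routines: trsm solves a triangular system with multiple right-hand sides, and its complex variant differs from the real one only in the scalar type. Below is a pure-Python sketch of the semantics only (this is not MAGMA or CUBLAS code; a real CUDA BLAS kernel would tile and parallelize the loops):

```python
# Sketch of what a complex trsm computes: solve L * X = B for X,
# where L is lower triangular with complex entries.

def ctrsm_lower(L, B):
    """Forward substitution, one right-hand-side column at a time."""
    n, m = len(L), len(B[0])
    X = [[0j] * m for _ in range(n)]
    for j in range(m):                  # each column of B
        for i in range(n):
            s = B[i][j] - sum(L[i][k] * X[k][j] for k in range(i))
            X[i][j] = s / L[i][i]
    return X

if __name__ == "__main__":
    L = [[2 + 1j, 0, 0],
         [1 - 1j, 3 + 0j, 0],
         [0 + 2j, 1 + 1j, 1 - 2j]]
    B = [[1 + 0j], [0 + 1j], [2 - 1j]]
    X = ctrsm_lower(L, B)
    # check that L * X reproduces B
    for i in range(3):
        r = sum(L[i][k] * X[k][0] for k in range(3))
        assert abs(r - B[i][0]) < 1e-12
    print("ok")
```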
Does anybody in the community already have these routines and would be willing to contribute them to the MAGMA project?
Regards,
Stan Tomov
Re: MAGMA version 0.1 Released
Thank you for the hard work on this library.
I've just built MAGMA and run some tests on a GTX 285. The functions in the GPU interface clearly deliver better GPU performance than their counterparts in the CPU interface (e.g., sgetrf vs. sgetrf_gpu). Is that because the time to transfer data between CPU memory and GPU memory is larger in the CPU interface than in the GPU interface?
Also, when MAGMA computes on the CPU and GPU at the same time, can you explain how you divide the data between them?
Thanks
Nguyen
For reference, below is my test result of sgetrf and sgetrf_gpu:
Code:
./testing_sgetrf
device 0: GeForce GTX 285, 1476.0 MHz clock, 1023.3 MB memory
device 1: GeForce GTX 285, 1476.0 MHz clock, 1023.8 MB memory
Usage:
testing_sgetrf -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 13.89 43.79 2.049213e-09
2048 26.16 101.55 1.924833e-09
3072 30.91 158.78 1.918028e-09
4032 37.87 210.94 1.860973e-09
5184 42.00 240.96 1.840821e-09
6016 45.24 261.49 1.836232e-09
7040 47.89 280.14 1.826794e-09
8064 50.43 294.79 1.931242e-09
9088 52.48 305.72 2.152791e-09
10112 54.43 315.43 2.341549e-09
./testing_sgetrf_gpu
device 0: GeForce GTX 285, 1476.0 MHz clock, 1023.3 MB memory
device 1: GeForce GTX 285, 1476.0 MHz clock, 1023.8 MB memory
Usage:
testing_sgetrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||PA-LU|| / (||A||*N)
==========================================================
1024 13.31 48.41 2.049213e-09
2048 25.94 114.76 1.924833e-09
3072 30.91 179.62 1.918028e-09
4032 38.08 233.55 1.860973e-09
5184 42.56 270.25 1.840821e-09
6016 45.39 290.48 1.836232e-09
7040 48.21 306.81 1.826794e-09
8064 50.67 319.91 1.931242e-09
9088 52.80 330.09 2.152791e-09
10112 54.64 338.68 2.341549e-09
Re: MAGMA version 0.1 Released
Thanks for trying out MAGMA and for your input. The GTX 285 results look impressive!
Briefly, yes. Since most of the computation is done on the GPU, the matrix to be factored has to mostly reside in GPU memory in order to minimize communication. In the CPU interface the matrix starts on the CPU and the result is expected back on the CPU, so an overhead of copying the original matrix to the GPU and bringing the result back is to be expected. For some algorithms, QR for example, we can better interleave computation and communication and hide some of this overhead.
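A back-of-the-envelope model shows why this copy overhead matters less as N grows: sgetrf does O(N^3) flops while the extra transfers move only O(N^2) data. The bandwidth and kernel rate below are assumed round numbers for illustration, not measurements of the GTX 285:

```python
# Rough model of the CPU-interface overhead for sgetrf: ~(2/3)*N^3 flops,
# plus moving the N*N single-precision matrix to the GPU and back.
# Both constants are ASSUMPTIONS, chosen only to show the trend.

PCIE_BYTES_PER_S = 5e9    # assumed effective PCIe transfer rate
GPU_GFLOPS = 340.0        # assumed asymptotic sgetrf_gpu rate

def effective_gflops(n):
    flops = (2.0 / 3.0) * n ** 3
    compute_s = flops / (GPU_GFLOPS * 1e9)
    transfer_s = 2 * n * n * 4 / PCIE_BYTES_PER_S  # A to GPU, result back
    return flops / (compute_s + transfer_s) / 1e9

for n in (1024, 4032, 10112):
    print(n, round(effective_gflops(n), 1))
```

The O(N^2) transfer cost is amortized by the O(N^3) compute cost, so the gap between the two interfaces shrinks (relatively) at large N, which matches the trend in the tables above.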
There are of course variations for the different algorithms, but in general, if we look at Figure 4 and the notation there, the panel A1 has to be factored and A2 updated. For the one-sided factorizations currently in MAGMA, no data beyond A1 is needed in order to factor it, so A1 is sent to the CPU and factored there. This is overlapped with updating A2 (from previous iterations) on the GPU. More on this can be found in
Tomov, S., Dongarra, J., Baboulin, M. Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems, LAPACK Working Note 210, October 17, 2008.
for the one-sided factorizations and in
Tomov, S., Dongarra, J. Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing, LAPACK Working Note 219, May 24, 2009.
for the two-sided.
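The panel/update split can be sketched in plain Python. This shows only the algorithmic shape: there is no pivoting (so the test matrix is made diagonally dominant), and the "CPU" and "GPU" steps run sequentially here rather than overlapped as in MAGMA:

```python
# Right-looking blocked LU without pivoting, in place. The panel
# factorization is the step MAGMA sends to the CPU; the trsm and gemm
# updates that follow are the steps MAGMA runs on the GPU.
import copy

def lu_blocked(A, nb):
    n = len(A)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # --- panel: unblocked LU of the tall block A[k:n, k:k+kb] ("CPU")
        for j in range(k, k + kb):
            for i in range(j + 1, n):
                A[i][j] /= A[j][j]
                for c in range(j + 1, k + kb):
                    A[i][c] -= A[i][j] * A[j][c]
        # --- row block: U12 = inv(unit L11) * A12  (trsm, "GPU")
        for j in range(k, k + kb):
            for i in range(j + 1, k + kb):
                for c in range(k + kb, n):
                    A[i][c] -= A[i][j] * A[j][c]
        # --- trailing update: A22 -= L21 * U12  (gemm, "GPU")
        for i in range(k + kb, n):
            for j in range(k, k + kb):
                for c in range(k + kb, n):
                    A[i][c] -= A[i][j] * A[j][c]
    return A

def check(n=6, nb=2):
    A0 = [[1.0 / (i + j + 1) + (n if i == j else 0) for j in range(n)]
          for i in range(n)]
    A = lu_blocked(copy.deepcopy(A0), nb)
    # rebuild L*U from the in-place factors and compare with A0
    err = 0.0
    for i in range(n):
        for j in range(n):
            s = sum(A[i][k] * A[k][j] for k in range(min(i, j + 1)))
            if i <= j:
                s += A[i][j]  # unit diagonal of L
            err = max(err, abs(s - A0[i][j]))
    return err

print(check())  # residual of the reconstruction
```

In MAGMA the panel for step k+1 is factored on the CPU while the GPU is still updating the trailing matrix of step k (look-ahead), which is what hides the CPU work.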
Regards,
Stan Tomov
Re: MAGMA version 0.1 Released
Thank you very much for your detailed explanation. It will help me a lot.
Regards,
Nguyen