Some questions on testing_sgetrf
Hi,
I'm trying MAGMA and have two questions about the example testing_sgetrf.
1) I use two different GPUs, an 8400M GS and a 9500 GT. The program runs fine on the 9500 GT, but on the 8400M GS I get a segmentation fault (note that I run the examples with -N 128 because of the small memory of this GPU). Exploring the code and debugging it with some printf calls, I saw that the problem occurs after the cudaMallocHost call, when I set the values of h_R: writing to h_R produces the segmentation fault. Why?
2) When I run the code on the 9500 GT, I now always get NaN in the error column of the output (||PA-LU||/||A||*N). Exploring the code, the NaN is the value of the "residual" parameter in the "get_LU_error" function. What does it mean?
Thanks a lot for your answers.
Danilo
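For readers wondering how a NaN can end up in the error column, here is a minimal C sketch of a relative LU residual, reading the column header as ||PA-LU|| / (||A|| * N). It is not MAGMA's get_LU_error: the function name lu_residual is hypothetical, the matrices are assumed square and column-major with LAPACK-style packed L/U and 1-based pivots, and the norm is assumed to be the entrywise sum of absolute values (the thread does not say which norm the test uses). The point is only that a single NaN or Inf in the computed LU factors propagates to the final ratio.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Absolute value that lets NaN pass through unchanged. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper (not part of MAGMA): returns
   ||P*A - L*U|| / (||A|| * n) for column-major n x n matrices,
   where lu packs unit-lower L and upper U as sgetrf does and
   ipiv holds 1-based row interchanges. Returns -1.0 if the
   scratch allocation fails. */
static double lu_residual(int n, const float *a, const float *lu, const int *ipiv)
{
    assert(n > 0);
    double *pa = malloc((size_t)n * n * sizeof *pa);
    if (pa == NULL) return -1.0;

    /* pa = A, then apply the interchanges in order to obtain P*A. */
    for (int k = 0; k < n * n; ++k) pa[k] = a[k];
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;
        if (p == i) continue;
        for (int j = 0; j < n; ++j) {
            double t = pa[i + j * n];
            pa[i + j * n] = pa[p + j * n];
            pa[p + j * n] = t;
        }
    }

    double res = 0.0, norm_a = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            /* (L*U)(i,j) = sum over k <= min(i,j) of L(i,k) * U(k,j). */
            int kmax = i < j ? i : j;
            double s = 0.0;
            for (int k = 0; k <= kmax; ++k) {
                double l = (k == i) ? 1.0 : lu[i + k * n];
                s += l * lu[k + j * n];
            }
            res += abs_d(pa[i + j * n] - s);
            norm_a += abs_d(a[i + j * n]);
        }
    }
    free(pa);
    return res / (norm_a * n);
}
```

With a correct factorization the returned value is tiny; once any entry of lu is NaN (or an Inf that meets another Inf in a subtraction), every sum touching it becomes NaN and so does the result.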
Re: Some questions on testing_sgetrf
Some additional information:
I run the code on Ubuntu 9, 32-bit. I use the standard LAPACK and BLAS packages installed with the Synaptic package manager. I get no compilation errors, and the NaN result is due to a very high value returned in the residual parameter of "get_LU_error". If I modify the code to display the error of the LAPACK CPU version of sgetrf, I obtain about 10e-10, so the standard CPU sgetrf works fine. I think there is some problem in the results of magma_sgetrf. Do you know possible causes of the problem? Thanks.
-
Stan Tomov
- Posts: 283
- Joined: Fri Aug 21, 2009 10:39 pm
Re: Some questions on testing_sgetrf
Hello,
A user had a similar problem before, and in that case updating the driver fixed it. You can run an older CUDA on a newer driver (for example, CUDA 2.1 on the 190 driver) but not vice versa. For example:
CUDA 2.3 requires 190.xx
CUDA 2.2 requires 185.xx
CUDA 2.1 requires 180.xx
You can check your driver with
Code:
> cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 190.18 Wed Jul 22 15:36:09 PDT 2009
GCC version: gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
The above is the result on my system, and it tells me that the driver is 190.18. On my system I have CUDA 2.3, so the combination is fine.
What driver and CUDA do you have? Also, did you take the 32-bit version of MAGMA? In addition, if you run the LU on size <= 128, we call the LAPACK implementation directly and the GPU is not used (i.e., MAGMA is more like a wrapper in that case, calling the LAPACK+BLAS combination that is on the system).
Stan
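The driver check described in the post above can be sketched in plain C. The helper names driver_major, min_driver_for_cuda and driver_ok are hypothetical; the only data used are the three CUDA/driver pairs listed in the post and the format of the NVRM version line shown there. Anything outside that table is reported as unknown rather than guessed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the driver major version from a line such as
   "NVRM version: NVIDIA UNIX x86_64 Kernel Module 190.18 Wed Jul ...".
   Returns -1 if the "Kernel Module" marker or the number is missing. */
static int driver_major(const char *line)
{
    const char *marker = "Kernel Module";
    const char *p = strstr(line, marker);
    int major = -1;
    if (p == NULL) return -1;
    p += strlen(marker);
    if (sscanf(p, " %d", &major) != 1) return -1;
    return major;
}

/* Minimum driver branch for the CUDA releases named in the thread.
   Returns -1 for any release the thread does not cover. */
static int min_driver_for_cuda(int cuda_major, int cuda_minor)
{
    if (cuda_major == 2 && cuda_minor == 3) return 190;
    if (cuda_major == 2 && cuda_minor == 2) return 185;
    if (cuda_major == 2 && cuda_minor == 1) return 180;
    return -1;
}

/* 1 if the driver in `line` is new enough for the given CUDA release,
   0 if it is too old or if either value is unknown. */
static int driver_ok(const char *line, int cuda_major, int cuda_minor)
{
    int have = driver_major(line);
    int need = min_driver_for_cuda(cuda_major, cuda_minor);
    return have >= 0 && need >= 0 && have >= need;
}
```

Fed the first line of /proc/driver/nvidia/version, driver_ok reports whether an older CUDA is being run on a newer driver (fine) or the other way round (the failing case described above).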
Re: Some questions on testing_sgetrf
Hi,
thanks, Stan, for your reply. I solved the first problem by updating the drivers to 185.xx (I'm using CUDA 2.2), but I have not yet resolved the second problem. magma_sgetrf does not seem to work properly and gives wrong results: the error values are too high and the output is NaN.
Where is the problem?
I'm using the 32-bit magma and magmablas libraries, with no compilation errors.
Re: Some questions on testing_sgetrf
Hi,
some more feedback: I also installed CUDA 2.3 with the 190.xx driver, and I tried MAGMA both with the standard LAPACK and BLAS and with the ACML package, using the included make.inc. There are no compilation errors, but when I run "testing_sgetrf" I get NaN as the result of the "get_LU_error" function. I also tried "testing_sgetrf_gpu" and printed the residual value for different n1 (as in the code), but I always get NaN.
Stan Tomov
Re: Some questions on testing_sgetrf
Hi,
It looks like we have to recompile a few CUDA kernels for your system. To make sure that's the problem, are the other functions O.K., e.g. what do you get when running testing_sgeqrf?
Thanks,
Stan
Re: Some questions on testing_sgetrf
Hi Stan,
this is my hardware situation:
PC1 - Notebook - Ubuntu 9 32-bit - Nvidia 8400M-GS --> compilation OK, "magma_sgetrf" gives an LU matrix with many NaN and Inf values. The other functions, "magma_sgeqrf" and "magma_spotrf", run perfectly (I think, because the error is very low, comparable to the results in the .txt files).
PC2 - Desktop - Ubuntu 8 32-bit - Nvidia 9500-GT --> compilation OK, "magma_sgetrf" gives an LU matrix with many NaN and Inf values. I have not run the other functions yet; I can next week.
Thanks for your help.
Danilo
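A quick way to confirm the kind of observation reported above (an LU result containing NaN and Inf entries) is to scan the output matrix before computing any error norm. The sketch below is a hypothetical helper, count_nonfinite, assuming a column-major float matrix with leading dimension ld, as in the LAPACK-style interface the test programs use; it is not taken from the MAGMA sources.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Count the NaN and +/-Inf entries of a rows x cols column-major
   matrix stored with leading dimension ld (ld >= rows). */
static size_t count_nonfinite(const float *m, int rows, int cols, int ld)
{
    size_t bad = 0;
    assert(ld >= rows);
    for (int j = 0; j < cols; ++j) {
        for (int i = 0; i < rows; ++i) {
            if (!isfinite(m[i + (size_t)j * ld])) ++bad;
        }
    }
    return bad;
}
```

Calling it on the host copy of the factorized matrix right after the GPU routine returns gives a count that should be zero; a nonzero count points at the factorization itself rather than at the error routine.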
Stan Tomov
Re: Some questions on testing_sgetrf
Just for the record of this topic, recompiling the MAGMA CUDA kernels for this specific configuration fixed the problem.