Hi,
I noticed that on a server where the CUDA drivers are installed but there is no GPU, magma_init() returns MAGMA_SUCCESS. Is this expected?
(nvidia-smi gives "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running" as expected.)
When there is no MAGMA-compatible GPU on the server, my code needs to detect that and switch to another solver instead of the MAGMA solver. I worked around the problem by calling magma_getdevice_arch() after magma_init(): it returns 0, and since that is not a meaningful value (it is less than 300), I switch solvers. But magma_getdevice_arch() also prints to stderr, which I do not want.
What is the proper and robust way to check that a compatible GPU exists and that the setup is OK for MAGMA?
Thanks.
magma_init returns MAGMA_SUCCESS with no GPU
Stan Tomov
Re: magma_init returns MAGMA_SUCCESS with no GPU
One of the functions of magma_init() is to determine how many devices are present. If there are none, the number of devices is initialized to 0. The code that checks this looks like this:
Code:
    err = cudaGetDeviceCount( &g_magma_devices_cnt );
    if ( err != 0 && err != cudaErrorNoDevice ) {
        info = MAGMA_ERR_UNKNOWN;
        goto cleanup;
    }
So if err is cudaErrorNoDevice, which is your case, g_magma_devices_cnt gets initialized to 0 and magma_init returns MAGMA_SUCCESS.
Technically, everything gets initialized as documented. We will discuss internally whether magma_init should be modified to return some type of error in this case. Meanwhile, you could do a similar check, e.g.,
Code:
    if ( cudaGetDeviceCount( &g_magma_devices_cnt ) == cudaErrorNoDevice ) {
        // use other solvers
    }
    else {
        // use magma
    }
We haven't made magma_init return an error here because there are MAGMA MIC and clMAGMA versions, and the plan was to make them available through a single interface. We are currently also adding hipMAGMA, where the devices can be AMD GPUs. One plan is to let the user specify which device (or devices) to use, allowing different GPUs/accelerators (from NVIDIA, AMD, Intel, etc.), and even to define a virtual device, e.g., a CPU socket or a certain number of CPU cores, which would be another way to cover the case where no GPU devices are available.
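The device-count check suggested in the reply only takes the fallback path when the call returns cudaErrorNoDevice. A slightly more defensive variant would treat any non-zero status, or a reported count of zero, as "no usable GPU". The sketch below shows only that decision logic. The names gpu_available, choose_solver and solver_kind are hypothetical and not part of MAGMA or CUDA, and the status and count are passed in as plain integers so that the snippet compiles without the CUDA toolkit.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of MAGMA or CUDA.
 * status:       return value of cudaGetDeviceCount(), cast to int
 *               (0 means success, as in the magma_init excerpt).
 * device_count: the count that call wrote to its output argument.
 * Returns true only if the call succeeded and found at least one device. */
bool gpu_available(int status, int device_count)
{
    if (status != 0) {
        /* cudaErrorNoDevice, a driver problem, or any other failure:
         * all are treated as "no usable GPU". */
        return false;
    }
    return device_count > 0;
}

/* Hypothetical solver selector built on the helper above. */
typedef enum { SOLVER_FALLBACK = 0, SOLVER_MAGMA = 1 } solver_kind;

solver_kind choose_solver(int status, int device_count)
{
    return gpu_available(status, device_count) ? SOLVER_MAGMA
                                               : SOLVER_FALLBACK;
}
```

In real code the two arguments would come from a local variable, for example `int n = 0; int status = (int) cudaGetDeviceCount( &n );`, followed by `choose_solver( status, n )`. This only tells you whether the runtime reports a device. Whether that device is recent enough for MAGMA is a separate question that the sketch does not answer.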