Hi,
I wonder if I could get comments on how to assign specific GPU devices to different MPI processes when calling magma_*_m functions. My machine has 28 CPUs and 4 GPU devices. Say my MPI program runs with 2 MPI processes and each calls magma_*_m(NGPU=2, ...). What I want is that
MPI proc #0 uses GPU devices #0~1 and
MPI proc #1 uses GPU devices #2~3.
For this, I tried cudaSetValidDevices and magma_setdevice, but they didn't help: I always see the two MPI processes using GPU devices #0~1 and sharing those resources. Has anyone run into the same problem? Or is MAGMA simply unable to choose which specific GPU devices to use?
Thanks,
Hong
set specific GPU devices for different MPI processors?
Re: set specific GPU devices for different MPI processors?
You can try setting $CUDA_VISIBLE_DEVICES in the environment. It looks like you can set this early in your program, before making any CUDA calls, and it will still take effect. Here's some sample code:
Sample output on a 4 GPU node (sorted for clarity):
Sorry this is a bit of a hack. MAGMA really needs a magma_set_visible_devices function that changes what MAGMA thinks devices 0, 1, ... are.
Code:
#include <cuda_runtime.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

// -----------------------------------------------------------------------------
// Print the CUDA devices that this MPI rank can see.
void listdev( int rank )
{
    cudaError_t err;
    int dev_cnt = 0;
    err = cudaGetDeviceCount( &dev_cnt );
    assert( err == cudaSuccess || err == cudaErrorNoDevice );
    printf( "rank %d, cnt %d\n", rank, dev_cnt );

    cudaDeviceProp prop;
    for (int dev = 0; dev < dev_cnt; ++dev) {
        err = cudaGetDeviceProperties( &prop, dev );
        assert( err == cudaSuccess );
        printf( "rank %d, dev %d, prop %s, pci %d, %d, %d\n",
                rank, dev,
                prop.name,
                prop.pciBusID,
                prop.pciDeviceID,
                prop.pciDomainID );
    }
}

// -----------------------------------------------------------------------------
int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );
    int rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    // Must be set before this process makes its first CUDA call.
    if (rank == 0)
        setenv( "CUDA_VISIBLE_DEVICES", "0,1", 1 );
    else
        setenv( "CUDA_VISIBLE_DEVICES", "2,3", 1 );

    printf( "rank %d, CUDA_VISIBLE_DEVICES=%s\n",
            rank, getenv( "CUDA_VISIBLE_DEVICES" ));
    listdev( rank );

    MPI_Finalize();
    return 0;
}
Code:
[mgates@b01 test]$ mpirun -np 4 ./mpi-cuda-visible-devices
rank 0, CUDA_VISIBLE_DEVICES=0,1
rank 0, cnt 2
rank 0, dev 0, prop GeForce GTX 1060 6GB, pci 2, 0, 0
rank 0, dev 1, prop GeForce GTX 1060 6GB, pci 4, 0, 0
rank 1, CUDA_VISIBLE_DEVICES=2,3
rank 1, cnt 2
rank 1, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0 # i.e., device 2
rank 1, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0 # i.e., device 3
rank 2, CUDA_VISIBLE_DEVICES=2,3
rank 2, cnt 2
rank 2, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0
rank 2, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0
rank 3, CUDA_VISIBLE_DEVICES=2,3
rank 3, cnt 2
rank 3, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0
rank 3, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0
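A wrapper-script variant of the same trick, sketched here without modifying the application: mpirun launches the script in place of the program, and the script sets CUDA_VISIBLE_DEVICES from the local rank before exec'ing the real binary. This assumes Open MPI, which exports OMPI_COMM_WORLD_LOCAL_RANK to each process; other launchers use different variables (e.g. SLURM_LOCALID under Slurm, MV2_COMM_WORLD_LOCAL_RANK under MVAPICH2). The script name set_gpus.sh is made up.

```shell
#!/bin/sh
# set_gpus.sh -- give each local MPI rank its own pair of GPUs.
# Launch as:  mpirun -np 2 ./set_gpus.sh ./my_magma_app
# Assumes Open MPI's OMPI_COMM_WORLD_LOCAL_RANK; defaults to 0 if unset.

lrank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
ngpu_per_rank=2

# With 2 GPUs per rank: local rank 0 -> "0,1", local rank 1 -> "2,3".
first=$(( lrank * ngpu_per_rank ))
last=$(( first + ngpu_per_rank - 1 ))
export CUDA_VISIBLE_DEVICES="$first,$last"

# Replace this shell with the real program, environment already set.
exec "$@"
```

This keeps the device mapping in the job script rather than the source, which is handy when you cannot recompile the application.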
-mark
Re: set specific GPU devices for different MPI processors?
Great! I've confirmed your remedy works. Now I get better load balance in my computations and achieve a 20~30% speed-up. Thanks.