set specific GPU devices for different MPI processors?

Posted: Tue Aug 14, 2018 6:52 pm
by hhpark
Hi,
I wonder if I could get comments on how to set specific GPU devices for different MPI processors in calling magma_*_m functions. My machine has 28 CPUs and 4 GPU devices. Let's say my MPI programs runs with 2 MPI processors and each calls magma_*_m(NGPU=2, ...). What I want is that
MPI proc #0 uses GPU devices #0~1 and
MPI proc #1 uses GPU devices #2~3.
For this, I tried cudaSetValidDevices and magma_setdevice but they didn't help. I always see that the two MPI processors use GPU devices #0~1 sharing the resources. Does anyone suffer the same problem? Or, is Magma simply unable to choose specific GPU devices to use?

Thanks,
Hong

Re: set specific GPU devices for different MPI processors?

Posted: Wed Aug 15, 2018 11:12 am
by mgates3
You can try setting $CUDA_VISIBLE_DEVICES in the environment. It looks like you can set it early in your program, before making any CUDA calls, and it will still take effect. Here's some sample code:

#include <cuda_runtime.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

// -----------------------------------------------------------------------------
void listdev( int rank )
{
    cudaError_t err;
    
    int dev_cnt = 0;
    err = cudaGetDeviceCount( &dev_cnt );
    assert( err == cudaSuccess || err == cudaErrorNoDevice );
    printf( "rank %d, cnt %d\n", rank, dev_cnt );
    
    cudaDeviceProp prop;
    for (int dev = 0; dev < dev_cnt; ++dev) {
        err = cudaGetDeviceProperties( &prop, dev );
        assert( err == cudaSuccess );
        printf( "rank %d, dev %d, prop %s, pci %d, %d, %d\n",
                rank, dev,
                prop.name,
                prop.pciBusID,
                prop.pciDeviceID,
                prop.pciDomainID );
    }
}

// -----------------------------------------------------------------------------
int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );
    int rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    
    if (rank == 0)
        setenv( "CUDA_VISIBLE_DEVICES", "0,1", 1 );
    else
        setenv( "CUDA_VISIBLE_DEVICES", "2,3", 1 );
    
    printf( "rank %d, CUDA_VISIBLE_DEVICES=%s\n",
            rank, getenv( "CUDA_VISIBLE_DEVICES" ));
    
    listdev( rank );
    
    MPI_Finalize();
    
    return 0;
}
Sample output on a 4 GPU node (sorted for clarity):

>> mpirun -np 4 ./mpi-cuda-visible-devices
rank 0, CUDA_VISIBLE_DEVICES=0,1
rank 0, cnt 2
rank 0, dev 0, prop GeForce GTX 1060 6GB, pci 2, 0, 0
rank 0, dev 1, prop GeForce GTX 1060 6GB, pci 4, 0, 0
rank 1, CUDA_VISIBLE_DEVICES=2,3
rank 1, cnt 2
rank 1, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0  # i.e., device 2
rank 1, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0  # i.e., device 3
rank 2, CUDA_VISIBLE_DEVICES=2,3
rank 2, cnt 2
rank 2, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0
rank 2, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0
rank 3, CUDA_VISIBLE_DEVICES=2,3
rank 3, cnt 2
rank 3, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0
rank 3, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0
Sorry this is a bit of a hack. MAGMA really needs a magma_set_visible_devices function that changes what MAGMA thinks devices 0, 1, ... are.

-mark

Re: set specific GPU devices for different MPI processors?

Posted: Wed Aug 15, 2018 1:00 pm
by hhpark
Great! I've confirmed your remedy works. Now I can get better load balance in my computations and achieve a 20~30% speed-up. Thanks.