Problem with zgesv

kpar · Post by **kpar** » Mon Feb 29, 2016 2:44 am

Hi All,

I have a problem using the magma_zgesv function on a machine with 3 GPUs and a matrix size of 160k. In this case the function call uses only 2 GPUs and eventually segfaults. For matrix sizes up to 150k I have no issues.
Are there any limitations on matrix size for the dense linear solver?

Konstantin

mgates3 · Post by **mgates3** » Thu Mar 10, 2016 10:03 am

Not at that point. Above around N=46k, i.e., sqrt( 2**31 ), it may need to use MAGMA_ILP64. Then magma_int_t is a 64-bit long long instead of a 32-bit int, so it will correctly compute offsets. See make.inc.mkl-ilp64.
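To make the N=46k threshold concrete: an LDA x N array has LDA*N elements, and that product stops fitting in a signed 32-bit integer just above N = 46340. The following is a minimal C++ sketch of that arithmetic, not MAGMA code; the helper names element_count and fits_in_int32 are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Element count of an lda x n column-major array, computed in 64 bits
// so the product itself cannot wrap for the sizes discussed here.
std::int64_t element_count(std::int64_t lda, std::int64_t n) {
    assert(lda >= 0 && n >= 0);
    return lda * n;
}

// True when that count (and therefore every offset into the array)
// is representable in a signed 32-bit integer.
bool fits_in_int32(std::int64_t lda, std::int64_t n) {
    return element_count(lda, n) <= std::numeric_limits<std::int32_t>::max();
}
```

With these helpers, fits_in_int32(46340, 46340) is true and fits_in_int32(46341, 46341) is false. For 160000 x 160000 the element count is 25,600,000,000, which is why a 64-bit magma_int_t is needed at that size.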

But if it works for 150k, it should continue to work for 160k. It should use an out-of-GPU-memory algorithm, so the entire problem does not need to fit into GPU memory at once. Eventually, it will probably have an issue if not even a couple of panels fit into GPU memory, e.g., if 160k x NB exceeds GPU memory for some block size NB.
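As a rough illustration of the memory sizes involved, the sketch below computes the footprint of a double-complex matrix and how many N x NB panels fit in a given amount of GPU memory. It is plain C++ arithmetic under stated assumptions, not MAGMA code: the helper names are hypothetical, NB = 128 is an arbitrary example block size, and any extra workspace the real routine needs is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Size of one double-complex entry: two 8-byte doubles.
constexpr std::int64_t kBytesPerZ = 16;

// Bytes needed to hold a rows x cols double-complex array.
std::int64_t zmatrix_bytes(std::int64_t rows, std::int64_t cols) {
    assert(rows >= 0 && cols >= 0);
    return rows * cols * kBytesPerZ;
}

// How many whole n x nb panels fit into gpu_bytes of device memory.
std::int64_t panels_that_fit(std::int64_t n, std::int64_t nb,
                             std::int64_t gpu_bytes) {
    assert(n > 0 && nb > 0 && gpu_bytes >= 0);
    return gpu_bytes / zmatrix_bytes(n, nb);
}
```

For N = 160000 the full matrix takes 409.6 GB, far more than a single 12 GB card holds, while one 160000 x 128 panel takes about 328 MB, so roughly 39 such panels fit in 12 GiB.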

How do you know it uses only 2 GPUs?

Is this reproducible using the MAGMA tester, as in the example below? (Here with much smaller sizes and 2 GPUs.) If so, the complete input and output, as shown, would be helpful, since it gives the details about your environment that we need to diagnose a problem.

Code:

bunsen magma-trunk/testing> ./testing_dgesv -N 15000 -N 16000 -c --ngpu 2
% MAGMA 2.0.1 svn compiled for CUDA capability >= 3.5, 64-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7000, driver 7050. OpenMP threads 16. MKL 11.2.2, MKL threads 16. 
% device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% Thu Mar 10 09:01:32 2016
% Usage: ./testing_dgesv [options] [-h|--help]

% ngpu 2
%   N  NRHS   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||B - AX|| / N*||A||*||X||
%===============================================================================
15000     1     ---   (  ---  )    590.45 (   3.81)   2.66e-19   ok
16000     1     ---   (  ---  )    678.99 (   4.02)   3.17e-19   ok

-mark

kpar · Post by **kpar** » Fri Mar 11, 2016 10:42 am

Hello mark,

Here is my environment configuration:

Code:

% MAGMA 2.0.0  compiled for CUDA capability >= 2.0, 64-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 8000. OpenMP threads 12.
% device 0: Quadro K6000, 901.5 MHz clock, 12287.8 MB memory, capability 3.5
% device 1: Quadro K6000, 901.5 MHz clock, 12287.8 MB memory, capability 3.5
% device 2: Quadro K6000, 901.5 MHz clock, 12287.8 MB memory, capability 3.5

I have managed to reproduce the problem using the testing_zgetrf tester application.
It requires small modifications to the main function. The modified main function is below (modifications are marked with <---).

Code:

int main( int argc, char** argv)
{
    TESTING_INIT();

    real_Double_t   gflops, gpu_perf, gpu_time, cpu_perf=0, cpu_time=0;
    double          error;
    magmaDoubleComplex *h_A;
    magma_int_t     *ipiv;
    magma_int_t     M, N, n2, lda, info, min_mn;
    magma_int_t     status = 0;
    
    magma_opts opts;
    opts.parse_opts( argc, argv );
    
    double tol = opts.tolerance * lapackf77_dlamch("E");

    printf("%% ngpu %d, version %d\n", (int) opts.ngpu, (int) opts.version );
    if ( opts.check == 2 ) {
        printf("%%   M     N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   |Ax-b|/(N*|A|*|x|)\n");
    }
    else {
        printf("%%   M     N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   |PA-LU|/(N*|A|)\n");
    }
    printf("%%========================================================================\n");
    for( int itest = 0; itest < opts.ntest; ++itest ) {
        for( int iter = 0; iter < opts.niter; ++iter ) {
            M = opts.msize[itest];
            N = opts.nsize[itest];
            min_mn = min(M, N);
            lda    = M;
            n2     = lda*N;
            gflops = FLOPS_ZGETRF( M, N ) / 1e9;
            
            // TESTING_MALLOC_CPU( ipiv, magma_int_t, min_mn );        <------------- MODIFIED LINE
            // TESTING_MALLOC_PIN( h_A,  magmaDoubleComplex, n2 );     <------------- MODIFIED LINE

            ipiv = nullptr;                                            // <------------- MODIFIED LINE
            h_A = nullptr;                                             // <------------- MODIFIED LINE
            
            /* =====================================================================
               Performs operation using LAPACK
               =================================================================== */
            if ( opts.lapack ) {
                init_matrix( opts, M, N, h_A, lda );
                
                cpu_time = magma_wtime();
                lapackf77_zgetrf( &M, &N, h_A, &lda, ipiv, &info );
                cpu_time = magma_wtime() - cpu_time;
                cpu_perf = gflops / cpu_time;
                if (info != 0) {
                    printf("lapackf77_zgetrf returned error %d: %s.\n",
                           (int) info, magma_strerror( info ));
                }
            }
            
            /* ====================================================================
               Performs operation using MAGMA
               =================================================================== */
            // init_matrix( opts, M, N, h_A, lda );                    <------------- MODIFIED LINE
            if ( opts.version == 2 || opts.version == 3 ) {
                // no pivoting versions, so set ipiv to identity
                for (magma_int_t i=0; i < min_mn; ++i ) {
                    ipiv[i] = i+1;
                }
            }
            
            gpu_time = magma_wtime();
            if ( opts.version == 1 ) {
                magma_zgetrf( M, N, h_A, lda, ipiv, &info );
            }
            else if ( opts.version == 2 ) {
                magma_zgetrf_nopiv( M, N, h_A, lda, &info );
            }
            else if ( opts.version == 3 ) {
                magma_zgetf2_nopiv( M, N, h_A, lda, &info );
            }
            gpu_time = magma_wtime() - gpu_time;
            gpu_perf = gflops / gpu_time;
            if (info != 0) {
                printf("magma_zgetrf returned error %d: %s.\n",
                       (int) info, magma_strerror( info ));
            }
            
            /* =====================================================================
               Check the factorization
               =================================================================== */
            if ( opts.lapack ) {
                printf("%5d %5d   %7.2f (%7.2f)   %7.2f (%7.2f)",
                       (int) M, (int) N, cpu_perf, cpu_time, gpu_perf, gpu_time );
            }
            else {
                printf("%5d %5d     ---   (  ---  )   %7.2f (%7.2f)",
                       (int) M, (int) N, gpu_perf, gpu_time );
            }
            if ( opts.check == 2 ) {
                error = get_residual( opts, M, N, h_A, lda, ipiv );
                printf("   %8.2e   %s\n", error, (error < tol ? "ok" : "failed"));
                status += ! (error < tol);
            }
            else if ( opts.check ) {
                error = get_LU_error( opts, M, N, h_A, lda, ipiv );
                printf("   %8.2e   %s\n", error, (error < tol ? "ok" : "failed"));
                status += ! (error < tol);
            }
            else {
                printf("     ---   \n");
            }
            
            // TESTING_FREE_CPU( ipiv );                               <------------- MODIFIED LINE
            // TESTING_FREE_PIN( h_A  );                               <------------- MODIFIED LINE
            fflush( stdout );
        }
        if ( opts.niter > 1 ) {
            printf( "\n" );
        }
    }

    opts.cleanup();
    TESTING_FINALIZE();
    return status;
}

When I run the modified application in debug mode with the following arguments:

Code:

-N 160000 --ngpu 3

the application crashes in the call to magma_zsetmatrix_async inside the function magmablas_zsetmatrix_transpose_mgpu. It seems that only 2 queues are created/allocated instead of 3 (one per GPU).

From your post I noticed that you use a 2-GPU environment, and I have managed to reproduce the problem in such an environment too. If you run the application with the following arguments:

Code:

-N 210000 --ngpu 2

the application will crash in the magma_zgetrf2_mgpu function.

I think both cases are similar. Can you propose some workaround or fix for this problem?

Konstantin

mgates3 · Post by **mgates3** » Wed Apr 06, 2016 8:07 am

I don't understand your modifications. You are not allocating the h_A and ipiv arrays? Then there is no matrix to solve. It should crash immediately for any matrix size.
-mark
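
The point about unallocated arrays can be illustrated with a small pre-flight check. This is a hypothetical helper, not part of MAGMA or LAPACK; it only mimics the LAPACK convention of returning the negated position of the first bad argument, and it uses std::int64_t as a stand-in for a 64-bit magma_int_t.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical check (not a MAGMA function). Returns 0 when the arguments
// look usable, otherwise the negated 1-based position of the first bad
// argument in the list (m, n, a, lda, ipiv).
int check_getrf_args(std::int64_t m, std::int64_t n, const void* a,
                     std::int64_t lda, const std::int64_t* ipiv) {
    if (m < 0) return -1;                               // negative row count
    if (n < 0) return -2;                               // negative column count
    if (a == nullptr && m > 0 && n > 0) return -3;      // no matrix storage
    if (lda < std::max<std::int64_t>(1, m)) return -4;  // leading dimension too small
    if (ipiv == nullptr && std::min(m, n) > 0) return -5;  // no pivot storage
    return 0;
}
```

With h_A and ipiv set to nullptr as in the modified tester, check_getrf_args(160000, 160000, nullptr, 160000, nullptr) returns -3, regardless of the matrix size.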

MAGMA Forum
