The LAPACK forum has moved to https://github.com/Reference-LAPACK/lapack/discussions.

Segmentation Fault in ScaLAPACK Test


Segmentation Fault in ScaLAPACK Test

Postby Fortran » Mon Sep 29, 2008 2:17 pm

Okay, I had this on the forum here for a bit, but I thought I'd figured out the problem, namely a 32-bit LAPACK built into a 64-bit ATLAS. That turned out not to be the case: the new, all-64-bit ATLAS passes both check and ptcheck.

To that end, I now present a bug I'm getting. First, I built ScaLAPACK and the BLACS with:
Code: Select all
# ./setup.py --blaslib="-L/usr/local/atlas/lib -llapack -lf77blas -lcblas -latlas" --lapacklib="-L/usr/local/atlas/lib -llapack -lf77blas -lcblas -latlas" --downblacs

This showed no problems.

I then went into the build/scalapack-1.8.0/TESTING directory and tried to run test xdsep:
Code: Select all
# mpirun -np 4 ./xdsep
SCALAPACK symmetric Eigendecomposition routines.
' '                                                                             
 
Running tests of the parallel symmetric eigenvalue routine:  PDSYEVX &  PDSYEV & PDSYEVD.
The following scaled residual checks will be computed:
 ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
 ||Q^T*Q - I|| / (N * eps)

An explanation of the input/output parameters follows:
RESULT   : passed; or an indication of which eigen request test failed
N        : The number of rows and columns of the matrix A.
P        : The number of process rows.
Q        : The number of process columns.
NB       : The size of the square blocks the matrix A is split into.
THRESH   : If a residual value is less than THRESH, RESULT is flagged as PASSED.
         : the QTQ norm is allowed to exceed THRESH for those eigenvectors
         :  which could not be reorthogonalized for lack of workspace.
TYP      : matrix type (see PDSEPtst.f).
SUB      : Subtests (see PDSEPtst).f
CHK      : ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
QTQ      : ||Q^T*Q - I||/ (N * eps)
         : when the adjusted QTQ exceeds THRESH
 the adjusted QTQ norm is printed
         : otherwise the true QTQ norm is printed
           If NT>1, CHK and QTQ are the max over all eigen request tests
TEST     : EVX - testing PDSYEVX, EV - testing PDSYEV, EVD - testing PDSYEVD
 
     N  NB   P   Q TYP SUB   WALL      CPU      CHK       QTQ    CHECK    TEST
 ----- --- --- --- --- --- -------- -------- --------- --------- -----    ----
'TEST 1 - test tiny matrices - different process configurations'               
     0   1   1   2   8   N     0.00    -1.00   0.0       0.0     PASSED   EVX 
[oxygen:07445] *** Process received signal ***
[oxygen:07445] Signal: Segmentation fault (11)
[oxygen:07445] Signal code:  (128)
[oxygen:07445] Failing at address: (nil)
[oxygen:07445] [ 0] /lib64/libpthread.so.0 [0x3729a0ed30]
[oxygen:07445] [ 1] /usr/local/lib/openmpi/mca_pml_ob1.so [0x2ad6aec778a9]
[oxygen:07445] [ 2] /usr/local/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x6b9) [0x2ad6af28ed99]
[oxygen:07445] [ 3] /usr/local/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x2ad6aee8305b]
[oxygen:07445] [ 4] /usr/local/lib/libopen-pal.so.0(opal_progress+0x4a) [0x2ad6a96a2b5a]
[oxygen:07445] [ 5] /usr/local/lib/libmpi.so.0(ompi_request_wait_all+0x1cd) [0x2ad6a91d559d]
[oxygen:07445] [ 6] /usr/local/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allreduce_intra_recursivedoubling+0x313) [0x2ad6afcb2f23]
[oxygen:07445] [ 7] /usr/local/lib/libmpi.so.0(ompi_comm_activate+0x90) [0x2ad6a91c49a0]
[oxygen:07445] [ 8] /usr/local/lib/libmpi.so.0(ompi_comm_create+0x174) [0x2ad6a91c4744]
[oxygen:07445] [ 9] /usr/local/lib/libmpi.so.0(MPI_Comm_create+0xc8) [0x2ad6a91ed2b8]
[oxygen:07445] [10] ./xdsep(Cblacs_gridmap+0x160) [0x4c0740]
[oxygen:07445] [11] ./xdsep(SL_Cgridreshape+0x110) [0x41fec0]
[oxygen:07445] [12] ./xdsep(pdlasizesyev_+0x262) [0x4184d2]
[oxygen:07445] [13] ./xdsep(pdsqpsubtst_+0x6e4) [0x418cb4]
[oxygen:07445] [14] ./xdsep(pdseptst_+0x607a) [0x40cf1a]
[oxygen:07445] [15] ./xdsep(pdsepreq_+0x82e) [0x416ade]
[oxygen:07445] [16] ./xdsep(MAIN__+0x15a5) [0x415ef1]
[oxygen:07445] [17] ./xdsep(main+0x2c) [0x4c469c]
[oxygen:07445] [18] /lib64/libc.so.6(__libc_start_main+0xfa) [0x3728e1e32a]
[oxygen:07445] [19] ./xdsep(dsymv_+0x79) [0x406dd9]
[oxygen:07445] *** End of error message ***
mpirun noticed that job rank 0 with PID 7445 on node oxygen.nrl.navy.mil exited on signal 11 (Segmentation fault).
3 additional processes aborted (not shown)


I am now stuck and cannot figure out how to fix this. Any help would be appreciated.
Fortran
 
Posts: 7
Joined: Fri Sep 12, 2008 8:40 am
Location: Alexandria, VA

Re: Segmentation Fault in ScaLAPACK Test

Postby Julien Langou » Mon Sep 29, 2008 2:42 pm

Hello, I know this can be seen as a weird request, but can you try with another MPI library (MPICH, for example)? Best wishes, Julien Langou.
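
One way to isolate whether the MPI layer itself is at fault, before swapping libraries, is a tiny MPI-only reproducer that exercises MPI_Comm_create and an allreduce over the new communicator — the two calls at the top of the failing stack. The sketch below is only illustrative (the file name and exact sequence are not taken from this thread):
Code: Select all
// comm_create_test.cpp -- minimal MPI-only reproducer (illustrative sketch).
#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Create a new communicator from the full world group --
    // roughly the MPI_Comm_create path that Cblacs_gridmap takes.
    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Comm grid_comm;
    MPI_Comm_create(MPI_COMM_WORLD, world_group, &grid_comm);

    // The failing frame was an allreduce during communicator activation,
    // so run a collective on the new communicator as well.
    int local = rank, sum = 0;
    MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, grid_comm);
    if (rank == 0)
        std::printf("allreduce over %d ranks: sum = %d\n", size, sum);

    MPI_Group_free(&world_group);
    MPI_Comm_free(&grid_comm);
    MPI_Finalize();
    return 0;
}

If this also crashes under the same Open MPI installation, the fault lies below ScaLAPACK and the BLACS; if it runs cleanly, the MPI build is probably fine and the mixed 32/64-bit libraries become the prime suspect.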
Julien Langou
 
Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Re: Segmentation Fault in ScaLAPACK Test

Postby Fortran » Mon Sep 29, 2008 3:06 pm

Update: I downloaded and hand-compiled the BLACS, being careful to use -m64 everywhere, and did the same with ScaLAPACK. I think it now works; at least, xdsep works. Tomorrow I will try the hojillion test routines to make sure everything works.
Fortran
 
Posts: 7
Joined: Fri Sep 12, 2008 8:40 am
Location: Alexandria, VA

Re: Segmentation Fault in ScaLAPACK Test

Postby joshtanga » Wed Oct 29, 2008 5:03 pm

I have a segmentation fault that seems to be caused by sl_init. I am using C++ to call ScaLAPACK routines. Below is the portion of my code that seems associated with the error.

I link/compile with:
Code: Select all
 
LIBS =
LIBS += -lscalapack
#LIBS += -lblacsF77init
LIBS += -lblacsF77
LIBS += -lblacs
#LIBS += -lblacsF77init
#LIBS += -lblacsCinit
LIBS += -lblacsC
#LIBS += -lf2c
#LIBS += -latlas
LIBS += -lblacs
#LIBS += -lblacsCinit
LIBS += -llapack
LIBS += -lblas
LIBS += -lacml
LIBS += -lm
LIBS += -pgf90libs
LIBS += -pgf77libs
LIBS += -lpgftnrtl
LIBS += -ltmpe
LIBS += -lpmpich

        pgCC  debugging.cpp ${LIBS}  -o dgesvd

where the file debugging.cpp is :
Code: Select all
#include <iostream>
#include <mpi.h>
#include "timer.h"

// Initialize the process grid
int context;
int nprow = 1;
int npcol = 1;
int nprocs = 1;
int myrow = 250;
int mycol = 250;
int id;

extern "C" void sl_init_(int* ictxt, int* nprow, int* npcol);
extern "C" void blacs_gridinfo_(int *context, int *nprow, int *npcol, int *myrow, int *mycol);
extern "C" void blacs_pinfo_(int* id, int* nprocs);

extern void  blacs_get_( int context, int request, int* value);
extern int   blacs_gridinit_( int* context, char * order, int np_row, int np_col);
//extern void  blacs_gridinfo_( int context, int*  np_row, int* np_col, int*  my_row, int*  my_col);
extern "C" void  blacs_gridexit_( int* context);
extern "C" void  blacs_exit_( int error_code);

int main(int nargs, char *args[])
{
    sl_init_(&context, &nprow, &npcol);
    blacs_gridinfo_(&context, &nprow, &npcol, &myrow, &mycol);

    // blacs_gridexit_(&context);
    // blacs_exit_(0);
}


and the result is
Code: Select all

 mpirun -np 1 debugging
Starting MPI_Init...
/usr/bin/mpirun.ch_p4: line 243: 10566 Segmentation fault      "/home/jthompson/Scalapack/debugging" -p4pg "/home/jthompson/Scalapack/PI10509" -p4wd "/home/jthompson/Scalapack"


Please advise.
joshtanga
 
Posts: 6
Joined: Tue Oct 21, 2008 10:24 pm

Re: Segmentation Fault in ScaLAPACK Test

Postby joshtanga » Wed Nov 05, 2008 1:46 pm

This problem was solved by adding a call

MPI_Init(&argc, &argv);

as the first line of main(). Oddly enough, a call to MPI_Finalize caused an error while the lone call to MPI_Init sufficed.

UPDATE: 12/1/08. Calling both Cblacs_gridexit and MPI_Finalize caused no error. I'm not sure what caused the initial error with MPI_Finalize.
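
For reference, a minimal sketch of what the corrected program looks like with that fix applied. It assumes the trailing-underscore Fortran symbol names from the listing above and a 1x1 grid; it illustrates the fix described here rather than reproducing the poster's exact file:
Code: Select all
// debugging.cpp -- sketch of the working version (illustrative).
#include <mpi.h>
#include <iostream>

// Fortran-style BLACS/ScaLAPACK entry points (trailing underscore assumed).
extern "C" void sl_init_(int* ictxt, int* nprow, int* npcol);
extern "C" void blacs_gridinfo_(int* context, int* nprow, int* npcol,
                                int* myrow, int* mycol);
extern "C" void blacs_gridexit_(int* context);

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);          // the missing call that caused the segfault

    int context, nprow = 1, npcol = 1, myrow = -1, mycol = -1;
    sl_init_(&context, &nprow, &npcol);                         // set up the process grid
    blacs_gridinfo_(&context, &nprow, &npcol, &myrow, &mycol);  // query my grid coordinates
    std::cout << "process (" << myrow << "," << mycol << ") of a "
              << nprow << "x" << npcol << " grid" << std::endl;

    blacs_gridexit_(&context);       // release the grid, then shut down MPI
    MPI_Finalize();
    return 0;
}

It links with the same LIBS line as before; the essential change is the MPI_Init/MPI_Finalize pair bracketing the BLACS calls.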
joshtanga
 
Posts: 6
Joined: Tue Oct 21, 2008 10:24 pm

