The LAPACK forum has moved to https://github.com/Reference-LAPACK/lapack/discussions.

Urgent-Can anyone help me?

Open discussion regarding features, bugs, issues, vendors, etc.


Postby bruce » Sat Jan 13, 2007 7:33 pm

Hi, there:


I am not sure which routine performs the two-dimensional block-cyclic data distribution.

I am confused because a call to the ScaLAPACK TOOLS routine SL_INIT initializes the process grid, yet the routine Cblacs_gridinit( &ictxt, "Row", nprow, npcol ) also initializes a grid (I saw that in the Fortran sample code). I do not know which one I should use. I used Cblacs_gridinit() in my MPI C code and I got the eigenvalues, but I am still confused about this.

Any help will be highly appreciated!
bruce
 
Posts: 36
Joined: Mon Sep 25, 2006 5:11 am

Postby Julie » Wed Jan 17, 2007 1:16 pm

Both of them work.

What is in SCALAPACK/TOOLS are tools! That means routines to help you simplify your code.
*  SL_INIT initializes an NPROW x NPCOL process grid using a row-major
*  ordering  of  the  processes. This routine retrieves a default system
*  context  which  will  include all available processes. In addition it
*  spawns the processes if needed.
[ ....]
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )


You see, SL_INIT makes all the calls you need to initialize your process grid.

Usually, I do not use this routine (it is a personal choice), but if you look at the example from the book, they use it: http://www.netlib.org/scalapack/slug/no ... 0000000000

I would advise you to read at least the following section from the ScaLAPACK book: http://www.netlib.org/scalapack/slug/no ... 0000000000
to understand the four basic steps of ScaLAPACK:

1. Initialize the process grid
2. Distribute the matrix on the process grid
3. Call ScaLAPACK routine
4. Release the process grid

Hope it helps
Julie
Julie
 
Posts: 299
Joined: Wed Feb 23, 2005 12:32 am
Location: ICL, Denver, Colorado

Postby bruce » Wed Jan 17, 2007 11:54 pm

Hi, Julie:

I really appreciate your help! I saw the use of SL_INIT in the example1.f file (the simplest example program calling PDGESV). It is very helpful.

I still have some questions about how the matrix is distributed to the process grid in the sample code. I am not sure I understand everything correctly, so I hope you can help me confirm it.

I saw the following code in example1.f
============================
IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
AA( 1 ) = S
AA( 2 ) = -S
AA( 3 ) = -S
AA( 4 ) = -S
AA( 5 ) = -S
AA( 1+MXLLDA ) = C
AA( 2+MXLLDA ) = C
AA( 3+MXLLDA ) = -C
AA( 4+MXLLDA ) = -C
AA( 5+MXLLDA ) = -C
AA( 1+2*MXLLDA ) = A
AA( 2+2*MXLLDA ) = A
AA( 3+2*MXLLDA ) = A
AA( 4+2*MXLLDA ) = A
AA( 5+2*MXLLDA ) = -A
AA( 1+3*MXLLDA ) = C
AA( 2+3*MXLLDA ) = C
AA( 3+3*MXLLDA ) = C
AA( 4+3*MXLLDA ) = C
AA( 5+3*MXLLDA ) = -C
B( 1 ) = 0.0D0
B( 2 ) = 0.0D0
B( 3 ) = 0.0D0
B( 4 ) = 0.0D0
B( 5 ) = 0.0D0
ELSE IF( MYROW.EQ.0 .AND. MYCOL.EQ.1 ) THEN
AA( 1 ) = A
AA( 2 ) = A
AA( 3 ) = -A
AA( 4 ) = -A
AA( 5 ) = -A
AA( 1+MXLLDA ) = L
AA( 2+MXLLDA ) = L
AA( 3+MXLLDA ) = -L
AA( 4+MXLLDA ) = -L
AA( 5+MXLLDA ) = -L
AA( 1+2*MXLLDA ) = K
AA( 2+2*MXLLDA ) = K
AA( 3+2*MXLLDA ) = K
AA( 4+2*MXLLDA ) = K
AA( 5+2*MXLLDA ) = K
ELSE IF( MYROW.EQ.0 .AND. MYCOL.EQ.2 ) THEN
AA( 1 ) = A
AA( 2 ) = A
AA( 3 ) = A
AA( 4 ) = -A
AA( 5 ) = -A
AA( 1+MXLLDA ) = P
AA( 2+MXLLDA ) = P
AA( 3+MXLLDA ) = P
AA( 4+MXLLDA ) = P
AA( 5+MXLLDA ) = -P
ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.0 ) THEN
AA( 1 ) = -S
AA( 2 ) = -S
AA( 3 ) = -S
AA( 4 ) = -S
AA( 1+MXLLDA ) = -C
AA( 2+MXLLDA ) = -C
AA( 3+MXLLDA ) = -C
AA( 4+MXLLDA ) = C
AA( 1+2*MXLLDA ) = A
AA( 2+2*MXLLDA ) = A
AA( 3+2*MXLLDA ) = A
AA( 4+2*MXLLDA ) = -A
AA( 1+3*MXLLDA ) = C
AA( 2+3*MXLLDA ) = C
AA( 3+3*MXLLDA ) = C
AA( 4+3*MXLLDA ) = C
B( 1 ) = 1.0D0
B( 2 ) = 0.0D0
B( 3 ) = 0.0D0
B( 4 ) = 0.0D0
ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.1 ) THEN
AA( 1 ) = A
AA( 2 ) = -A
AA( 3 ) = -A
AA( 4 ) = -A
AA( 1+MXLLDA ) = L
AA( 2+MXLLDA ) = L
AA( 3+MXLLDA ) = -L
AA( 4+MXLLDA ) = -L
AA( 1+2*MXLLDA ) = K
AA( 2+2*MXLLDA ) = K
AA( 3+2*MXLLDA ) = K
AA( 4+2*MXLLDA ) = K
ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.2 ) THEN
AA( 1 ) = A
AA( 2 ) = A
AA( 3 ) = -A
AA( 4 ) = -A
AA( 1+MXLLDA ) = P
AA( 2+MXLLDA ) = P
AA( 3+MXLLDA ) = -P
AA( 4+MXLLDA ) = -P
END IF

This section seems to distribute the matrix manually to the different processes (a 2x3 grid: processes 00, 01, 02, 10, 11, 12). So I am wondering whether distributing the matrix on the process grid must be done manually, or whether there are routines to do it.

=================================

My second question:
I read the sample Fortran codes SAMPLE_PDSYEV_CALL.f and SAMPLE_PDSYEVX_CALL.f on the ScaLAPACK website. In the source code I can see the initialization of the process grid and the initialization of the array descriptor via CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, CONTEXT, LDA, INFO ). However, both of them call a subroutine PDLAMODHILB. I checked that subroutine, and I know it creates a matrix for the later computation, but I did not see how the matrix is distributed to the grid in a two-dimensional way or with a block-cyclic distribution. However, I did see a section in the PDLAMODHILB implementation like this:
===========================
*     This is just to keep ftnchek happy
      IF( BLOCK_CYCLIC_2D*CSRC_*CTXT_*DLEN_*DT_*LLD_*MB_*M_*NB_*N_*
     $    RSRC_.LT.0 )RETURN
===========================
So I am wondering whether the block-cyclic distribution is already done in this subroutine?

===================================

My third question:

Last time you gave me a very useful MPI C code calling the pdsyev routine. I ran it, modified it a little for my use, and I can get the eigenvalues. However, I am still confused about how the matrix is distributed to the process grid using the two-dimensional block-cyclic distribution.

Here is the code you gave me, with my small changes.
=========================

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mpi.h"

#include "blas.h"
#include "blacs.h"
#include "scalapack.h"

extern void pdsyev_(char *jobz, char *uplo, int *n, double *a, int *ia, int *ja, int *desca, double *w, double *z, int *iz, int *jz, int *descz, double *work, int *lwork, int *info);

int main(int argc, char ** argv)
{
int iam, nprocs;
int myrank_mpi, nprocs_mpi;
int ictxt, nprow, npcol, myrow, mycol;
int n,nb;
int info, lwork;
int desca[9],descz[9];
double *a, *z, *w,*work;
int izero=0, ione=1;
char jobz, uplo;


MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&myrank_mpi);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);

// printf("myrank_mpi is: %d \n",myrank_mpi);
// printf("nprocs_mpi is: %d \n", nprocs_mpi);
n=3;
nb=1;
nprow=1;
npcol=1;
jobz='V';
uplo='L';




Cblacs_pinfo(&iam, &nprocs);
Cblacs_get(-1,0,&ictxt);
Cblacs_gridinit(&ictxt,"Row",nprow, npcol);
Cblacs_gridinfo(ictxt, &nprow,&npcol, &myrow, &mycol);


a=(double *) malloc(n*n*sizeof(double));
w=(double *) malloc(n*sizeof(double));
z=(double *) malloc(n*n*sizeof(double));


a[0]=2;
a[1]=1;a[4]=2;
a[2]=1;a[5]=1; a[8]=2;

descinit_(desca, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &n, &info);
descinit_(descz, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &n, &info);

work=(double *)malloc(1*sizeof(double));
lwork=-1;

pdsyev_(&jobz, &uplo, &n, a , &ione, &ione, desca, w, z, &ione, &ione, descz, work, &lwork, &info);

lwork=(int)work[0];
free(work);

work=(double *)malloc(lwork*sizeof(double));

pdsyev_(&jobz, &uplo, &n, a, &ione, &ione, desca, w, z, &ione, &ione, descz, work, &lwork, &info);


fprintf(stdout, "e1=%f\n", w[0]);
fprintf(stdout, "e2=%f\n", w[1]);
fprintf(stdout, "e3=%f\n", w[2]);

free(work);
free(w);
free(z);
free(a);

Cblacs_gridexit(ictxt);
MPI_Finalize();
return 0;
}
===========================
Like you said, I see the eigenvalues 1, 1, 4. However, I am not sure whether the job is really parallelized, and which part does the distribution of the matrix to the grid. I hope it does not just work like a LAPACK routine.

I am new to this field and maybe my questions are quite silly, but I really appreciate your help. I can see my progress in learning ScaLAPACK thanks to your tremendous help.

Bruce
bruce
 
Posts: 36
Joined: Mon Sep 25, 2006 5:11 am

Postby Julien Langou » Fri Jan 19, 2007 7:45 pm

This section seems to distribute the matrix manually to the different processes
(a 2x3 grid: processes 00, 01, 02, 10, 11, 12). So I am wondering whether distributing the matrix
on the process grid must be done manually, or whether there are routines to do it.


This is an example where the user distributes the matrix manually on the processor grid.
That is one way to go. Another way is to use helper routines (see below),
for example PDELSET.

So I am wondering whether the block-cyclic distribution is already done in this subroutine?


The assignment of the values in the matrix is done in PDLAMODHILB. It is a parallel
distributed routine that needs to be called by all the processes in the grid. The assignment
of the coefficients is done through the subroutine PDELSET.

PDELSET is the easiest way to set up your matrix, so this is where you want to start.
It cannot get easier. From all the processes in the grid, you perform two loops, I=1,N and
J=I,N, and you call PDELSET( A, I, J, DESCA, AIJ ). And that's it.

A is the pointer to your local array. DESCA carries all the information about A and the grid
that is needed. I and J say where you want to put the value AIJ in A.

So basically your job is to provide AIJ for each I and J. For each I and J, PDELSET will
find the process that owns the entry A(I,J); if you are the process that owns it, you
store the value in the correct position of your local array A, and if you do not own A(I,J), you
do nothing (you exit from PDELSET).

Well, that's easy for sure. It is not the most efficient implementation, but just start
from there; if the setup of your matrix turns out to be an efficiency problem,
you can ask again.

Like you said, I see the eigenvalues 1, 1, 4. However, I am not sure whether the job is really parallelized, and which part does the distribution of the matrix to the grid. I hope it does not just work like a LAPACK routine.


Well, ..., the main program that you sent works for sure, but it works on an nprow-by-npcol grid
where nprow=1 and npcol=1. So basically it is a LAPACK call. (Through ScaLAPACK,
though.) Not terrifically exciting.

Julien
Julien Langou
 
Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Thanks Julien

Postby bruce » Sun Jan 21, 2007 4:47 pm

Thanks, Julien. I will try. I really appreciate your explanation!
bruce
 
Posts: 36
Joined: Mon Sep 25, 2006 5:11 am

