LAPACK/ScaLAPACK Development

by **XL64100** » Thu May 19, 2011 9:25 am

Hello,

I'm currently working on the optimization of the assembly algorithm for using scalapack on a large number of processors (About 1000-10000 or more) (And more than 400 000 degrees of freedom for the problem).

The actual algorithm we are using is not scalable, the time will even increase with the number of processors after a certain size.
A simplified version of this algorithm is :

Code: Select all: /* ne is the total number of elements */ DO first_element = 1, ne, step DO ie = first_element, first_element + step DO je = ie, ne mat_elem = Build_the_elementary_matrix(ie,je) DO ddlie = 1, nb_ddl(ie) I = ddls(ddlie, ie) DO ddlje = 1, nb_ddl(je) J = ddls(ddlje, je) ie_lines(compute_idx(I,J)) = aCoef * mat_elem(I,J) ie_columns(compute_idx(J,I)) = anothercoef * mat_elem(I,J) END DO END DO END DO END DO DO ie = first_element, first_element + step DO p = 1, nb_proc SEND_DATA_TO(P) RECV_DATA_FROM(nb_proc+1-P) ADD_DATA_TO_LOCAL_MTX END DO END DO END DO

I have been working on an other algorithm using no communication.

Code: Select all: ! We construct 2 arrays which will say if an element impact local lines or local columns islocal_lig, islocal_col = computations DO ie = 1, ne IF (.not. islocal_lig(ie)) CYCLE Do je 1, ne IF (.not. islocal_col(je)) CYCLE IF (je > ie .OR. (.NOT. (isloca_lig(je) .AND. islocal_col(ie))) THEN mat_elem = Build_the_elementary_matrix(ie,je) DO ddlie = 1, nb_ddl(ie) I = ddls(ddlie, ie) DO ddlje = 1, nb_ddl(je) J = ddls(ddlje, je) if (local(I,J)) matrix(compute_idx(I,J)) = aCoef * mat_elem(I,J) if (local(J,I) matrix(compute_idx(J,I)) = anothercoef * mat_elem(I,J) END DO END DO END IF END DO END DO

This one is scalable but the problem is that it computes about 4 times more (3x3) elementary matrix, and so it cost about 4 times much on little number of processors...
The solution could be to switch between the two algorithm in function of the size of the problem and the number of processors but do you know if I could found somewhere documentation on a existing scalable assembly for scalapack.
An other solution i'm thinking of is to choose for each elementary matrix, one of the processors which need it to compute it and communicate to the others.

Thanks for any idea or comments.

Xavier Lacoste.

LAPACK/ScaLAPACK Development

Example of optimized assembly for using Scalapack

Example of optimized assembly for using Scalapack

Who is online