by libuxiao » Mon Sep 01, 2008 4:14 am
The ScaLAPACK Users' Guide says that the two-dimensional block-cyclic distribution is the data layout used in the ScaLAPACK library for dense matrix computations. The whole matrix is first divided into many small MB-by-NB blocks, and those blocks are then mapped onto the processes in a cyclic fashion. Doing this by hand seems a bit difficult if there are many different matrices to compute with. Are there routines that do this automatically, or does the user have to do it? I would also like to know why the matrix has to be divided into small blocks at all. For example, suppose I want to solve a problem using four processes. If I divide the matrix into 2*2 blocks and put one block on each process, without splitting it into smaller blocks, will that affect the efficiency or the load balance?
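To make the load-balance question concrete, here is a small Python sketch. It is not ScaLAPACK code, and the helper names `owner` and `trailing_work` are hypothetical. `owner` uses the same arithmetic as ScaLAPACK's `INDXG2P` tool routine, shifted to 0-based indices. `trailing_work` counts how much of the trailing submatrix each process of a 2x2 grid still holds partway through a right-looking LU factorization, assuming square blocks and the first block on process (0, 0).

```python
# Hypothetical helpers (not ScaLAPACK APIs) illustrating 2D block-cyclic ownership.

def owner(g, nb, nprocs, src=0):
    """Process coordinate that owns 0-based global index g along one dimension,
    when blocks of size nb are dealt out cyclically starting at process src."""
    return (src + g // nb) % nprocs


def trailing_work(n, nb, p, q, k):
    """Count the entries of the trailing submatrix A[k:, k:] of an n-by-n matrix
    held by each process of a p-by-q grid, using square nb-by-nb blocks.
    In a right-looking LU factorization this is roughly the data that still
    has to be updated after k columns have been eliminated."""
    counts = {(r, c): 0 for r in range(p) for c in range(q)}
    for i in range(k, n):
        for j in range(k, n):
            counts[(owner(i, nb, p), owner(j, nb, q))] += 1
    return counts


if __name__ == "__main__":
    n, p, q, k = 8, 2, 2, 4
    # nb = 4: one 4x4 block per process (the "2*2 blocks" layout in the question).
    print("nb=4:", trailing_work(n, 4, p, q, k))
    # nb = 2: four 2x2 blocks per process, dealt out cyclically.
    print("nb=2:", trailing_work(n, 2, p, q, k))
```

For this 8x8 example, after half the columns are eliminated, the one-block-per-process layout leaves all 16 remaining entries on process (1, 1) while the other three processes sit idle, whereas the 2x2-block cyclic layout leaves 4 entries on each process. Blocks that are too small have their own cost (less efficient level-3 BLAS and more messages), which is why the block size is a tunable parameter rather than always 1.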