It is true that
PxGEMR2D can be used to distribute a matrix from one process to
many, but it is not the right way to do it. Having said that, here is a link to code
that does just that:
http://venda.uku.fi/~vanne/parallel/spackDistr.f90
I'm far from criticizing other people's work, but here is the problem with using
PxGEMR2D: the matrix has to fit in the memory of a single process before it can be
distributed. In real-life scenarios the matrix is often so big that it won't fit in one process's memory.
Even if it does, it may cause paging, and then the operation will take forever.
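To put rough numbers on the memory problem, here is a small back-of-the-envelope sketch. The matrix order (100,000) and the process count (64) are made-up values for illustration, and the per-process figure assumes a perfectly even split.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_element=8):
    """Bytes needed to hold a dense matrix (8 bytes per double-precision value)."""
    return n_rows * n_cols * bytes_per_element


def local_bytes(n_rows, n_cols, n_procs, bytes_per_element=8):
    """Approximate per-process share if the matrix is spread evenly."""
    return dense_matrix_bytes(n_rows, n_cols, bytes_per_element) / n_procs


n = 100_000       # hypothetical matrix order
n_procs = 64      # hypothetical number of processes

print(dense_matrix_bytes(n, n) / 1e9)    # whole matrix on one process, in GB: 80.0
print(local_bytes(n, n, n_procs) / 1e9)  # share per process, in GB: 1.25
```

So a matrix that is comfortable once distributed can be far too large to stage on a single process first.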
A more realistic scenario is when you have a matrix stored in a file (or some other sequential
source of data). One of the processes has access to the file and distributes the data to the other
processes.
Distributing a matrix from a single process to a ScaLAPACK process grid in such
a scenario can be done with Antoine's example code. Here is a link to his
example code:
http://www.netlib.org/scalapack/examples/scaex.tgz
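Whatever code does the distribution, the sending process has to know which process owns each matrix entry. Below is a minimal sketch of the block-cyclic index arithmetic in one dimension, using 0-based indices. It is my own illustration, not taken from the example code above; the function names are hypothetical, although the formulas are meant to mirror what the ScaLAPACK tool routines INDXG2P, INDXG2L and NUMROC compute with 1-based indices.

```python
def owner(ig, nb, nprocs, src=0):
    """Process coordinate that owns global index ig (block size nb)."""
    return (src + ig // nb) % nprocs


def local_index(ig, nb, nprocs):
    """Position of global index ig inside its owner's local storage."""
    return (ig // (nb * nprocs)) * nb + ig % nb


def num_local(n, nb, iproc, nprocs, src=0):
    """How many of the n global indices land on process iproc."""
    dist = (iproc - src) % nprocs      # distance from the source process
    nblocks = n // nb                  # number of complete blocks
    count = (nblocks // nprocs) * nb   # every process gets this many
    extra = nblocks % nprocs           # leftover complete blocks
    if dist < extra:
        count += nb                    # one more complete block
    elif dist == extra:
        count += n % nb                # the trailing partial block
    return count


# 10 rows, block size 2, 3 process rows
print([owner(i, 2, 3) for i in range(10)])       # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1]
print([num_local(10, 2, p, 3) for p in range(3)])  # [4, 4, 2]
```

For a 2D process grid the same arithmetic is applied independently to rows and columns: entry (i, j) goes to the process at (owner(i, mb, nprow), owner(j, nb, npcol)).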
The distribution code is not very efficient as it operates on small chunks of data. Antoine
gave an alternative code for reading and writing that operates on one column of a matrix
at a time:
http://www.netlib.org/scalapack/examples/pdlaread.f
http://www.netlib.org/scalapack/examples/pdlawrite.f
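The column-at-a-time idea can be sketched as follows. This is a plain Python simulation without MPI, written for illustration only; it is not a translation of pdlaread.f, and the names split_column and distribute are hypothetical. The reading process holds one column at a time, cuts it into one piece per process row, and hands each piece to the process that owns it (here an append to a dictionary stands in for the send).

```python
def split_column(column, mb, nprow):
    """Split one global column into per-process-row pieces (row block size mb).

    Blocks are visited in increasing global order, so each piece comes out
    already arranged in the owner's local order.
    """
    pieces = [[] for _ in range(nprow)]
    for start in range(0, len(column), mb):
        prow = (start // mb) % nprow
        pieces[prow].extend(column[start:start + mb])
    return pieces


def distribute(columns, mb, nb, nprow, npcol):
    """Feed columns one by one to the simulated process grid.

    `columns` can be any iterable (for example a generator reading a file),
    so only a single column needs to be in memory at a time.
    """
    local = {(pr, pc): [] for pr in range(nprow) for pc in range(npcol)}
    for j, column in enumerate(columns):
        pcol = (j // nb) % npcol                 # process column owning column j
        for prow, piece in enumerate(split_column(column, mb, nprow)):
            local[(prow, pcol)].append(piece)    # stands in for a send
    return local


# 4x4 matrix with entry (i, j) = 10*j + i, 2x2 blocks, 2x2 process grid
cols = ([10 * j + i for i in range(4)] for j in range(4))
grid = distribute(cols, mb=2, nb=2, nprow=2, npcol=2)
print(grid[(0, 0)])   # [[0, 1], [10, 11]]
print(grid[(1, 1)])   # [[22, 23], [32, 33]]
```

Compared with sending one small block per message, each message here carries everything a process needs from that column, so there are fewer and larger messages.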
One thing to note about these examples (other than the fact that they are written in
Fortran 77) is the use of formatted (parsed text) I/O rather than binary I/O. If you want to see
an example with binary I/O (in C), you can get the LFC code:
http://icl.cs.utk.edu/lfc/
The reading routine starts on line 271 of the files
mpi/solvers/dlfcpslv.c and
mpi/solvers/slfcpslv.c. The code is based on Antoine's first example, so it
could still be made more efficient using ideas from Antoine's second example.
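To illustrate why binary I/O is attractive here, below is a small sketch that reads one column from a binary file and from a text file. The file layouts are my assumptions for the example (raw column-major doubles in native byte order, and one number per line for the text file); they are not the formats used by the ScaLAPACK examples or by LFC.

```python
import os
import tempfile
from array import array
from itertools import islice


def write_binary(path, columns):
    """Store the matrix column by column as raw native doubles."""
    with open(path, "wb") as f:
        for col in columns:
            array("d", col).tofile(f)


def read_binary_column(path, n_rows, j):
    """Jump straight to column j: every value has a fixed size."""
    col = array("d")
    with open(path, "rb") as f:
        f.seek(j * n_rows * col.itemsize)
        col.fromfile(f, n_rows)
    return list(col)


def write_text(path, columns):
    """Store the matrix column by column, one number per line."""
    with open(path, "w") as f:
        for col in columns:
            for x in col:
                f.write(repr(x) + "\n")


def read_text_column(path, n_rows, j):
    """Lines vary in length, so earlier lines must be scanned past,
    and every value has to be parsed from text."""
    with open(path) as f:
        lines = islice(f, j * n_rows, (j + 1) * n_rows)
        return [float(line) for line in lines]


columns = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "a.bin")
    txt_path = os.path.join(tmp, "a.txt")
    write_binary(bin_path, columns)
    write_text(txt_path, columns)
    from_binary = read_binary_column(bin_path, 3, 1)
    from_text = read_text_column(txt_path, 3, 1)

print(from_binary)   # [4.0, 5.0, 6.0]
print(from_text)     # [4.0, 5.0, 6.0]
```

The binary reader can seek directly to any column and needs no parsing, which is what makes it a good fit for the column-at-a-time approach. The price is portability: a raw binary file depends on the byte order and floating-point format of the machine that wrote it.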