Hello,
How can I estimate the memory required for a matrix-matrix multiplication using pdgemm?
Let's say I have two machines, a 2x2 process grid, and a block size of 128. There are two 1024x1024 double-precision matrices (assume 8 bytes per element, so 8MB each and 16MB in total).
Question:
- do we actually replicate the full 16MB from the main process to all the other processes running on the different machines, with each process then extracting its own block data by itself?
Assume the 4 processes are equally distributed between the two nodes (2 per node). Is it correct to say that:
- on the main node, we need to load the 16MB into memory and then replicate the 16MB to all 4 processes? If this is the case, the main node will hold 16 + 16x2 = 48MB, and the other node will hold 16x2 = 32MB.
Am I correct?
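
To make the arithmetic concrete, here is a rough sketch of the two cases I am trying to tell apart: full replication as described above, and a layout where each process holds only its own block-cyclic share. The helper numroc below is my own Python transcription of the logic in ScaLAPACK's NUMROC routine, and all variable names are made up for illustration. It counts only the two matrices mentioned above, not the result matrix or any workspace, so treat the numbers as a lower bound under those assumptions.

```python
# Back-of-the-envelope memory arithmetic for the setup in this post.
# Nothing here calls ScaLAPACK; all names are illustrative assumptions.

MIB = 1024 * 1024


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long dimension owned by process iproc
    in a block-cyclic layout with block size nb (same logic as NUMROC)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                       # number of whole blocks
    count = (nblocks // nprocs) * nb        # blocks every process gets
    extra = nblocks % nprocs                # leftover whole blocks
    if mydist < extra:
        count += nb                         # one extra whole block
    elif mydist == extra:
        count += n % nb                     # the trailing partial block
    return count


n, nb = 1024, 128          # matrix order and block size
nprow, npcol = 2, 2        # 2x2 process grid
procs_per_node = 2         # 4 processes spread over 2 nodes
n_matrices = 2             # only the two matrices from the post
elem_bytes = 8             # double precision

total_mib = n_matrices * n * n * elem_bytes / MIB

# Case A: every process receives a full copy (the scenario in my question).
replicated_main_node = total_mib + procs_per_node * total_mib
replicated_other_node = procs_per_node * total_mib

# Case B: every process stores only its local block-cyclic part.
local_rows = numroc(n, nb, 0, 0, nprow)
local_cols = numroc(n, nb, 0, 0, npcol)
distributed_per_proc = n_matrices * local_rows * local_cols * elem_bytes / MIB
distributed_per_node = procs_per_node * distributed_per_proc

print("total matrix data (MiB):       ", total_mib)
print("case A, main node (MiB):       ", replicated_main_node)
print("case A, other node (MiB):      ", replicated_other_node)
print("case B, local size per process:", local_rows, "x", local_cols)
print("case B, per process (MiB):     ", distributed_per_proc)
print("case B, per node (MiB):        ", distributed_per_node)
```

Which of the two cases matches what actually happens is exactly what I would like to confirm.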
Thanks,
canal

