Similarly, we studied the communication pattern of a climate application, the Parallel Ocean Program (POP). POP is an ocean modeling code developed at Los Alamos National Laboratory; it executes in a time-stepping fashion and uses a standard latitude-longitude grid with a fixed number of vertical levels. A POP time step has two main phases: baroclinic and barotropic. The baroclinic phase requires only point-to-point communication and is highly parallelizable. The barotropic phase contains a conjugate gradient solver, which requires global reduction operations. The discretized POP grid is mapped and distributed evenly onto a two-dimensional processor grid, whose dimensions are compile-time parameters in POP. POP has two standard problem instances: x1 and 0.1.
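To make the contrast between the two phases concrete, the following is a minimal MPI sketch in C; it is not the actual POP source, and the sub-block size, the simplified ring-style halo exchange, and the iteration count are illustrative assumptions only.

```c
#include <mpi.h>
#include <stdio.h>

#define NX 64   /* hypothetical local sub-block edge length */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Baroclinic phase: point-to-point traffic only. Each process
     * exchanges halo rows/columns with its nearest neighbors; shown
     * here as a simple ring exchange rather than the full 2D pattern. */
    double halo_out[NX], halo_in[NX];
    for (int i = 0; i < NX; i++) halo_out[i] = (double)rank;
    int next = (rank + 1) % nprocs;
    int prev = (rank + nprocs - 1) % nprocs;
    MPI_Sendrecv(halo_out, NX, MPI_DOUBLE, next, 0,
                 halo_in,  NX, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Barotropic phase: a conjugate gradient solver. Every iteration
     * needs global dot products, i.e. an MPI_Allreduce of a single
     * double, which is why this phase is latency-sensitive.           */
    double local_dot = (double)rank, global_dot = 0.0;
    for (int iter = 0; iter < 10; iter++) {   /* arbitrary iteration count */
        MPI_Allreduce(&local_dot, &global_dot, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("sketch finished on %d ranks, allreduce = %g\n",
               nprocs, global_dot);

    MPI_Finalize();
    return 0;
}
```

The contrast is the point of the sketch: the baroclinic exchange moves modest point-to-point messages between neighbors, while each barotropic iteration is dominated by a one-element global reduction whose cost is set by latency rather than bandwidth.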
We studied the overall communication volume sensitivity and the individual message distribution in POP by varying the MPI grid sizes. POP uses the MPI topology functions to create its two-dimensional virtual topology, and all point-to-point communication is with the four nearest neighbors in the 2D grid. Figure 10 shows the increase in overall volume in the two calculation phases, while Figure 11 shows the distribution of the message volume. Note that most messages are smaller than one kilobyte. Using this information, it is now possible to reason accurately about the actual application requirements: in this case, the application requires an interconnect with low latency and low overhead for small messages.
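A minimal sketch of how such a 2D virtual topology and four-neighbor exchange can be set up with the MPI topology functions follows; the automatic grid factorization, the periodic boundaries, and the 100-point edge length (800-byte messages) are assumptions chosen only to illustrate the sub-kilobyte message sizes discussed above, not values taken from POP.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Let MPI factor the processes into a 2D grid; periodic boundaries
     * are assumed here purely for illustration.                        */
    int dims[2] = {0, 0}, periods[2] = {1, 1};
    MPI_Dims_create(nprocs, 2, dims);

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    /* The four nearest neighbors in the virtual topology. */
    int north, south, east, west;
    MPI_Cart_shift(cart, 0, 1, &north, &south);
    MPI_Cart_shift(cart, 1, 1, &east,  &west);

    /* Exchange one boundary row per direction; with a hypothetical edge
     * of 100 eight-byte doubles, each message is 800 bytes, i.e. under
     * 1 KB, consistent with the message distribution described above.   */
    enum { EDGE = 100 };
    double send_row[EDGE] = {0}, recv_row[EDGE];
    MPI_Sendrecv(send_row, EDGE, MPI_DOUBLE, north, 0,
                 recv_row, EDGE, MPI_DOUBLE, south, 0,
                 cart, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_row, EDGE, MPI_DOUBLE, east, 1,
                 recv_row, EDGE, MPI_DOUBLE, west, 1,
                 cart, MPI_STATUS_IGNORE);

    int coords[2];
    MPI_Cart_coords(cart, rank, 2, coords);
    if (rank == 0)
        printf("grid %d x %d, rank 0 at (%d,%d)\n",
               dims[0], dims[1], coords[0], coords[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```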
MA is a new technique that combines the benefits of both analytic and empirical approaches while adding new capabilities, such as incremental model validation and multi-resolution modeling. Within the HPCS program, these models are useful for performing sensitivity analysis on future problem instances of HPCS applications. Moreover, the symbolic models can be evaluated efficiently and hence provide a powerful tool for application and algorithm developers to identify scaling bottlenecks and hotspots in their implementations. From the perspective of constructing, validating, and evaluating performance models, we believe that MA offers many benefits over conventional techniques throughout the performance lifecycle.
2. Alam, S. R. and Vetter, J. S., "Hierarchical Model Validation of Symbolic Performance Models of Scientific Applications," Proc. European Conference on Parallel Processing (Euro-Par), 2006.
3. Bailey, D., Barszcz, E., et al., "The NAS Parallel Benchmarks (94)," NASA Ames Research Center, RNR Technical Report RNR-94-007, 1994, www.nas.nasa.gov/Pubs/TechReports/RNRreports/dbailey/RNR-94-007/RNR-94...
4. Parallel Ocean Program (POP) - climate.lanl.gov/Models/POP/
5. MATLAB - www.mathworks.com/products/matlab/
6. Octave - www.gnu.org/software/octave/
7. Browne, S., Dongarra, J., et al., "A Portable Programming Interface for Performance Evaluation on Modern Processors," The International Journal of High Performance Computing Applications, Vol. 14, No. 3, Fall 2000.
8. Vetter, J. S., Alam, S. R., et al., "Early Evaluation of the Cray XT3," Proc. 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2006.
9. Ohmacht, M., Bergamaschi, R. A., et al., "Blue Gene/L compute chip: Memory and Ethernet subsystem," IBM Journal of Research and Development, Vol. 49, No. 2/3, 2005.