CTWatch
November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
D. E. Post, DoD High Performance Computing Modernization Program
R. P. Kendall, Carnegie Mellon University Software Engineering Institute

IV. Perform V&V

General software infrastructure tool requirements and best practices: data analysis and visualization tools (2); tools for quantitative comparison of code results with test problem results, with other code results, and with experimental data (2).
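As one illustration of what such a quantitative-comparison tool might look like, the following minimal Python sketch (ours, not the authors'; the function names and data are hypothetical) interpolates a simulated profile onto experimental measurement points and reports relative L2 and maximum pointwise differences:

```python
# A minimal sketch of a quantitative-comparison tool of the kind listed
# above. All names and data here are illustrative, not from the article.
import numpy as np

def compare_profiles(x_sim, y_sim, x_exp, y_exp):
    """Return (relative L2 error, max absolute error) of simulation vs. experiment."""
    # Interpolate the simulated profile onto the experimental measurement points.
    y_sim_on_exp = np.interp(x_exp, x_sim, y_sim)
    diff = y_sim_on_exp - y_exp
    rel_l2 = np.linalg.norm(diff) / np.linalg.norm(y_exp)
    max_abs = np.max(np.abs(diff))
    return rel_l2, max_abs

# Hypothetical data: a dense simulated profile vs. sparse measurements.
x_sim = np.linspace(0.0, 1.0, 200)
y_sim = np.exp(-4.0 * x_sim)
x_exp = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_exp = np.array([0.67, 0.30, 0.14, 0.061, 0.028])

rel_l2, max_abs = compare_profiles(x_sim, y_sim, x_exp, y_exp)
print(f"relative L2 error: {rel_l2:.3%}, max abs error: {max_abs:.3e}")
```

The same comparison machinery serves all three uses named in the list: against test problems, against other codes, and against experimental data; only the source of the reference profile changes.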

Verification provides assurance that the models and equations in the code and the solution algorithms are mathematically correct, i.e., that the computed answers are the correct solutions of the model equations. Validation provides assurance that the models in the code are consistent with the laws of nature and are adequate to simulate the properties of interest.8 V&V is an ongoing process that lasts the life of the code, but it is particularly intense during code development and early adoption by the user community. Verification is accomplished with tests that show that the code can reproduce known answers and preserve known symmetries and other predictable behavior. Validation is accomplished by comparing the code results for an experiment or observation with the real data from that experiment or observation. A code must first be verified, then validated: without prior verification, agreement between experimental validation data and code results can only be viewed as fortuitous. Without a successful, documented verification and validation program, there is no reason for any of the project stakeholders to be confident that the code results are accurate.9
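To make the verification half of this concrete, the sketch below (again ours, assuming Python with NumPy) runs one of the standard verification tests the paragraph describes: it solves a one-dimensional Poisson problem with a known analytic answer and confirms that the discretization error falls at the expected second-order rate as the grid is refined.

```python
# A minimal verification sketch under stated assumptions: a second-order
# finite-difference Poisson solver is checked against the known answer
# u(x) = sin(pi x) on [0, 1], and the observed convergence order is
# compared with the expected order of 2. Illustrative, not the authors' code.
import numpy as np

def solve_poisson(n):
    """Solve -u'' = pi^2 sin(pi x) on [0,1], u(0)=u(1)=0, on n interior points."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # Standard second-order tridiagonal stencil (2u_i - u_{i-1} - u_{i+1}) / h^2.
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    f = np.pi**2 * np.sin(np.pi * x)
    return x, np.linalg.solve(A, f)

errors = []
for n in (32, 64, 128):
    x, u = solve_poisson(n)
    errors.append(np.max(np.abs(u - np.sin(np.pi * x))))

for coarse, fine in zip(errors, errors[1:]):
    # Halving h should cut the error by roughly 4x for a second-order method,
    # so the observed order log2(e_coarse / e_fine) should be close to 2.
    print(f"observed convergence order: {np.log2(coarse / fine):.2f}")
```

A code that reproduces the known answer at the expected convergence order passes this test; a stalled or first-order error decay would signal a defect in the discretization or its implementation before any validation data is consulted.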

V. Execute production runs

General software infrastructure tool requirements and best practices: data analysis and visualization tools (2), documentation (3), job scheduling (1).

Running a large-scale simulation on a large supercomputer represents a significant investment on the part of the sponsor. Large codes can cost $5M to $10M per year to develop, maintain, and run, and more if the costs of validation experiments are included. Large computers are themselves expensive resources: computer time now costs approximately $1 per cpu-hour (2006), and a large project will need 5M cpu-hours or more. The total cost is thus in the range of $10M to $20M per year or more, and hundreds of millions of dollars over the life of the project. It is analogous to getting run time on a large-scale experiment (e.g., getting beam time at an accelerator, conducting experiments, collecting data, and analyzing the data). Sponsoring institutions and large-scale experimental facilities and teams have learned that research and design activities of this scale require organization and planning to be successful.
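The cost arithmetic behind those figures can be made explicit. The following back-of-the-envelope sketch is ours, using only the article's 2006 numbers ($5M to $10M per year for development plus roughly $1 per cpu-hour of machine time):

```python
# Back-of-the-envelope sketch of the annual cost range quoted above.
# The input figures are the article's 2006 estimates; the function is ours.
def annual_cost_musd(dev_cost_musd, cpu_hours_m, usd_per_cpu_hour=1.0):
    """Annual project cost in $M: development/maintenance plus machine time."""
    return dev_cost_musd + cpu_hours_m * usd_per_cpu_hour

low = annual_cost_musd(dev_cost_musd=5, cpu_hours_m=5)     # $5M + $5M  = $10M/year
high = annual_cost_musd(dev_cost_musd=10, cpu_hours_m=10)  # $10M + $10M = $20M/year
print(f"annual cost range: ${low:.0f}M to ${high:.0f}M")
```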


