CTWatch
November 2006 A
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
Jeffrey C. Carver, Mississippi State University
Lorin M. Hochstein, University of Nebraska, Lincoln
Richard P. Kendall, Information Sciences Institute, University of Southern California
Taiga Nakamura, University of Maryland, College Park
Marvin V. Zelkowitz, University of Maryland, College Park; Fraunhofer Center for Experimental Software Engineering
Victor R. Basili, University of Maryland, College Park; Fraunhofer Center for Experimental Software Engineering
Douglass E. Post, DoD High Performance Computing Modernization Office

3.3 Software Engineering Process and Development Workflow

There is minimal but consistent use of software engineering practices: All of the ASC Codes used a subset of standard software engineering practices (i.e., version control systems, regression tests, and a defined software architecture). However, the developers of these codes did little defect tracking or documentation beyond user guides. In contrast, the MP Codes did not show this kind of consistency across teams: each team used a few recognized software engineering practices, but there was no uniformity across projects.
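The regression tests mentioned above typically compare the output of the current code against a stored result from a previously trusted run. The following sketch is purely illustrative and not taken from any of the studied codes: the "simulation" is a toy stand-in for a physics kernel, and the baseline value and tolerance are assumptions.

```python
import math

def simulate(steps=100, dt=0.01):
    """Toy 'simulation': explicit Euler integration of dy/dt = -y.
    A stand-in for a real physics kernel (illustrative only)."""
    y = 1.0
    for _ in range(steps):
        y += dt * (-y)
    return y

# Baseline captured from a previously trusted run of this same code
# (value shown here to 8 digits; a real test suite would store it
# to full precision alongside the inputs that produced it).
BASELINE = 0.36603234

def test_regression(rel_tol=1e-6):
    """Fail if the current output drifts from the stored baseline."""
    result = simulate()
    assert math.isclose(result, BASELINE, rel_tol=rel_tol), (
        f"regression: got {result!r}, expected {BASELINE!r}")

test_regression()
```

A suite of such tests lets developers change algorithms aggressively while catching unintended changes in the answers the code produces.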

Development is evolutionary at multiple levels: In both the ASC and MP Codes, the developers are working on “new science.” As a result, they do not always know the correct output in advance, which creates numerous challenges for verifying the code against the phenomenon being simulated. The teams all use an iterative, agile approach, in which new algorithms are implemented and then evaluated to determine whether they are of sufficient quality for current simulations. Even though the more rigid Capability Maturity Model (CMM) approach may not be appropriate for many of the MP Codes, many projects claim to follow it, possibly as a result of pressure from sponsors.

Tuning for a specific system architecture is rarely done, if ever: None of the ASC Codes optimize/tune their code for particular platforms. For example, they assume that they are developing for a “flat MPI” system, and so do not optimize for machines with SMP nodes. One project member made a comment that was representative of all projects:

The amount of time it takes to tune to a particular architecture to get the last bit of juice is considerably higher than the time it takes to develop a new algorithm that improves performance across all platforms. Our goal is to develop algorithms that will last across many lifetimes. We are not really interested in getting that last little bit of performance.

The MP Codes did perform some tuning, but it was in conflict with their larger goal of portability (see Observation 1 in Section 3.1).

There is little reuse of MPI frameworks: Several of the ASC and MP Codes built their own frameworks to abstract away the low-level details of the MPI parallelism. However, these frameworks are built from scratch each time, rather than being reused from another system. One developer from an ASC Code explained this lack of reuse:

We have encountered other projects where people have said, ‘We’ll use class library X that will hide array operations and other things’, but all sorts of issues arose. These frameworks make assumptions about how the work will be done, and to go against the assumptions of the framework requires getting into really deep details, which defeats the purpose of using such a framework.
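The kind of hand-rolled framework the teams describe is typically a thin layer that hides low-level message passing behind a domain-specific operation such as a halo (ghost-cell) exchange. The sketch below is a hypothetical illustration, not code from any of the studied projects: a real version would delegate to MPI (e.g., via mpi4py), while this one includes a serial single-process fallback so it runs anywhere.

```python
class HaloExchanger:
    """Hypothetical wrapper that hides point-to-point messaging behind a
    halo exchange for a 1-D domain decomposition. Each local array holds
    interior values plus one ghost cell at each end."""

    def __init__(self, comm=None):
        self.comm = comm  # an MPI communicator, or None for serial runs

    def exchange(self, local):
        """Fill local[0] and local[-1] (the ghost cells) from neighbors."""
        if self.comm is None:
            # Serial fallback: periodic wrap-around within one process.
            local[0] = local[-2]
            local[-1] = local[1]
            return local
        # With a real communicator this branch would issue paired
        # send/receive calls to the left and right neighbor ranks;
        # that parallel path is omitted from this sketch.
        raise NotImplementedError("parallel path omitted in this sketch")

# Interior values 1..3 with uninitialized ghosts at each end.
halo = HaloExchanger()
data = [0.0, 1.0, 2.0, 3.0, 0.0]
halo.exchange(data)  # ghosts now mirror the periodic neighbors
```

The developer's complaint above is precisely about layers like this: the wrapper bakes in assumptions (here, a 1-D decomposition with one-cell-wide ghosts), and any code that needs to violate those assumptions must reach beneath the abstraction.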

Most development effort is focused on implementation rather than maintenance: In both the ASC and the MP Codes, most of the effort was expended during implementation, rather than maintenance. This distribution indicates that rather than being released and maintained like traditional IT projects, these projects are under constant development.

3.4 Programming Languages

Once selected, the primary language does not change: Languages adopted by the MP Codes do not change once they have been selected. The majority of the older codes were written in FORTRAN77. Some newer codes and many of the ASC Codes are in FORTRAN90, C and C++. C seems popular for handling I/O issues. Some of the ASC and MP Codes have also adopted Python to drive the application. However, on one project the Python-based driver was later abandoned because it increased debugging complexity and reduced the portability of the code.

Higher level languages (e.g., Matlab) are not widely adopted for the core of applications: The MP teams have not adopted these higher level languages for use in their codes. The ASC teams have restricted the use of these high level languages to prototyping numerical algorithms.

3.5 Verification and Validation

Verification and Validation are very difficult in this domain: While V&V is difficult in all software domains, it was seen as especially difficult for both the MP and ASC Codes due to some unique characteristics. It is difficult for developers to judge the correctness of code because they often do not know the “right” answer. In addition, there are at least three places where defects can enter the code, making it difficult to ultimately identify the source of the problem: 1) the underlying science could be incorrect; 2) the translation of the domain model to an algorithm could be incorrect; and 3) the translation of that algorithm into code could be incorrect.
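One common way teams cope with not knowing the “right” answer is to verify the code on simplified problems whose answers are known analytically, including checking that the numerical error shrinks at the rate the algorithm predicts. The following sketch illustrates the idea on a central-difference kernel; it is an illustrative example under our own assumptions, not code from any of the studied projects.

```python
import math

def second_derivative(f, x, h):
    """Central-difference approximation to f''(x); second-order accurate."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Verification case with a known answer: f = sin, so f''(x) = -sin(x).
x = 1.0
exact = -math.sin(x)

# For a second-order scheme, halving h should cut the error by ~4x.
# A wrong ratio points at a defect in the discretization or its coding
# (defect sources 2 and 3 above), independent of the underlying science.
err_h  = abs(second_derivative(math.sin, x, 1e-2) - exact)
err_h2 = abs(second_derivative(math.sin, x, 5e-3) - exact)
ratio = err_h / err_h2  # expect approximately 4
```

Such convergence tests verify the algorithm and its implementation; validating the underlying science (defect source 1) still requires comparison against experiment or observation.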

Visualization is the most common tool for validation: All of the ASC Codes provide visualization to allow users to explore the large volume of output produced. For the experienced user, this visualization provides a sanity check that the code is behaving reasonably.


Reference this article
Carver, J. C., Hochstein, L. M., Kendall, R. P., Nakamura, T., Zelkowitz, M. V., Basili, V. R., Post, D. E. "Observations about Software Development for High End Computing," CTWatch Quarterly, Volume 2, Number 4A, November 2006 A. http://www.ctwatch.org/quarterly/articles/2006/11/observations-about-software-development-for-high-end-computing/
