CTWatch
May 2006
Designing and Supporting Science-Driven Infrastructure
Thom H. Dunning, Jr, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Robert J. Harrison and Jeffrey A. Nichols, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory


A number of basic computing issues had to be addressed to optimize the performance and scalability of NWChem. These included processor architecture, node memory latency and bandwidth, interprocessor communications latency and bandwidth, and load balancing. Solving the associated problems often required rewriting and restructuring the software, explorations that were carried out by the postdoctoral fellows associated with the NWChem project. Another issue that was always in the foreground was the portability of the software. Computational chemists typically have access to a wide range of computer hardware, from various brands of desktop workstations, to various brands of departmental computers, to some of the world’s largest supercomputers. To support their work most effectively, it was important that NWChem run on all of these machines, if possible.

The process for designing, developing and implementing NWChem used modern software engineering practices. The process can be summarized as follows:

  1. Requirements gathering. The process began by gathering requirements from the researchers associated with the EMSL Project. This defined the functionality that had to be provided by the quantum chemistry software.
  2. Preliminary design and prototyping. After the requirements were gathered, work on NWChem began. This included design of the overall system architecture, identification of the major subsystems, definition of the objects and modules, definition of the internal and external interfaces, characterization of the major algorithms, etc.
  3. Resolution of unresolved issues. The preliminary design work led to the identification of a number of major, unresolved issues. Research projects were targeted at each of these issues.
  4. Detailed design. In the meantime, the preliminary design was extended to a set of “code to” specifications. As the major issues were resolved, they were included in the “code to” specifications.
  5. Implementation. NWChem was then created in well-defined versions, and revision control was used to track the changes.
  6. Testing and acceptance. Finally, a suite of test routines was used to verify the code and ensure that the requirements were met.

Although the above is a far more rigorous process than is followed in most scientific software development projects, we found it to be critical to meeting the goals set for NWChem and for managing a distributed software development effort. The above cycle was actually performed at least twice for each type of NWChem method implemented (e.g., classical, uncorrelated quantum, highly correlated quantum, density functional, etc.). Going through the cycle multiple times generated “beta” software that could be released to users for feedback and refinement of user requirements.

Although the combination of an on-site core team plus off-site collaborators provided the range of technical capabilities needed to develop NWChem, there are lessons to be learned about managing such a highly distributed project. For example:

  • The time and effort required for integration of existing sequential or parallel codes into the new code framework was always larger than estimated.
  • The preparation of documentation, for both users and programmers, should have been initiated earlier in the project. The programmer’s manual is especially important because this document provides the guidelines needed to ensure that the software produced by the distributed team will work together.
  • Software components that are on the critical path should be developed in-house, since the time schedules and priorities of collaborators inevitably differ from those of the core team.
  • It is important to implement code reviews both for software developed in-house by the “core” team and for software developed by the external collaborators.

Our experience suggests that a distributed software development team can be successful if the core team is large enough to develop all of the software components on the critical path, if the collaborators are given sufficient guidance on the format and content of their contributions, and if their progress is carefully monitored.


Reference this article
Dunning, T. H., Harrison, R. J., Nichols, J. A. "NWChem: Development of a Modern Quantum Chemistry Program," CTWatch Quarterly, Volume 2, Number 2, May 2006. http://www.ctwatch.org/quarterly/articles/2006/05/nwchem-development-of-a-modern-quantum-chemistry-program/
