NWChem was designed to be extensible in several senses. First, the clearly defined task and module layers make it easy to add substantial new capabilities to NWChem. Second, the wide selection of lower-level APIs makes it easier to develop new capabilities within NWChem than within codes that do not expose such functionality. Finally, having a standard API means that an improved implementation is immediately available to the whole code.
Virtual Machine Model – Non-uniform Memory Access (NUMA)
By the late 1980s it was apparent that distributed-memory computers were the only path to truly scalable computational power, and the only portable programming model available for these systems was message passing. Although NWChem initially adopted the TCGMSG message-passing interface, members of the NWChem team participated in the development of the Message Passing Interface (MPI) standard,5 and the official NWChem message-passing interface has been MPI for several years. It can be stated without fear of contradiction that the MPI standard has been the most significant advance in practical parallel programming in over a decade, and it is the foundation of the vast majority of modern parallel programs. The vision and tireless efforts of those who initiated and led this communal effort must be acknowledged. It has also been pointed out that the existence of such a standard was a prerequisite to the emergence of very successful application frameworks such as PETSc.6
A completely consistent (and deliberately provocative) viewpoint is that MPI is evil. The emergence of MPI coincided with an almost complete cessation of research into parallel programming tools and paradigms. This was due to many factors, but in particular to the very public and very expensive failure of HPF. The downsides of MPI are that, in order to be successful itself, it standardized only the primitive and already old communicating sequential processes7 (CSP) programming model, and that MPI’s success further stifled adoption of advanced parallel programming techniques, since any new method was by definition not going to be as portable. Since one of the major goals of NWChem was to enable calculations larger than would fit into the memory of a single processor, it was essential to manage distributed data structures. Scalable algorithms also demand dynamic load balancing to accommodate the highly problem-dependent sparsity of matrix elements and the wide-ranging cost of evaluating integrals. Both of these tasks are difficult to accomplish using only simple message passing, and a more powerful solution was demanded.
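The dynamic load balancing described above is commonly realized with a shared global counter: each worker atomically fetches the next task index, so cheap tasks are consumed quickly and expensive ones do not stall the rest. The following is a minimal sketch of that idiom, using Python threads and a lock in place of a one-sided fetch-and-increment on distributed memory; all names here are illustrative, not NWChem's actual interfaces.

```python
# Sketch of shared-counter dynamic load balancing.
# Assumptions: tasks vary widely in cost (as integral blocks do), and
# workers share an atomic "next task" counter. Python threading stands in
# for one-sided operations on a distributed counter.
import threading

NTASKS = 100  # illustrative number of independent task blocks

def make_counter():
    """Return a thread-safe fetch-and-increment function."""
    lock = threading.Lock()
    state = {"next": 0}
    def nxtval():
        with lock:                 # atomic fetch-and-increment
            v = state["next"]
            state["next"] += 1
        return v
    return nxtval

def worker(nxtval, claimed):
    """Repeatedly claim the next task until all tasks are exhausted."""
    while True:
        task = nxtval()
        if task >= NTASKS:
            return
        claimed.append(task)       # stand-in for evaluating this block

def run(nworkers=4):
    """Launch workers sharing one counter; return the tasks each claimed."""
    nxtval = make_counter()
    claimed = [[] for _ in range(nworkers)]
    threads = [threading.Thread(target=worker, args=(nxtval, claimed[i]))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Because workers pull tasks rather than being assigned them in advance, the load balances itself regardless of how unevenly the task costs are distributed; in simple message passing the same effect requires a dedicated manager process or frequent synchronization.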