CTWatch
May 2006
Designing and Supporting Science-Driven Infrastructure
Thom H. Dunning, Jr, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Robert J. Harrison and Jeffrey A. Nichols, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory


The Global Arrays (GA) toolkit [8, 9, 10] provides an efficient and portable “shared-memory” programming interface for distributed-memory computers. Each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed, dense, multi-dimensional arrays without explicit cooperation from other processes (Fig. 2). Unlike other shared-memory environments, the GA model exposes the non-uniform memory access (NUMA) characteristics of high-performance computers to the programmer and acknowledges that access to a remote portion of the shared data is slower than access to the local portion. Locality information for the shared data is available, and direct access to local portions of shared data is provided. The GA toolkit has been in the public domain since 1994 and is fully compatible with MPI.
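The ideas in this paragraph (block distribution, locality queries, and one-sided get and accumulate) can be illustrated with a small toy model. The sketch below is not the Global Arrays API: it simulates the "processes" inside a single Python process, and the class `ToyGlobalArray` and its methods `distribution`, `get`, and `acc` are hypothetical names chosen for illustration only.

```python
# Toy, single-process model of a block-distributed 1-D array.
# NOT the Global Arrays API; every name here is hypothetical.

class ToyGlobalArray:
    def __init__(self, length, nproc):
        self.length = length
        self.nproc = nproc
        # Ceiling division: each owner holds at most `block` elements.
        self.block = -(-length // nproc)
        # One local buffer per simulated process.
        self.local = [
            [0.0] * max(0, min(self.block, length - p * self.block))
            for p in range(nproc)
        ]

    def distribution(self, proc):
        """Locality query: half-open index range [lo, hi) owned by `proc`."""
        lo = min(proc * self.block, self.length)
        hi = min(lo + self.block, self.length)
        return lo, hi

    def owner(self, i):
        """Simulated process that holds global index `i`."""
        return i // self.block

    def get(self, me, lo, hi):
        """One-sided read of [lo, hi) on behalf of process `me`.

        Returns the values and the number of elements that were remote,
        i.e. owned by a process other than `me`.
        """
        values, remote = [], 0
        for i in range(lo, hi):
            p = self.owner(i)
            values.append(self.local[p][i - p * self.block])
            remote += (p != me)
        return values, remote

    def acc(self, lo, values):
        """One-sided accumulate: add `values` into the array starting at `lo`."""
        for k, v in enumerate(values):
            p = self.owner(lo + k)
            self.local[p][lo + k - p * self.block] += v


ga = ToyGlobalArray(length=10, nproc=4)    # blocks: [0,3) [3,6) [6,9) [9,10)
ga.acc(0, [float(i) for i in range(10)])   # fill with 0.0 .. 9.0
chunk, remote = ga.get(me=1, lo=2, hi=7)   # spans owners 0, 1, and 2
print(ga.distribution(1), chunk, remote)
```

The owning processes take no part in `get` or `acc`, which is the sense in which the access is one-sided. The `remote` count stands in for the extra cost of non-local access that the model makes visible to the programmer.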

Essentially all chemistry functionality within NWChem is written using GA. MPI is employed only in those sections of code that benefit from the weak synchronization implied by passing messages between processes, for instance to handle the task dependencies in classical linear algebra routines or to coordinate data flow in a highly optimized parallel fast Fourier transform. The success of this approach is due to combining the correct abstraction (multi-dimensional arrays of distributed data) with the programming ease and scalability of one-sided access to remote data. Performance comes from algorithms (Fig. 3) designed to accommodate the NUMA machine characteristics, e.g., Hartree-Fock [11], four-index transformation [12], and multi-reference CI [13].


Figure 2. Global Arrays: each process in an MIMD parallel program can asynchronously access logical blocks of physically distributed, dense, multi-dimensional arrays, without need for explicit cooperation by other processes.


Figure 3. Non-uniform memory access (NUMA) model of computation. Each process independently moves data from a shared data structure to local memory for computation. Results can be written or accumulated into another shared structure. A simple performance model can be used to ensure that the cost of moving data is offset by the amount of computation performed.
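One way to make the "simple performance model" concrete is a latency-plus-bandwidth estimate of data movement compared against the time spent computing. The sketch below is an illustration under stated assumptions, not the model used by the NWChem developers. It assumes a task that gets two b x b blocks of 8-byte words, performs a block matrix multiply (2b^3 floating-point operations), and accumulates one b x b result block. The machine parameters `LATENCY`, `BANDWIDTH`, and `FLOP_RATE` are invented values, and all function names are hypothetical.

```python
# Back-of-the-envelope cost model. The machine numbers are assumed, not measured.

def transfer_time(nbytes, latency_s, bandwidth):
    """Time to move `nbytes` in one transfer: latency plus bytes / bandwidth."""
    return latency_s + nbytes / bandwidth


def compute_time(flops, flop_rate):
    """Time to perform `flops` operations at a sustained rate `flop_rate`."""
    return flops / flop_rate


def block_task_ratio(b, latency_s, bandwidth, flop_rate, word=8):
    """Compute-to-communication time ratio for one b x b block-multiply task."""
    block_bytes = b * b * word
    comm = 2 * transfer_time(block_bytes, latency_s, bandwidth)  # two gets
    comm += transfer_time(block_bytes, latency_s, bandwidth)     # one accumulate
    return compute_time(2 * b ** 3, flop_rate) / comm


def smallest_block(latency_s, bandwidth, flop_rate, target=1.0):
    """Smallest block size whose compute time reaches `target` x the transfer time."""
    b = 1
    while block_task_ratio(b, latency_s, bandwidth, flop_rate) < target:
        b += 1
    return b


# Assumed machine: 5 microsecond latency, 1 GB/s link, 4 Gflop/s per process.
LATENCY, BANDWIDTH, FLOP_RATE = 5e-6, 1e9, 4e9

for b in (16, 64, 256):
    ratio = block_task_ratio(b, LATENCY, BANDWIDTH, FLOP_RATE)
    print(f"b={b:4d}  compute/communication = {ratio:.2f}")
print("smallest block with ratio >= 1:",
      smallest_block(LATENCY, BANDWIDTH, FLOP_RATE))
```

Because computation grows as b^3 while data movement grows as b^2, the ratio rises with block size. Under this model, choosing tasks above the break-even block size keeps the cost of moving data offset by the computation performed.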


Reference this article
Dunning, T. H., Harrison, R. J., Nichols, J. A. "NWChem: Development of a Modern Quantum Chemistry Program," CTWatch Quarterly, Volume 2, Number 2, May 2006. http://www.ctwatch.org/quarterly/articles/2006/05/nwchem-development-of-a-modern-quantum-chemistry-program/