The Global Arrays (GA) toolkit [8-10] provides an efficient and portable “shared-memory” programming interface for distributed-memory computers. Each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed, dense multi-dimensional arrays without explicit cooperation from other processes (Fig. 2). Unlike other shared-memory environments, the GA model exposes the non-uniform memory access (NUMA) characteristics of high-performance computers to the programmer and acknowledges that access to a remote portion of the shared data is slower than access to the local portion. Locality information for the shared data is available, and direct access to local portions of the shared data is provided. The GA toolkit has been in the public domain since 1994 and is fully compatible with MPI.
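The access model described above can be illustrated with a toy, single-process sketch. It is not the GA toolkit API: the class `ToyGlobalArray` and its methods `distribution`, `owner_of_row`, `get`, and `put` are hypothetical names chosen for illustration, and a simple row-block distribution is assumed. The sketch shows only the idea that any process can read or write a logical block without the owner taking part, while still being able to ask where the data lives and how much of a request was remote.

```python
# Toy model of a block-distributed 2-D array with one-sided access.
# Every name below is hypothetical; this is NOT the GA toolkit API.


class ToyGlobalArray:
    """Dense rows x cols array; contiguous row blocks are assigned to processes."""

    def __init__(self, rows, cols, nproc):
        self.rows, self.cols, self.nproc = rows, cols, nproc
        # Rows per block, rounded up so that every row has an owner.
        self.block = -(-rows // nproc)
        # local[p] stands in for the memory physically held by process p.
        self.local = [
            [[0.0] * cols for _ in range(*self._row_range(p))]
            for p in range(nproc)
        ]

    def _row_range(self, p):
        lo = min(p * self.block, self.rows)
        hi = min(lo + self.block, self.rows)
        return lo, hi

    def distribution(self, p):
        """Locality query: half-open row range [lo, hi) stored on process p."""
        return self._row_range(p)

    def owner_of_row(self, r):
        """Process that physically holds global row r."""
        return r // self.block

    def get(self, me, r0, r1, c0, c1):
        """One-sided read of the logical block [r0, r1) x [c0, c1).

        Returns the patch and the number of elements that were remote to `me`.
        """
        patch, remote = [], 0
        for r in range(r0, r1):
            p = self.owner_of_row(r)
            lo, _ = self._row_range(p)
            patch.append(self.local[p][r - lo][c0:c1])
            if p != me:
                remote += c1 - c0  # in a real machine these would be slower
        return patch, remote

    def put(self, r0, c0, patch):
        """One-sided write of `patch` with its top-left corner at (r0, c0)."""
        for i, row in enumerate(patch):
            r = r0 + i
            p = self.owner_of_row(r)
            lo, _ = self._row_range(p)
            self.local[p][r - lo][c0:c0 + len(row)] = row


ga = ToyGlobalArray(rows=8, cols=4, nproc=3)

# Rows 2 and 3 live on different processes, yet one call writes both.
ga.put(2, 0, [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])

# "Process 1" reads rows 1..4, columns 1..2; part of the block is remote.
patch, remote = ga.get(me=1, r0=1, r1=5, c0=1, c1=3)

print(ga.distribution(1))  # rows held by process 1
print(patch)
print(remote)              # elements fetched from other processes
```

In this sketch the caller addresses the array by global indices only; the `distribution` query and the remote-element count stand in for the locality information that lets an algorithm prefer local data.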
Essentially all chemistry functionality within NWChem is written using GA. MPI is employed only in those sections of code that benefit from the weak synchronization implied by passing messages between processes, for instance to handle the task dependencies in classical linear algebra routines or to coordinate data flow in a highly optimized parallel fast Fourier transform. The success of this approach comes from combining the correct abstraction (multi-dimensional arrays of distributed data) with the programming ease and scalability of one-sided access to remote data. Performance comes from algorithms (Fig. 3) designed to accommodate the NUMA machine characteristics, e.g., Hartree-Fock [11], the four-index transformation [12], and multi-reference CI [13].