CTWatch Quarterly » NWChem: Development of a Modern Quantum Chemistry Program

NWChem: Development of a Modern Quantum Chemistry Program

Thom H. Dunning, Jr, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Robert J. Harrison and Jeffrey A. Nichols, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory

The Global Arrays (GA) toolkit⁸ ⁹ ¹⁰ provides an efficient and portable “shared-memory” programming interface for distributed-memory computers. Each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed, dense multi-dimensional arrays, without need for explicit cooperation by other processes (Fig. 2). Unlike other shared-memory environments, the GA model exposes to the programmer the non-uniform memory access (NUMA) characteristics of the high performance computers and acknowledges that access to a remote portion of the shared data is slower than to the local portion. Locality information for the shared data is available, and direct access to local portions of shared data is provided. The GA toolkit has been in the public domain since 1994 and is fully compatible with MPI.

Essentially all chemistry functionality within NWChem is written using GA. MPI is only employed in those sections of code that benefit from the weak synchronization implied by passing messages between processes, for instance to handle the task dependencies in classical linear algebra routines or to coordinate data flow in a highly optimized parallel fast Fourier transform. This success is due to combining the correct abstraction (multi-dimensional arrays of distributed data) with the programming ease and scalability of one-sided access to remote data. Performance comes from algorithms (Fig. 3) designed to accommodate the NUMA machine characteristics, e.g., Hartree-Fock,¹¹ four-index transformation,¹² and multi-reference CI.¹³

Figure 2. Global Arrays: each process in an MIMD parallel program can asynchronously access logical blocks of physically distributed, dense, multi-dimensional arrays, without need for explicit cooperation by other processes.

Figure 3. Non-uniform memory access (NUMA) model of computation. Each process independently moves data from a shared data structure to local memory for computation. Results can be written or accumulated into another shared structure. A simple performance model can be used to ensure that the cost of moving data is offset by the amount of computation performed.

Pages: 1 2 3 4 5 6 7 8

CTWatch is a collaborative effort				Sponsored By