The Blue Gene/L supercomputer project aims to push the envelope of high-performance computing (HPC) to unprecedented levels of scale and performance. Blue Gene/L is the first supercomputer in the Blue Gene family. It consists of 65,536 high-performance compute nodes (131,072 processors), each of which is an embedded 32-bit PowerPC dual processor, and has 33 terabytes of main memory in total. In addition, it has 1,024 I/O nodes, which use the same chip as the compute nodes. A three-dimensional torus network and a sparse combining network interconnect all nodes. The Blue Gene/L networks were designed with extreme scaling in mind; we therefore chose networks that scale efficiently in terms of both performance and packaging. The networks support very small messages (as small as 32 bytes) and include hardware support for collective operations (broadcast, reduction, scan, etc.), which will dominate some applications at the scaling limit. The compute nodes are designed to achieve a peak performance of 183.5 Teraflop/s in co-processor mode and 367 Teraflop/s in virtual node mode.
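The headline figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in the text; the function and constant names are illustrative, and the per-node peak is merely what the quoted system totals imply, not a separately stated specification.

```python
# Figures quoted in the text above.
COMPUTE_NODES = 65_536
PROCESSORS_PER_NODE = 2          # each node is a dual processor
IO_NODES = 1_024
PEAK_COPROCESSOR_TFLOPS = 183.5  # co-processor mode
PEAK_VIRTUAL_NODE_TFLOPS = 367.0 # virtual node mode


def total_processors(nodes=COMPUTE_NODES, per_node=PROCESSORS_PER_NODE):
    """Total processor count across all compute nodes."""
    return nodes * per_node


def compute_nodes_per_io_node(nodes=COMPUTE_NODES, io_nodes=IO_NODES):
    """How many compute nodes share one I/O node."""
    return nodes // io_nodes


def implied_gflops_per_node(peak_tflops, nodes=COMPUTE_NODES):
    """Per-node peak (in Gigaflop/s) implied by a system-wide peak."""
    return peak_tflops * 1_000.0 / nodes


if __name__ == "__main__":
    print(total_processors())                    # 131072
    print(compute_nodes_per_io_node())           # 64
    print(round(implied_gflops_per_node(PEAK_COPROCESSOR_TFLOPS), 2))   # 2.8
    print(round(implied_gflops_per_node(PEAK_VIRTUAL_NODE_TFLOPS), 2))  # 5.6
```

The virtual node figure is twice the co-processor figure, which is consistent with both processors of a node contributing to computation in that mode.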
The system-on-chip approach used in the Blue Gene/L project integrates two processors, caches (Level 2 and Level 3), inter-node networks (torus, tree, and global barrier networks), JTAG, and Gigabit Ethernet links on the same die. By using embedded DRAM, we have enlarged the on-chip Level 3 cache to 4 MB, four to eight times larger than competitive caches made of SRAM, which greatly enhances the realized performance of the processor. By integrating the inter-node networks, we can take advantage of the same technology generation; that is, these networks scale with chip frequency. Furthermore, the off-chip drivers and receivers can be optimized to consume less power than those of industry-standard networks. Figure 2 is a photograph of multiple rows of the Blue Gene/L system. The first two rows have their black covers on, whereas the remaining rows are uncovered.
One of the key objectives in the Blue Gene/L design was to achieve cost/performance on a par with the COTS (Commodity Off The Shelf) approach, while at the same time incorporating a processor and network design so powerful that it can revolutionize supercomputer systems.
Using many low-power, power-efficient chips to replace fewer, more powerful ones succeeds only if application users can realize more performance by scaling up to a higher number of processors. This is indeed one of the most challenging aspects of the Blue Gene/L system design, and it must be addressed through scalable networks together with software that leverages these networks efficiently.
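The trade-off described above can be illustrated with a simple model. The sketch below uses Amdahl's law, which is a generic textbook model and not one taken from the Blue Gene/L design itself; the processor counts, relative speeds, and serial fractions are hypothetical values chosen purely for illustration.

```python
def amdahl_speedup(n_procs: int, serial_fraction: float) -> float:
    """Speedup over one processor when a fixed fraction of work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def relative_performance(n_procs: int, per_proc_speed: float,
                         serial_fraction: float) -> float:
    """Delivered performance relative to one processor of speed 1.0."""
    return per_proc_speed * amdahl_speedup(n_procs, serial_fraction)


def many_slow_wins(serial_fraction: float) -> bool:
    """Compare two hypothetical machines for a given serial fraction."""
    # Hypothetical configurations: 64 slower chips vs. 16 chips that are 3x faster.
    many_slow = relative_performance(64, 1.0, serial_fraction)
    few_fast = relative_performance(16, 3.0, serial_fraction)
    return many_slow > few_fast


if __name__ == "__main__":
    # An application that scales well favors the many-chip machine ...
    print(many_slow_wins(serial_fraction=0.001))  # True
    # ... while a poorly scaling one favors fewer, faster chips.
    print(many_slow_wins(serial_fraction=0.05))   # False
```

Under these assumed numbers, the many-chip machine comes out ahead only when the non-scalable fraction of the application is small, which is why scalable networks and software matter for this design approach.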