CTWatch
February 2005
Trends in High Performance Computing
Erich Strohmaier, Lawrence Berkeley National Laboratory
Jack J. Dongarra, University of Tennessee/Oak Ridge National Laboratory
Hans W. Meuer, University of Mannheim
Horst D. Simon, Lawrence Berkeley National Laboratory

Intel-ization of the Processor Landscape

The HPC community had already started to use commodity parts in large numbers in the 1990s. MPPs and Constellations (clusters of SMPs) typically used standard workstation microprocessors, even though they might still have used custom interconnect systems. There was, however, one big exception: virtually nobody used Intel microprocessors. Lack of performance and the limitations of a 32-bit processor design were the main reasons for this. The situation changed with the introduction of the Pentium III and, especially in 2001, with the Pentium 4, which featured greatly improved memory performance due to its redesigned front-side bus and full 64-bit floating-point support. The number of systems in the TOP500 with Intel processors exploded from only 6 in November 2000 to 318 in November 2004 (Fig. 3).

Fig. 3. Main Processor Families seen in the TOP500.

New Architectures on the Horizon

Interest in novel computer architectures has always been great in the HPC community, which comes as little surprise, as this field was born, and continues to thrive, on technological innovation. One concern of recent years has been the ever-increasing space and power requirements of modern commodity-based supercomputers. With the BlueGene/L development, IBM addressed these issues by designing a very power- and space-efficient system. BlueGene/L does not use the latest commodity processors available; instead it uses computationally less powerful but much more power-efficient processors developed mainly for embedded applications rather than for the PC and workstation market. Together with a drastic reduction of the available main memory, this leads to a very dense system. To achieve the targeted extreme performance level, an unprecedented number of these processors (up to 128,000) are combined using several specialized interconnects. There was, and still is, considerable doubt as to whether such a system can deliver the promised performance and be usable as a general-purpose system. First results from the current beta system are very encouraging, and the one-quarter-size beta version of the future LLNL system was able to claim the number one spot on the November 2004 TOP500 list.

Contrary to the progress in hardware development, there has been little progress, and perhaps even regress, in making scalable systems easy to program. Software directions that were started in the early 1990s (such as CM-Fortran and High Performance Fortran) have largely been abandoned. The payoff from finding better ways to program such systems, and thus expanding the domains in which they can be applied, would appear to be large.

The move to distributed memory has forced changes in the programming paradigm of supercomputing. The high cost of processor-to-processor synchronization and communication requires new algorithms that minimize those operations. The structuring of an application for vectorization is seldom the best structure for parallelization on these systems. Moreover, despite some research successes in this area, compilers are generally unable, without guidance from the programmer, either to detect enough of the necessary parallelism or to reduce inter-processor overheads sufficiently. The use of distributed memory systems has led to the introduction of new programming models, particularly the message-passing paradigm, as realized in MPI, and the use of parallel loops in shared-memory subsystems, as supported by OpenMP. It has also forced significant reprogramming of libraries and applications to port them onto the new architectures. Debuggers and performance tools for scalable systems have developed slowly, however, and even today most users consider the programming tools on parallel supercomputers to be inadequate.
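To make the contrast between these two programming models concrete, the sketch below (not from the article; the toy computation and variable names are illustrative assumptions) shows explicit MPI message passing across distributed-memory processes alongside an OpenMP parallel loop within a shared-memory node.

/* Minimal sketch of hybrid MPI + OpenMP programming.
 * Each MPI rank computes a partial sum over its slice of the index
 * space using an OpenMP parallel loop; the partial results are then
 * combined with an explicit MPI reduction chosen by the programmer. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000L   /* illustrative problem size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Shared-memory parallelism within a node: OpenMP parallel loop
     * with a reduction over this rank's cyclic slice of the indices. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = rank; i < N; i += size)
        local_sum += 1.0 / (double)(i + 1);

    /* Distributed-memory parallelism across nodes: the programmer, not
     * the compiler, decides what is communicated and when. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.12f\n", global_sum);

    MPI_Finalize();
    return 0;
}

Built with an MPI wrapper compiler and OpenMP enabled (for example, mpicc -fopenmp), the same code illustrates why porting to such systems requires restructuring: data decomposition, communication, and loop-level parallelism must all be expressed explicitly by the programmer.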

All these issues prompted DARPA to start its High Productivity Computing Systems (HPCS) program, with the declared goal of developing, by the end of the decade, a new computer architecture offering both high performance and high productivity. The performance goal is to install a system by 2009 that can sustain a Petaflop/s performance level on real applications. This is to be achieved by combining a new architecture designed to be easily programmable with a completely new software infrastructure that makes user productivity as high as possible.

