Before 10G Ethernet can gain broad adoption, some technology barriers in end systems need to be addressed. For servers, a major issue is protocol processing: a conventional network interface card (NIC) architecture simply scaled to 10G would make the CPU's processing power the bottleneck. Ongoing efforts seek to offload some TCP processing from the system CPU onto the NIC hardware using TCP offload engines (TOEs).
Another source of processing overhead is data copying. In a conventional networking stack, incoming packets are stored in operating system memory and later copied to application memory. The copy consumes CPU cycles and introduces delay; for parallel processing applications that use small buffers, it is a major performance hit. The RDMA-over-IP protocols, commonly known as iWARP, enable data to be written directly into application memory, eliminating these costly copy operations. For applications that use small packets, 10G NICs that implement iWARP will therefore provide lower latency.
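As an illustration, the following minimal C sketch shows the conventional copy-based receive path: each recv() call implies a kernel-to-user copy of the payload into the application buffer, which is exactly the copy an iWARP-capable NIC would avoid by placing data directly into pre-registered application memory. The port number and buffer size are illustrative, and error handling is omitted for brevity.

/* Conventional receive path: the kernel copies each packet's payload
 * from its own socket buffers into the application buffer passed to
 * recv().  An iWARP/RDMA-capable NIC would instead place data directly
 * into a pre-registered application buffer, avoiding this per-byte copy.
 * Port and buffer size are illustrative; error handling is omitted. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5001);               /* illustrative port */

    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 1);
    int cfd = accept(lfd, NULL, NULL);

    char buf[8192];                             /* application memory */
    ssize_t n;
    long long total = 0;

    /* Each recv() call implies a kernel-to-user copy of n bytes. */
    while ((n = recv(cfd, buf, sizeof(buf), 0)) > 0)
        total += n;

    printf("received %lld bytes via copy-based recv()\n", total);
    close(cfd);
    close(lfd);
    return 0;
}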
10G Ethernet performance has been constrained by the limits of end-system interfaces and I/O interconnects. First-generation 10G NICs with partial TCP offload and a PCI-X system interface delivered peak performance of 6-8 Gb/s. With large packet sizes, these NICs consume less than 100% of a typical server CPU. Second-generation 10G NICs with TOE are available and achieve throughput similar to first-generation NICs while lowering CPU utilization. Third-generation 10G NICs should achieve full line rate with large packets when combined with end systems that use a third-generation I/O (3GIO) interconnect such as PCI Express.
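A back-of-the-envelope calculation makes the I/O ceiling concrete. The sketch below assumes a 64-bit/133 MHz PCI-X slot and an 8-lane PCI Express 1.0 slot (2.5 GT/s per lane with 8b/10b encoding); these are typical configurations, and real-world throughput is further reduced by bus protocol and DMA overhead.

/* Back-of-the-envelope I/O bus limits that bound 10G NIC throughput.
 * Assumes a 64-bit/133 MHz PCI-X slot and an 8-lane PCIe 1.0 slot
 * (2.5 GT/s per lane, 8b/10b encoding); protocol and DMA overhead
 * reduce the attainable rate further. */
#include <stdio.h>

int main(void)
{
    /* PCI-X: 64-bit data path clocked at 133 MHz. */
    double pcix_gbps = 64.0 * 133e6 / 1e9;           /* ~8.5 Gb/s raw */

    /* PCIe 1.0 x8: 2.5 GT/s per lane, 8b/10b leaves 2.0 Gb/s of data per lane. */
    double pcie_x8_gbps = 8 * 2.5 * (8.0 / 10.0);    /* 16 Gb/s per direction */

    printf("PCI-X 64/133 raw bandwidth:  %.1f Gb/s\n", pcix_gbps);
    printf("PCIe 1.0 x8 data bandwidth: %.1f Gb/s\n", pcie_x8_gbps);
    return 0;
}

The PCI-X figure explains why first-generation NICs topped out around 6-8 Gb/s, while an 8-lane PCI Express slot leaves enough headroom for full 10 Gb/s line rate.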
The smooth interworking of 10G interfaces from multiple vendors, the ability to fill 10 Gb/s paths on local area networks as well as cross-continental and international links, the ability to transmit more than 10 Gb/s from a single host, and the ability of TCP offload engines to reduce CPU utilization all illustrate the maturity of the 10 Gb/s Ethernet market. The current performance limitations are not in the network but in the end systems.
The annual International Conference for High Performance Computing and Communications (SC), co-sponsored by ACM SIGARCH and the IEEE Computer Society, is held in November each year. Networks are an integral part of modern high-performance computing, and SCinet is the very high-performance network built to support the SC conference. SCinet features both a high-performance, production-quality network and an extremely high-performance experimental network connecting to all the major national scientific networks and supercomputer centers. In 2001, SCinet deployed its first two pre-standard 10G LAN interfaces in the show-floor production LAN; in 2002, ten 10G LAN interfaces were deployed; and in 2004, 48 10G LAN interfaces were needed to satisfy bandwidth requirements.
The Bandwidth Challenge event held during SC invites participants to stress the SCinet network infrastructure while demonstrating innovative applications across the multiple research networks that connect to SCinet. The ability to maximize network throughput is an essential element of successful high-performance computation, and the primary measure of performance is verifiable network throughput. In the five-year history of the Bandwidth Challenge at SC, the peak throughput achieved by the winning application in each year is shown below:
2000:   1.7 Gb/s
2001:   3.3 Gb/s
2002:  16.8 Gb/s
2003:  23.2 Gb/s
2004:   101 Gb/s
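Verifiable throughput figures such as those above are typically derived from network-side measurements rather than application self-reporting. The minimal C sketch below shows one simple way such a number can be computed: sample cumulative interface byte counters at fixed intervals and convert the deltas to bits per second. The counter values and polling interval are invented for illustration.

/* Minimal sketch: derive throughput from periodic byte-counter samples
 * (e.g., interface counters polled once per second).  The sample values
 * and interval below are invented for illustration only. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical cumulative byte counters sampled at 1-second intervals. */
    unsigned long long samples[] = {
        0ULL, 1250000000ULL, 2500000000ULL, 3750000000ULL
    };
    int n = sizeof(samples) / sizeof(samples[0]);
    double interval_s = 1.0;

    for (int i = 1; i < n; i++) {
        /* bytes -> bits -> Gb/s over the sampling interval */
        double gbps = (samples[i] - samples[i - 1]) * 8.0 / 1e9 / interval_s;
        printf("interval %d: %.1f Gb/s\n", i, gbps);
    }
    return 0;
}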
Achievable bandwidth rates are directly related to the number and capacity of WAN circuits brought into the SC venue. You may ask, "Why is the Bandwidth Challenge significant?" The Bandwidth Challenge 1) offers an opportunity to test next-generation network capacity as early as two years before it enters production; 2) provides an opportunity to test, two years in advance, the software ideas that will be required to make use of that next-generation network; and 3) creates an opportunity to test future network engineers, giving them a two-year head start on the problems of future networks.