CTWatch
February 2007
The Promise and Perils of the Coming Multicore Revolution and Its Impact
John L. Manferdelli, Microsoft Corporation

Developing Software

While the foregoing hardware architecture offers much more computing power, it makes writing software that can fully benefit from the hardware potentially much harder.

In scientific applications, improved performance has historically been achieved by having highly trained specialists modify existing programs to run efficiently as new hardware was provided.4 In fact, re-writing entire programs in this environment was far too costly, so most organizations focused the specialists on small portions of the "mission critical" programs, called kernels. In the "good case," the mission-critical applications spent 80 or 90% of their time in these kernels, and the kernels represented a few percent of the application code. Thus making a kernel ten times faster could translate into a several-fold speedup for the whole application. Even so, this rewriting was time consuming, and organizations had to balance the risk of introducing subtle bugs into well-tested programs against the benefit of increased speed at every significant hardware upgrade. All bets were off if the organization did not have the source code for the critical components.
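
To see how the arithmetic works out (a worked illustration, not part of the original article), Amdahl's law gives the overall speedup when a fraction p of the running time is spent in the kernel and the kernel is made s times faster:

    overall speedup = 1 / ((1 - p) + p/s)

With p = 0.9 and s = 10, the whole application runs about 5.3 times faster; with p = 0.8 it runs about 3.6 times faster, and only as p approaches 1 (say p = 0.99, giving roughly 9.2) does the application see nearly the full factor of ten. The payoff of tuning a kernel therefore depends critically on how completely the application's time is concentrated in it.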

By contrast, commercial software vendors, thanks to chip manufacturers who rapidly improved serial performance while maintaining the same hardware instruction set architecture, have been habituated to a world where all existing programs get faster with each new hardware generation. Further, software developers could confidently build innovative new software, which barely ran on then-current hardware, knowing that it would run quite well on the next-generation machine at the same cost. This will no longer occur for serial codes, but the goal of new software development tools must be to retain this very desirable characteristic as we move into the era of many-core computing. If we are successful, then software built with these new tools will run faster on new hardware (or even just on additional hardware) without further intervention by the application programmer.

In order to benefit from rapidly improving computer performance (and we all want that) and to retain the "write once, run faster on new hardware" paradigm, the commercial and scientific software communities must change how they develop software and the systems that support it.5 To achieve this, software development systems and supporting software must enable a significant portion of the programming community to construct parallel applications. Several complementary approaches may help us get there.

  1. Encapsulate domain-specific knowledge in reusable parallel components. The most effective way to deploy concurrency without disturbing the programming model of most developers is to encapsulate concurrency, together with domain knowledge, in common reusable library components. This approach mirrors the use of the numerical kernels beloved by computational scientists, but moves them into the world of general-purpose computing. The technique is ideal when it works, although composing such libraries requires better synchronization and resource-management techniques (a sketch of such a component appears after this list).
  2. Integrate concurrency and coordination into traditional languages. Current languages have little or no support for expressing or controlling parallelism; instead, programmers must rely on libraries or OS facilities. Other language features, like for/while loops and linked lists, obscure potential parallelism from the compiler. To build parallel applications, we will need to extend traditional sequential languages with new features that allow programmers to explicitly guide program decomposition into parallel subtasks, as well as to provide atomicity and isolation as those subtasks interact with shared data structures. Transactional memory6 shows promise here and also provides a path toward composing independently developed software components (a sketch of explicit decomposition with isolation also follows this list).
  3. Raise the semantic level to eliminate explicit sequencing. For many developers, we want to move away from general-purpose procedural languages toward domain-specific systems based on rules or constraints. More declarative styles specify intent rather than a sequence of primitives and thus inherently permit parallel implementations that leverage the concurrency and transaction mechanisms of the system. SQL is a common example: it is declarative, and correctly written SQL can be executed much faster, without modification, when the supporting software (the query optimizer) adapts to different, more parallel hardware.
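
As a sketch of the first approach (my illustration in modern C++, not an example from the article; the function name and the use of C++17 parallel algorithms are assumptions), the routine below hides its parallelism inside a reusable component: the caller writes an ordinary function call, and the library decides how to spread the work across cores.

    // Illustrative only: a reusable library component that encapsulates its own
    // concurrency. Callers see a plain sequential interface.
    #include <algorithm>
    #include <execution>
    #include <vector>

    // Hypothetical component: scales a vector so its largest element becomes 1.0.
    std::vector<double> normalize(std::vector<double> v) {
        double max = *std::max_element(std::execution::par, v.begin(), v.end());
        std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
                       [max](double x) { return x / max; });
        return v;
    }

    int main() {
        std::vector<double> data(1'000'000, 3.0);
        data[42] = 12.0;
        std::vector<double> scaled = normalize(data);   // no parallel code in the caller
        return scaled[42] == 1.0 ? 0 : 1;
    }

For the second approach, the sketch below (again my illustration, assuming only standard C++ facilities) decomposes a computation into explicit parallel subtasks and isolates their updates to a shared result with a lock; a language-level atomic or transactional construct, as discussed above, would express the same isolation without the programmer naming and managing the lock.

    // Illustrative only: explicit decomposition into parallel subtasks, with the
    // shared update protected by a mutex. Transactional memory aims to let such
    // updates compose without explicit locks.
    #include <future>
    #include <mutex>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<long> data(1'000'000, 1);
        long total = 0;
        std::mutex total_mutex;

        auto sum_range = [&](std::size_t begin, std::size_t end) {
            long local = std::accumulate(data.begin() + begin, data.begin() + end, 0L);
            std::lock_guard<std::mutex> guard(total_mutex);   // isolate the shared update
            total += local;
        };

        // Two explicitly created subtasks; a real decomposition would be finer grained.
        auto lo = std::async(std::launch::async, sum_range, std::size_t{0}, data.size() / 2);
        auto hi = std::async(std::launch::async, sum_range, data.size() / 2, data.size());
        lo.get();
        hi.get();

        return total == 1'000'000 ? 0 : 1;
    }

In both sketches, the property we want to preserve is that nothing about the caller's code changes when the hardware gains more cores.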

However, to fully exploit parallelism, programmers must understand a parallel execution model, develop parallel algorithms, and be equipped with much better tools to develop, test, and automatically tune performance. This requires education as well as software innovation. Compilers, which bridge between intent-oriented features and the underlying execution model of the system, must support idioms that explicitly identify parallel tasks as well as optimization techniques to identify and schedule the implicitly parallel tasks the compiler itself discovers.7 Program analysis and testing are hard enough for sequential programs and are much harder for parallel ones. We must find mechanisms that contain concurrency and isolate threads, and use them to make testing more robust. We have seen dramatic improvements in static analysis tools that identify software defects, reduce the test burden, and improve reliability; these techniques are being extended to identify concurrency problems. Debuggers must evolve from the low-level machine model back to a more common and familiar model that a developer can reason about correctly and effectively. Finally, performance-analysis tools that help identify bottlenecks will become crucial as we face the possibility of a two-orders-of-magnitude difference between optimized and naïve algorithms.
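
To make the testing problem concrete (a deliberately broken illustration of mine, not code from the article), the program below contains a data race: most test runs print the expected answer, yet the result is undefined, which is why race-aware static and dynamic analysis tools matter.

    // Illustrative defect: two threads increment a shared counter with no
    // synchronization. Many runs appear correct, so ordinary testing is a weak
    // defense; race detectors and static analysis are needed to flag it reliably.
    #include <iostream>
    #include <thread>

    int counter = 0;   // shared and unprotected

    void bump() {
        for (int i = 0; i < 100000; ++i)
            ++counter;   // unsynchronized read-modify-write: a data race
    }

    int main() {
        std::thread a(bump);
        std::thread b(bump);
        a.join();
        b.join();
        std::cout << counter << " (expected 200000)\n";
        return 0;
    }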


