Perspectives
The Next Decade in HPC
November 2005
I have had a long history in the HPC community: from 1982 to 1992 I was the founder and architect of Alliant Computer Systems Corporation in Boston. We spent those years developing tools and an architecture whose components would all look fairly slow today. Architecturally, however, many of the concepts explored back then by Seymour Cray and the many supercomputer companies, of which Alliant was just one, remain the basic architectures being reproduced and extended today as Moore's Law continues to allow these things to be compacted.
In my present role as CTO of Microsoft, it is probably fair to say that I have been the 'godfather' in moving Microsoft to begin to play a role in technical computing. Up to now, the company has never really focused on this area. It is of course true that many people around the world, whether in engineering or science, business or academia, use our products as tools on their desktop much as they think of pencil and paper: they would not want to work without them. But such tools are never really considered an integral part of the mission itself. It is my belief that many of the things that HPC and supercomputing have tended to drive will become important as you look down the road of general computing architectures.
The worldwide aggregate software market in technical computing is not all that large on a financial scale. However, Bill Gates and I have agreed over the last couple of years that engaging with HPC is not just a question of how big the market for technical computing software is per se. Rather, it is a strategic market in the sense of ultimately making sure that well-trained people will come out of the university environment and help society solve the difficult problems it will be facing in the future. Global society has an increasing need to solve some very difficult large-scale problems in engineering, science, medicine and many other fields. Microsoft has a huge research effort that has never been focused on such problems, and I believe it is time we started to assess applying our research technology outside of the traditional ways we have used it within our own commercial products. We think that by doing so, there is a lot to be learned about the nature of future computing systems.
Many of the things that we thought of as de rigueur architectural issues and design problems in supercomputers in the late eighties and early nineties have now been shrunk down to a chip. Between 2010 and 2020, many of the things the HPC community is focusing on today will go through a similar shrinking of footprint. We will wake up one day and find that the kinds of architectures we assemble today from blades and clusters are on a chip and being put into everything. In my work on strategy for Microsoft I have to look at a 10 to 20 year horizon rather than a one to three year horizon. The company's entry into high performance computing is based on the belief that over the next 10 years or so a growing number of people will want to use these kinds of technologies to solve more and more interesting problems. Another of my motivations is my belief that, even in that first 10-year period, the problem set will expand quite dramatically in terms of the types of problems people will attack with these approaches.
There was certainly a time, when I was in the HPC business, when the people who wrote high performance programs were writing them largely for consumption in an engineering environment. Only a few HPC codes were more broadly used, in a small number of fields of academic research. Today, it is doubtful whether any substantive field of academic research in engineering or science could really progress without the use of advanced computing technologies. And these technologies are not just the architecture and the megaflops, but also the tools and programming environments needed to address these problems.
In parallel with these developments in HPC, we are no longer seeing the heady growth in the number of trained computer scientists produced by the world's universities; in the United States, that number is actually going down. The numbers are still rising in places like India and China, but one can forecast fairly directly that, even if all these people were involved in engineering and science, there would not be enough of them to meet future demand. I think the problem is in fact worse than this, because Computer Science is still a young and maturing discipline.
So another interest I have in seeing Microsoft engage with the scientific community is in helping to bridge the divide between the Computer Science community and the broader world of research and engineering. My personal belief is that what we currently know as computing is going to have to evolve substantially, and what we know as programming is going to have to evolve even more dramatically. Every person involved in software development will struggle to deal with the complexity that comes from assembling ever larger, more complicated and more interconnected pieces of software. Microsoft, as a company that aspires to be the world leader in providing software tools and platforms, is thinking deeply about how to solve those problems.
One of the features that attracts me to the world of high performance computing is that it is made up of people who have real problems that need to get solved every day, who live in an engineering environment, but who are frequently at the bleeding edge in tools and techniques. And frankly, there is a level of aggressiveness in this community that cannot really exist in basic business IT operations, particularly not at the scale where people are attempting to solve big new problems. So for all these reasons, Bill Gates and I decided that even though technical computing is not going to be the world's largest software market, it is a strategic market in the sense that the HPC community can help us all better understand these challenging problems. We therefore hope that together we can help move the ball forward in some of these very difficult areas. As we look downstream and contemplate some fairly radical changes in the nature of computing itself, and the need for software tools to deal with them, we also expect this community to be a place from which technical leaders emerge. We would like to be a part of that.
We think that Microsoft has assets that could really make a difference for the growing community of people who will need to adopt HPC technologies for their business or their research. Before too long, these people will not only want to solve their problems but will also want to be able to configure and manage these HPC systems for themselves. One thing that Microsoft can do really well is provide good tools not only for programming but also for administration, management and security.
Another area where I think we can make a difference is in exploring how some of our research on algorithms, across a number of different areas of Computer Science, could be used in other areas of science. The breakthroughs that can come from radical concepts in the algorithmic space can be quite dramatic. Several years ago, some of the most passionate researchers working to develop a vaccine for AIDS approached researchers at Microsoft Research. They showed our researchers the algorithms that their community had been using to work with genomic data over the preceding six years. They were frustrated with the level of progress and wanted our researchers' opinions on how they might work with the data better and move more quickly toward a vaccine.
The Microsoft Research scientists, whose areas of expertise include machine learning and machine vision, studied the algorithmic concepts the group was using. They found the algorithms sound and said they might be able to show the group how to make them work a little better. But they also pointed out that a whole new class of algorithms had recently been developed for searching in high-dimensional spaces. Not only did the scientists help the group implement the newer algorithms but, because they were also part of a group within Microsoft Research that builds powerful visualization tools, they helped the group develop a suite of visualization tools to use alongside the algorithms. The AIDS research team was able to reprocess in six weeks the same gene data that had previously taken them six years to process. Without changing the underlying hardware or anything else they were using, the group now approaches its research in whole new ways. These new algorithms could materially accelerate the development of an effective vaccine for AIDS.
Finally, there is another challenge ahead for all of us. Since the early days of supercomputers, the exploitation of parallelism has been limited by Amdahl's Law, which basically says that it is the little part of the problem that is not parallelizable that will come back to haunt you. If you have a problem that is 90% parallelizable, then even if you use thousands of processors so that the parallelizable part is done blindingly fast, you are still left with the 10% that runs at the speed of one processor, and a maximum speed-up of 10. We are now entering a new era of silicon technology, one in which scalar processor performance will not improve significantly. Such a statement has never been true at any point since the invention of digital computing as we know it, but it is now likely to remain true for the foreseeable future. This will herald a fundamental change for the whole IT industry.
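To make the arithmetic behind that example explicit (the formula below is the standard statement of Amdahl's Law; the notation is mine, not from the original column): if a fraction p of a program's work can be spread across N processors, the overall speed-up is

\[
S(N) = \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}.
\]

With p = 0.9, even infinitely many processors deliver at most 1/(1 - 0.9) = 10, which is exactly the ceiling in the example above.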
The reason is as follows. Until there is some radical change in how we physically build transistors and design computers, we have run into a brick wall. The problem is that everybody thinks Moore's Law is what has been driving the performance gains of the last thirty years. But when Gordon Moore formulated his law, he actually spoke only of the ability to keep doubling the number of transistors on a chip at regular intervals, and that phenomenon will still continue for a while. This in itself, however, was not what brought all of us faster computing. The thing that actually made computing, and a lot of other things, go faster was raising the clock rate. And raising the clock rate was only possible because we could lower the voltage. Now we cannot lower the voltage any more, because we are down into the realm of electron volts; there is just no more room to keep shrinking it. If you cannot lower the voltage, you cannot raise the clock rate materially. Therefore, even though we can have lots more transistors, there will no longer be chips with ever higher clock rates. This has a very profound effect. Up to now, that last 10% of the code that you could not parallelize was manageable, because you were the beneficiary of orders-of-magnitude improvements in scalar performance. Such easy gains are now over, or they will be within your lifetime.
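The scaling relationship underneath this argument can be stated in the standard form for CMOS dynamic power (again my notation, not the column's):

\[
P_{\text{dynamic}} \approx C V^2 f,
\]

where C is the switched capacitance, V the supply voltage and f the clock frequency. As long as each process shrink let V drop, a pattern often called Dennard scaling, f could rise while the chip's power stayed within its thermal budget. Once V can no longer fall, any further increase in f raises power roughly in proportion, and the heat limit caps the clock rate.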
So there are some very big challenges that we will all have to face. I contend that one of them is that we are all going to have to think, at some level, about new algorithms. One aspect of the future is already becoming clear: we will need to learn how best to exploit new multi-core CPU architectures and to develop tools that support software development on such architectures. This is a real challenge to the parallel computing community. To benefit from these new machines we may have to change our programming methodology in more radical ways than we have really been comfortable with in the past. We think about this at Microsoft every single day as we look out on a 10-20 year horizon; if you are not already thinking about it, you should be. If you layer this problem on top of all the others, the challenges ahead for the world's Computer Science research community are amplified quite dramatically. All of these things are what led Bill Gates and me to believe that it is strategically important, no matter what your field of software expertise, to know and think deeply about high performance computing. It is this community that will be at the forefront of examining these hard problems and finding their solutions.
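To make the multi-core point concrete, here is a minimal sketch of the kind of code this shift implies. It is my illustration, not anything from Microsoft's tooling, and it assumes a C compiler with OpenMP support (e.g., built with gcc -fopenmp):

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const long n = 10 * 1000 * 1000;      /* 10M elements, ~80 MB */
    double *a = malloc(n * sizeof *a);
    if (!a) return 1;

    /* The parallelizable part: independent iterations that OpenMP can
       split across every available core. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = (double)i / n;

    /* A parallel reduction: each core sums its own slice, then the
       partial sums are combined. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];

    /* The serial remainder (setup, I/O, coordination) still runs on one
       core; by Amdahl's Law it bounds the overall speed-up no matter how
       many cores the chip provides. */
    printf("sum = %.1f, computed with up to %d threads\n",
           sum, omp_get_max_threads());

    free(a);
    return 0;
}
```

The design point of the sketch is the division of labor: the annotated loops scale with the number of cores, while the serial lines do not, and that serial remainder is exactly where the Amdahl's Law ceiling discussed above comes from.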