Making the Business Case for High Performance Computing: A Benefit-Cost Analysis Methodology
Suzy Tichenor, Council on Competitiveness
Albert Reuther, MIT Lincoln Laboratory
CTWatch Quarterly
November 2006 A

High performance computing (HPC), also known as supercomputing, makes enormous contributions not only to science and national security, but also to business innovation and competitiveness—yet senior executives often view HPC as a cost, rather than a value investment. This is largely due to the difficulty businesses and other organizations have had in determining the return on investment (ROI)1 of HPC systems.

Traditionally, HPC systems have been valued according to how fully they are utilized (i.e., the aggregate percentage of time that each of the processors of the HPC system is busy); but this valuation method treats all problems equally and does not give adequate weight to the problems that are most important to the organization. With no ability to properly assess problems having the greatest potential for driving innovation and competitive advantage, organizations risk purchasing inadequate HPC systems or, in some cases, forgoing purchases altogether because they cannot be satisfactorily justified.

This stifles innovation within individual organizations and, in the aggregate, prevents the U.S. business sector from being as globally competitive as it could and should be. The groundbreaking July 2004 "Council on Competitiveness Study of U.S. Industrial HPC Users," 2 sponsored by the Defense Advanced Research Projects Agency (DARPA) and conducted by market research firm IDC, found that 97 percent of the U.S. businesses surveyed could not exist, or could not compete effectively, without the use of HPC. Recent Council on Competitiveness studies reaffirmed that HPC typically is indispensable for companies that exploit it 3.

It is increasingly true that to out-compete, companies need to out-compute. Without a more pragmatic method for determining the ROI of HPC hardware systems, however, U.S. companies already using HPC may lose ground in the global competitiveness pack. Equally important, companies that have never used HPC may continue to miss out on its benefits for driving innovation and competitiveness.

To help address this issue, we present an alternative to relying on system utilization as a measure of system valuation, namely, capturing the ROI by starting with a benefit-cost ratio (BCR) calculation. This calculation is already in use at the Massachusetts Institute of Technology, where it has proven effective in other contexts.

HPC and Business Competitiveness

Alongside theory and experimentation, modeling and simulation using HPC has become the third leg of science and industrial design engineering. Industrial and other business firms are driven by external competition in a never-ending race to be first to market with the best products. The 2004 Council on Competitiveness study mentioned earlier also found that in these battles for global market supremacy, more-capable HPC resources often translate into faster time-to-market (in some cases more than 50% faster), reduced costs, and superior product quality.

Businesses use HPC systems (supercomputers) to design the cars we drive and the aircraft we fly in, to find and help extract new energy sources, to forecast severe weather, to discover new life-saving medicines and to safeguard our national security.

These benefits of HPC are often substantial:

The story doesn't end there. Going forward, America’s technological visionaries foresee equally dramatic advances if massive improvements in HPC power can be made available:

Barriers to HPC Use in the Private Sector

Council on Competitiveness research has identified three principal barriers inhibiting more widespread use of HPC. Educational and training hurdles (shortage of computational scientists) and technical obstacles (e.g., legacy codes need updating, new code development lags, growing performance gap between fast processors and other system technologies) are largely external to any one organization, although all of them affect the cost and ease of HPC use by the private sector. Within corporations, however, business strategies and decision-making processes can be a more significant obstacle to acquiring or accessing HPC tools.

In the boardrooms of many American companies, HPC isn’t seen as an innovation edge, but rather as a cost of doing business—an enormous "hole in the pocket." And when investment options are being considered, it is often easier to get management approval for one that will minimize or reduce costs in the short term versus an investment that has the potential to generate revenue over the longer term. HPC often is seen as the latter. As a result, management often favors "cheaper" systems and will not invest in more productive systems and the requisite training for personnel. For example, a $10 million investment in a high-end HPC machine to enable entry to a market six months early could result in $500 million of additional revenue in some industries, whereas a $1 million system with higher latencies and lower bandwidth might not be able to complete the solutions. Yet management often selects the less expensive approach.

The Council’s research also found that organizations are often more limited by their budgets than by the computers available in the market. The impact of this can be severe. While some companies can run their existing problems on cheaper machines with faster turnaround, they are often unable to run new problems that lead to the breakthroughs required to maintain global competitiveness.

Not surprisingly, the other key barrier mentioned by businesses was the difficulty in determining the ROI from their current or proposed HPC systems.

Making the Business Case for HPC

One reason that many senior executives view HPC as a cost rather than a value investment is the difficulty of determining the ROI of the system to the organization. Traditionally, HPC systems have been valued by the usage of the system, i.e., the aggregate percentage of time that each of the processors of the HPC system is busy. The rationale behind this valuation is that a large sum of money was spent to purchase and maintain the system, and the justification for spending such sums is that the demand for computing on the system keeps the system busy close to 100 percent of the time.

Such a valuation method encourages the HPC system owners to use a resource management queuing system that schedules a non-stop stream of smaller computational simulations by many users to keep the HPC system busy. While this encourages utilization, it may not actually accommodate the most important problems facing the organization. Thus, this valuation methodology may not capture the true value that the use of an HPC system would provide to the organization, and it may not serve users’ needs well, because it does not measure whether the problems most pressing for an organization’s long-term competitive edge are being addressed. Without considering these issues, the actual out-of-pocket cost for an HPC system may appear unaffordable, leading an organization to forgo needed hardware purchases, upgrades, or, most certainly, “greenfield” investments in HPC systems.

An alternative to relying on system utilization as a measure of system valuation is to capture the ROI by starting with a benefit-cost ratio (BCR) calculation. The BCR is expressed as the profit or cost savings divided by the sum of the investment over a given time period. In this article we shall simplify the discussion by assuming that all of the evaluations are conducted for a one-year time period. When we assume a one-year time period, the BCR is related to the one-year internal rate of return (IRR) with the following formula: BCR = 1 + IRR, or IRR = BCR – 1. Also, a net present value (NPV) analysis can be constructed using the benefits and costs identified in this article 4. Naturally, all of these analyses rely on the collection of accurate data.
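The one-year relationship between BCR and IRR can be sketched in a few lines of Python. This is only an illustration under the one-year assumption stated above; the function names `benefit_cost_ratio` and `one_year_irr` are hypothetical, and the two sample BCR values (2.6 and 1.4) are the ones implied by the case studies later in this article.

```python
def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Profit or cost savings divided by the total investment for the period."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


def one_year_irr(bcr: float) -> float:
    """One-year internal rate of return implied by a one-year BCR (IRR = BCR - 1)."""
    return bcr - 1.0


if __name__ == "__main__":
    # Sample BCR values taken from the two case studies in this article.
    for bcr in (2.6, 1.4):
        print(f"BCR = {bcr:.1f} -> one-year IRR = {one_year_irr(bcr):.0%}")
```

A BCR above 1 therefore means the investment returns more than it costs within the year, and a BCR below 1 corresponds to a negative one-year IRR.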

When evaluating investments in HPC systems 5, the denominator of the BCR can be easy to find, though one must make sure that all of the costs involved have been identified. However, the profit or cost savings due to the investment may not be nearly as easy to determine. The DARPA High Productivity Computing Systems (HPCS) program 6 has been working on determining the factors that play into the numerator and denominator of a BCR evaluation. In an article from the Winter 2004 Special Edition on HPC Productivity of the International Journal of High Performance Computing Applications 7, the HPCS research team used productivity as their measure and defined it in classic economic terms as utility divided by cost. This is very similar to the BCR equation:

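$$\text{productivity} = \frac{\text{utility}}{\text{cost}} \qquad \text{and} \qquad \text{BCR} = \frac{\text{benefit}}{\text{cost}}$$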

To expand on the utility (benefit) and cost for an organization, Dr. Jeremy Kepner of MIT Lincoln Laboratory, an HPCS Productivity Team member, developed a high performance productivity framework and evaluation model. The HPCS productivity model looks past the traditional measures of high performance computing systems such as peak floating point operations per second (flops) and system demand, since they usually do not have much influence on productivity. The result is the (SK)3 formulation:

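$$\text{productivity (BCR)} = \frac{\text{time saved by users on system}}{\text{time to parallelize} + \text{time to train} + \text{time to launch} + \text{time to administer} + \text{system cost}}$$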

In other words, the productivity level of a high performance computing system is a function of the time saved by engineers and scientists in solving advanced problems, taking into account not only the cost of the system, but also the time required to train users on it, prepare the application code(s) for parallel processing, launch the application(s), and administer the system. This formulation is intended for a research-oriented institution like a university or national laboratory.
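As a rough sketch of how the research-oriented formulation might be evaluated, the Python below divides the time saved by users by the sum of the five cost terms named above. The numerator helper follows the conservative estimate given in note 8. All names (`conservative_time_saved`, `ResearchCosts`, `research_bcr`) are hypothetical, and the input numbers are made-up placeholders chosen only to make the arithmetic easy to follow; they are not the Lincoln Laboratory figures.

```python
from dataclasses import dataclass


def conservative_time_saved(hours_in_use: float, avg_users: float,
                            avg_cpus_per_job: float) -> float:
    """Note 8 estimate: in-use time * average users * (1 - 1 / CPUs per job)."""
    if avg_cpus_per_job < 1:
        raise ValueError("avg_cpus_per_job must be at least 1")
    return hours_in_use * avg_users * (1.0 - 1.0 / avg_cpus_per_job)


@dataclass
class ResearchCosts:
    """The five denominator terms, all expressed in the same unit (e.g. staff-hours)."""
    time_to_parallelize: float
    time_to_train: float
    time_to_launch: float
    time_to_administer: float
    system_cost: float

    def total(self) -> float:
        return (self.time_to_parallelize + self.time_to_train
                + self.time_to_launch + self.time_to_administer
                + self.system_cost)


def research_bcr(time_saved: float, costs: ResearchCosts) -> float:
    """Time saved by users divided by the total of the five cost terms."""
    total = costs.total()
    if total <= 0:
        raise ValueError("total cost must be positive")
    return time_saved / total


if __name__ == "__main__":
    # Placeholder inputs for illustration only (assumptions, not measured data).
    saved = conservative_time_saved(hours_in_use=1000, avg_users=4,
                                    avg_cpus_per_job=8)
    costs = ResearchCosts(time_to_parallelize=600, time_to_train=100,
                          time_to_launch=50, time_to_administer=250,
                          system_cost=750)
    print(f"time saved = {saved:.0f}, total cost = {costs.total():.0f}, "
          f"BCR = {research_bcr(saved, costs):.2f}")
```

Expressing every term in one unit is the important design choice here: a system cost in dollars has to be converted to staff time (or staff time to dollars) before the terms can be summed.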

In an industry environment, where systems are used less for basic research and more for solving product design and development challenges (i.e., a “production” environment), the variables for determining BCR/productivity may very well be different. Rather than computing the time saved by users on the system, an industry user may be more concerned with the value of newly developed products, potential increases in market share, profits generated (or lost) using HPC systems, or the importance of the job to be completed (i.e., how much revenue or market share will the company be able to gain once this large, extremely important problem is solved). In the denominator, the “time to parallelize” is irrelevant because all of the parallel software running on the HPC system is purchased. Hence the “time to parallelize” is replaced by the “cost of software.” Also the launch time becomes minute in comparison to the software execution time. Again, we start with the basic formula: productivity (BCR) equals utility/benefit divided by cost. With the above changes, the new expanded BCR formulation follows.

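$$\text{productivity (BCR)} = \frac{\sum \text{profit gained or maintained by each project}}{\text{cost of software} + \text{time to train} + \text{time to administer} + \text{system cost}}$$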

In the next section, we use these two productivity (BCR) formulas with numerical examples to demonstrate their use.

BCR Case Study Examples
Research Laboratory Example

At MIT Lincoln Laboratory, a federally funded research and development center (FFRDC) for the Department of Defense, the research-oriented formula was used to evaluate the financial efficacy of a 600-processor, enterprise grid cluster solution that would be used by 200 users across Lincoln. The numerator value and each of the five denominator values were related to an average, fully burdened salary of $200,000 per year.

Inserting these values into the BCR/productivity equation yields:

Equation

Since IRR = BCR – 1 for a one-year period, the one-year IRR of 160% corresponds to a BCR of 2.6. Taking the full range of average programming rates and costs to parallelize into account, the BCR ranges from 2.6 to 4.6.

The MIT Lincoln Laboratory High Performance Computing team has compiled many examples of how their users are more productive when they use their interactive, on-demand enterprise HPC system. For example, one of the technical staff members is designing and evaluating algorithms to improve the weather radars used across the United States. When he was running his algorithm simulations on his very powerful desktop computer, they would execute in about ten hours. He was able to make adjustments or run different data sets twice a day: once during the business day and once overnight. He was trained to use the HPC system in a single morning, and he had parallelized his simulation code by the time the day was finished. His simulation on the HPC system executed in an interactive, on-demand manner on eight to sixteen processors and usually executed in less than an hour. That allowed him to execute between ten and twelve simulations per day, thereby enabling him to deliver more accurate algorithms and raise the level of confidence in the effectiveness of the algorithms. This translates into delivering much better weather radar effectiveness for his project, its sponsor, and eventually our nation. However, this interactive, on-demand mode is not the usual way in which HPC systems are used.

Industry Example

Let us now use this formula in an industrial production example. As with the research example and for the sake of simplicity, we will assume a time period of one year (investment at the beginning of the year and benefits by the end of the year).

We assume that an automotive firm has three high priority projects that cannot go forward without the purchase of an HPC system. If the HPC system is purchased, the three projects are very likely to be successful and are expected to bring in profits of $5.25 million, $2.0 million, and $4.5 million. In terms of costs:

Given these estimates, we insert the values into the formula:

Equation

From this result, the management must determine whether a one-year IRR of 40% (BCR = 1.4) is acceptable and whether the purchase will go forward.
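The arithmetic behind this result can be checked with a short Python sketch. The three project profits and the BCR of 1.4 come from the text above; the itemized costs are not reproduced here, so the script instead backs out the total cost implied by the reported BCR. The names `industry_bcr` and `implied_total_cost` are hypothetical, and because the BCR is rounded to one decimal place the implied cost is only approximate.

```python
from typing import Iterable, Mapping

# Expected project profits from the example, in millions of dollars.
PROJECT_PROFITS_MUSD = [5.25, 2.0, 4.5]
REPORTED_BCR = 1.4  # as stated in the text (rounded)


def industry_bcr(project_profits: Iterable[float],
                 costs: Mapping[str, float]) -> float:
    """Sum of project profits divided by the sum of all cost components."""
    total_cost = sum(costs.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return sum(project_profits) / total_cost


def implied_total_cost(project_profits: Iterable[float], bcr: float) -> float:
    """Total cost that would produce the given BCR for these profits."""
    if bcr <= 0:
        raise ValueError("bcr must be positive")
    return sum(project_profits) / bcr


if __name__ == "__main__":
    benefit = sum(PROJECT_PROFITS_MUSD)
    cost = implied_total_cost(PROJECT_PROFITS_MUSD, REPORTED_BCR)
    print(f"total expected profit: ${benefit:.2f} million")
    print(f"implied total cost:    ${cost:.2f} million (approximate)")
    # Round trip: feeding the implied cost back in reproduces the reported BCR.
    check = industry_bcr(PROJECT_PROFITS_MUSD, {"all costs (implied)": cost})
    print(f"BCR = {check:.1f}, one-year IRR = {check - 1:.0%}")
```

In practice the `costs` mapping would hold the individual components of the industry formulation (software, training, administration, and system cost), each converted to the same currency unit as the profits.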

These two examples demonstrate how the productivity (BCR) formulation can be used in different organizations. The key is to identify and estimate the benefit(s) and costs for a particular organization.

Summary

In the past several decades, HPC has made a large impact on the growth of the American economy, has helped build and maintain American competitiveness in the world economy, and has enabled many of the products and capabilities that we have today. However, Council on Competitiveness research has revealed that despite the opportunities to use HPC to increase productivity and competitiveness, many executives believe that their firms are failing to apply this technology as aggressively as possible. Some of the hindrances stem from a lack of talent and certain technical barriers. Another hindrance is that many boardrooms in American companies see HPC systems as a cost of doing business without realizing the benefits that such a system can bring to the organization and the bottom line. We have presented a variety of benefits and costs that may be realized in organizations that purchase and use HPC systems. While the Research and Industry equations and examples are presented as two distinct scenarios, many actual situations may prompt a melding of the two. Overall, the examples show that HPC assets are not just a cost, but that they actually can contribute to healthy earnings reports as well as more productive and efficient staff.

Acknowledgements
The Council on Competitiveness thanks the Advanced Simulation and Computing program (ASC) at the Department of Energy's National Nuclear Security Administration and the Defense Advanced Research Projects Agency’s High Productivity Computing Systems program for sponsoring Council on Competitiveness research that contributed to this article. At MIT Lincoln Laboratory, this work was sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government. The authors also wish to thank Dimitri Kusnezov, director of the ASC program, for useful discussions.
The Council on Competitiveness is an organization of the top business, university and labor leaders in the United States, responsible for influencing the course of American competitiveness on regional, national and global scales. For additional information about its High Performance Computing project and copies of reports, surveys and case studies, see www.compete.org/hpc
1 ROI can be captured with a variety of corporate finance techniques, including benefit-costs ratio (BCR), net present value (NPV), and internal rate of return (IRR).
2 http://www.compete.org/hpc
3 "Partnering for Prosperity: Harnessing Our HPC Assets for Competitive Strength." Two Council on Competitiveness studies, both completed in January 2006: "Industrial Partnerships through the National Science Foundation's Supercomputing Resources," and "Industrial Partnerships through the NNSA's Academic Strategic Alliance Program." Available at http://www.compete.org/hpc .
4 For some background on using BCR, IRR, and NPV to evaluate projects, please refer to G. Tassey, “Method for Assessing the Economic Impacts of Government R&D,” Planning Report #03-1, National Institute of Standards and Technology, Sept. 2003. For a thorough treatment please refer to S. Ross, R. Westerfield, and J. Jaffe, Corporate Finance, McGraw-Hill Irwin: New York, 2004.
5 The intent of these sections is not to teach a tutorial on corporate finance but rather to illuminate various ways in which the benefits and costs associated with evaluating investments in HPC assets can be viewed and analyzed. The formulas and methodologies suggested here are based on the experiences at MIT Lincoln Laboratory. Our intent is to encourage readers to think about how their organization values an HPC solution.
6 DARPA HPCS - http://www.darpa.mil/ipto/programs/hpcs/
7 Kepner, J. “HPC productivity model synthesis,” International Journal of High Performance Computing Applications, Vol. 18, No. 4, November 2004.
8 The time saved by users was calculated in a very conservative manner: time saved = (time system is in use) * (average number of users) * (1 – 1 / (Average number of CPUs per job) ). This formula assumes that all jobs are fine-grained, synchronous parallel jobs; often parallel jobs are less synchronous and more coarsely grained. The last term in that expression can be substituted with a less conservative term such as log2(Average number of CPUs per job) or just Average number of CPUs per job. Using the latter, we have calculated a BCR greater than 30.

URL to article: http://www.ctwatch.org/quarterly/articles/2006/11/making-the-business-case-for-high-performance-computing-a-benefit-cost-analysis-methodology/