CTWatch
November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
Declan Murphy, Sun Microsystems, Inc.
Thomas Nash, Sun Microsystems, Inc.
Lawrence Votta, Jr., Sun Microsystems, Inc.
Jeremy Kepner, MIT Lincoln Laboratory

The government, or the owners or stockholders, establish how activities are to be valued by the institution's management. Management valuation appears in the largely subjective Usys variable, which includes, for example, what constitutes “success” for a computing activity. Managers may well value a successfully completed activity higher than the cost of the resources used. (It would be unusual for them not to do so at budget proposal time.) Usys assumes, as managers might often wish, that all resources are available and fully allocated at all times during the life of their expensive system.8 We define it as a multiplier to the peak system resources in CPU units. System utilization efficiency and project-level effectiveness are included in Eadm and Eproj, as described below.

Once a system is delivered, the system administrators (and vendor) strive to meet management expectations for availability, Asys, and for system-level resource utilization efficiency (including allocation), Eadm.

More detailed descriptions of the individual variables follow and they are expanded further in the Appendix. We indicate dimensional units in brackets.

P, the productivity, is dimensionless, as required by the common economic interpretation of the productivity concept we use, output/input, [$]/[$].

C is total cost of ownership, including all costs for developing software and running it on the computer, over a defined lifetime of the system (T). This is expanded into components in the Appendix. [$]

T is the lifetime of the system as defined in the typical budget approval process. It does not show up explicitly in the top-level equation, but it is important to the definitions of the variables as well as for the measurements. It will depend on considerations specific to each environment, including, for example, whether the procurement and justification (and cost) involve continuing upgrades. Those responsible for budget submissions at the proposing institution are the source of this variable. [yrs]

Ejob is the ratio of Ujob to the total system cost Csys. Ujob is the total productively used resources over the lifetime by all individual jobs in all project activities, normalized to the assumption of 100% availability and 100% resource allocation. To a large extent, this efficiency measures how well programs use parallel resources. We include other costly resources (e.g., memory, bandwidth, I/O) besides CPU. In order not to favor one class of resources over another, we weight the resources by their relative costs.9 We say “productively used” so that we include only resources consumed in direct support of maximizing utility for the specified problem. Estimators for Ejob and for the corresponding average total project personnel time (including development and production) per project are what Job-level Productivity Benchmarks should aim to measure for different environments and systems. More detailed definitions of these can be found in [2]. [Dimensionless]
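
The cost weighting described above can be illustrated with a minimal sketch. It assumes that "productively used" and "available" amounts are already known per resource class, and it normalizes cost-weighted usage to the cost-weighted total at 100% availability and allocation. The function name, the resource classes, and all numbers are hypothetical; this shows only the weighting idea, not the estimator defined in [2].

```python
def cost_weighted_efficiency(used, available, unit_cost):
    """Fraction of cost-weighted resources that were productively used.

    used, available: dicts mapping resource class -> amount of resource time
    unit_cost: dict mapping resource class -> relative cost per unit (assumed)
    """
    weighted_used = sum(unit_cost[r] * used[r] for r in available)
    weighted_total = sum(unit_cost[r] * available[r] for r in available)
    return weighted_used / weighted_total


# Hypothetical lifetime totals for three resource classes.
available = {"cpu": 1000.0, "memory": 500.0, "io": 200.0}
used = {"cpu": 600.0, "memory": 250.0, "io": 50.0}
unit_cost = {"cpu": 1.0, "memory": 0.5, "io": 2.0}

e_job = cost_weighted_efficiency(used, available, unit_cost)
print(f"cost-weighted efficiency: {e_job:.3f}")
```

Because each class contributes in proportion to its cost, a poorly used but expensive resource (here I/O) pulls the result down more than an equally poorly used cheap one.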

Asys is the availability: the fraction of total resource time available to the jobs making up Rjob. This accounts for all planned and unplanned downtimes, but does not include job restarts due to failures, which are included in Esys below. Note that this is the fraction of resource time, not of system time, so it accounts for portions of a large system being available while other portions are down. This parameter is based on vendor estimates of MTTF, maintenance requirements, and other RAS considerations. [Dimensionless]
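
The distinction between resource time and system time can be made concrete with a small sketch. It assumes the system is split into partitions, each with a known size and total downtime over the lifetime; the partition sizes, downtimes, and function name are hypothetical.

```python
def resource_availability(partitions, lifetime_hours):
    """Fraction of total resource time that was available.

    partitions: list of (resource_units, downtime_hours) pairs (assumed inputs)
    lifetime_hours: length of the accounting period
    """
    total = sum(units * lifetime_hours for units, _ in partitions)
    up = sum(units * (lifetime_hours - down) for units, down in partitions)
    return up / total


# Hypothetical system: 750 units never down, 250 units down for 100 of 1000 hours.
partitions = [(750, 0.0), (250, 100.0)]
a_sys = resource_availability(partitions, 1000.0)
print(f"resource-time availability: {a_sys:.3f}")
```

Counting the whole system as down whenever any partition is down would give 0.900 for the same data; the resource-time view credits the 750 units that stayed up.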

The product Esys = Eadm Eproj is the effectiveness of the workload resource allocation at meeting the institution's priorities. System architecture, software, and administrative tools can make a big difference to this important metric. This is where utility (or value) enters the picture and we go beyond just looking at job-level optimization. We assume that the institution will have a process for establishing the priority, utility, or value of the individual projects and jobs. Esys is the ratio of the actual total value of the jobs run over the lifetime to the optimal total that would come from maximally efficient management of a reference HPCS platform,10 for a given job mix, a given installed system configuration, and a given time pattern of resource downtime. As shown in [2], we allow for time-dependent values, since an allocation that completes a job or project long after it is needed has little or no value. We recognize explicitly (see further discussion in [2]) that not all the factors that go into Esys are amenable, even in principle, to before- or after-the-fact quantitative measurement. Some will need to be evaluated subjectively; these primarily concern how effectively projects (management-assigned and management-valued tasks) accomplish management goals and schedules. This is why we have written Esys as a product, with Eadm as the measurable part and Eproj as the project effectiveness part requiring subjective evaluation. Eadm is the estimator that, along with the cost of administration, System-level and Administration Productivity Benchmarks should aim to measure for different systems, configurations, and user environments. Eproj represents other considerations that reduce efficiency at delivering utility. [Dimensionless]
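
The ratio of actual to optimal total value, with time-dependent values, can be sketched as follows. This is an illustration under strong assumptions: each job has a base value and a deadline, and a job completed after its deadline is worth nothing (a step function chosen for simplicity; [2] allows more general time dependence). The job names, values, and schedules are hypothetical.

```python
def job_value(base_value, deadline, completion_time):
    """Assumed step-shaped value: full value by the deadline, zero afterwards."""
    return base_value if completion_time <= deadline else 0.0


def allocation_effectiveness(jobs, actual, optimal):
    """Ratio of value delivered by the actual schedule to that of the optimal one.

    jobs: dict name -> (base_value, deadline)
    actual, optimal: dicts name -> completion time under each schedule
    """
    actual_total = sum(job_value(v, d, actual[n]) for n, (v, d) in jobs.items())
    optimal_total = sum(job_value(v, d, optimal[n]) for n, (v, d) in jobs.items())
    return actual_total / optimal_total


# Hypothetical job mix: (value, deadline) per job, times in arbitrary units.
jobs = {"a": (10.0, 5.0), "b": (30.0, 8.0), "c": (60.0, 12.0)}
actual = {"a": 4.0, "b": 9.0, "c": 11.0}   # job "b" finishes after its deadline
optimal = {"a": 4.0, "b": 7.0, "c": 11.0}  # reference schedule meets every deadline

e_sys = allocation_effectiveness(jobs, actual, optimal)
print(f"allocation effectiveness: {e_sys:.2f}")
```

Only the late job loses its value here, so the ratio drops by exactly that job's share of the optimal total.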

Usys is the summed utility (or value) of all projects or activities, per unit of total available resources, R, that would complete successfully during the lifetime T if the system administrators, given the system tools and capabilities of the HPCS Reference Platform, were able to attain the optimum level of resource utilization and the system were perfectly available. The values may be assigned by whatever process the institution uses to prioritize its projects and justify its budget proposals. This variable includes only the local institution's evaluation of the value of the projects it intends for the installation. In some environments, it may be ignored and set to a constant such as Csys. We convert all resource units to CPU operations, weighting by their relative costs. [$/ops]


Reference this article
"A System-wide Productivity Figure of Merit," CTWatch Quarterly, Volume 2, Number 4B, November 2006 B. http://www.ctwatch.org/quarterly/articles/2006/11/a-system-wide-productivity-figure-of-merit/
