The power costs must take into account not only the power needs of the computer but also the cost of cooling. As a rule of thumb, add 35-40% of the system's own power consumption to estimate the additional power consumed by the required cooling. Today’s rates for power vary substantially across the country, ranging from under 3 cents/kWh to over 10 cents/kWh.
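The rule of thumb above can be turned into a quick estimate. The following is a minimal sketch, not a definitive costing model: the function name `annual_power_cost`, the 500 kW system size, and the assumption that the machine draws constant power all year are illustrative choices that do not come from the text. Only the 35-40% cooling overhead and the rate range do.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_power_cost(system_kw, rate_per_kwh, cooling_overhead=0.40):
    """Estimate yearly electricity cost in dollars for system plus cooling.

    system_kw        -- power drawn by the computer alone (assumed constant)
    rate_per_kwh     -- electricity price in dollars per kWh
    cooling_overhead -- extra power for cooling as a fraction of system power
    """
    total_kw = system_kw * (1.0 + cooling_overhead)
    return total_kw * HOURS_PER_YEAR * rate_per_kwh


# Hypothetical 500 kW system, priced at both ends of the quoted rate range.
for rate in (0.03, 0.10):
    low = annual_power_cost(500, rate, cooling_overhead=0.35)
    high = annual_power_cost(500, rate, cooling_overhead=0.40)
    print(f"${rate:.2f}/kWh: ${low:,.0f} to ${high:,.0f} per year")
```

For this hypothetical system the estimate comes to roughly $180,000 per year at 3 cents/kWh and roughly $600,000 per year at 10 cents/kWh, which shows how strongly the local rate drives the result.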
First-year maintenance may be included in the price of a new system. After that, unless the purchase has explicitly included multi-year maintenance, annual maintenance costs seem to range from 4% to 8% of the purchase price of the machine. It is not necessary to get a maintenance contract with extremely rapid response. For a system with a large node count, it is much more important to be able to remove a failed node from the system rapidly, reconfigure, preferably with spares, and continue. Next-day service may then be adequate for the vendor to do any required hardware maintenance on the removed nodes. It is almost always better to negotiate maintenance options with the vendor while negotiating for the original system, for that is when you have the most leverage with the vendor. It is wise to structure these as annual options so that you can cancel the maintenance contract with the vendor if you find a better deal.
Operating expenses can be kept down by developing operator-free systems. For this, you need an extensive alerting infrastructure, which relays system events to system administrators via pagers or text messages on their cell phones. Underlying it is a monitoring system extensive and reliable enough to report any of the anomalies that system operators would likely catch. You actually need a hierarchy of monitoring, from simple pass/fail checks on individual low-level devices such as nodes and disks, to high-level tests that exercise several components in sequence and verify that the end-to-end results are correct.
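The monitoring hierarchy described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a real monitoring product: the names `Check`, `end_to_end`, and `run_checks` are hypothetical, the probes are stand-in lambdas, and the `notify` callback stands in for whatever pager or text-message gateway a site actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str
    probe: Callable[[], bool]  # returns True when the check passes


def end_to_end(stages, payload, expected) -> Callable[[], bool]:
    """Build a probe that pipes payload through each stage in sequence
    and passes only if the final result matches the expected value."""
    def probe() -> bool:
        result = payload
        for stage in stages:
            result = stage(result)
        return result == expected
    return probe


def run_checks(checks: List[Check], notify: Callable[[str], None]) -> List[str]:
    """Run every check, alert on each failure, and return the failed names."""
    failed = []
    for check in checks:
        try:
            ok = check.probe()
        except Exception:
            ok = False  # a probe that crashes counts as a failure
        if not ok:
            failed.append(check.name)
            notify(f"ALERT: {check.name} failed")
    return failed


# Hypothetical stand-ins: two low-level pass/fail checks and one
# high-level check that runs two stages and verifies the final result.
alerts: List[str] = []
checks = [
    Check("node042 responds", lambda: True),
    Check("disk7 healthy", lambda: False),
    Check("submit -> run -> read result",
          end_to_end([lambda x: x + 1, lambda x: x * 2], payload=1, expected=4)),
]
failed = run_checks(checks, alerts.append)
```

Treating an exception in a probe as a failure is a deliberate choice here: in an operator-free system, a monitoring check that silently dies is as dangerous as the fault it was meant to catch.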
The four- to five-year operating cost of a major computer, including maintenance, space, power, and cooling, was for many years a small part of the total cost of ownership of a system. It is now becoming a much more significant factor, and may even exceed the original capital investment.
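To see how operating cost can overtake the purchase price, the components named above can be added up over the system's lifetime. This is a rough sketch under loudly stated assumptions: the function `operating_cost`, the $5 million purchase price, and the annual power, cooling, and space figures are all hypothetical. Only the 4-8% maintenance range, the included first year of maintenance, and the four- to five-year horizon reflect the discussion here.

```python
def operating_cost(purchase_price, years, annual_power_and_cooling,
                   annual_space, maintenance_rate=0.06,
                   included_maintenance_years=1):
    """Estimate total operating cost in dollars over the system lifetime.

    Maintenance is charged as a fraction of the purchase price for every
    year after the years already included in the purchase.
    """
    paid_years = max(0, years - included_maintenance_years)
    maintenance = purchase_price * maintenance_rate * paid_years
    recurring = years * (annual_power_and_cooling + annual_space)
    return maintenance + recurring


# Hypothetical inputs: a $5M system kept for 5 years.
purchase = 5_000_000
total = operating_cost(
    purchase_price=purchase,
    years=5,
    annual_power_and_cooling=600_000,  # assumed figure
    annual_space=200_000,              # assumed figure
    maintenance_rate=0.08,             # upper end of the 4-8% range
)
print(f"operating cost: ${total:,.0f} vs purchase price: ${purchase:,.0f}")
print("operating cost exceeds capital cost:", total > purchase)
```

With these assumed inputs, four paid years of maintenance plus five years of power, cooling, and space come to about $5.6 million, which is more than the purchase price. Different sites will see very different numbers, so the value of the sketch is in making each term explicit.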
Increasingly, system software such as debuggers, mathematical libraries, job schedulers, performance analysis tools, and even compilers is provided by companies other than the hardware vendor. The cost of this required third-party software can be substantial, and often the suppliers do not have early access to hardware from the vendors. Make certain that you understand exactly what software will be supplied with the system, and what arrangements the vendor has with the independent software vendors who will supply these other needed tools. It is not always necessary, however, to license tools such as debuggers for the full system. For example, debugging tools are not very effective above 100-200 tasks, so don’t bother to license the debugger for 2000 nodes. This can save a substantial amount of money. There are also high-quality, robust mathematical libraries available for free from universities and government laboratories, the result of many years of development supported by the NSF and DOE. Often, vendors have optimized versions of these libraries available for their systems.