CTWatch
March 2008
Urgent Computing: Exploring Supercomputing's New Role
Paul Tooby
Dong Ju Choi
Nancy Wilkins-Diehr, San Diego Supercomputer Center

OnDemand [3] is a Dell cluster with 64 Intel dual-socket, dual-core compute nodes, for a total of 256 processor cores. The 2.33 GHz, four-way nodes have 8 GB of memory each. The system, which has a nominal theoretical peak performance of 2.4 Tflops, runs the SDSC-developed Rocks open-source Linux cluster operating software and the IBRIX parallel file system. Jobs are scheduled by the Sun Grid Engine.

OnDemand also makes use of the SPRUCE system developed by a team at Argonne National Laboratory. SPRUCE provides production-level functionality, including access controls, reporting, and fine-grained control for urgent computing jobs. An organization can issue tokens to user groups that have been approved for urgent computing runs. Different colors (classes) of SPRUCE tokens represent different urgency levels: a yellow token puts the requested job in the normal queue of the Sun Grid Engine scheduler; an orange token goes to the high-priority queue; and a job submitted with a red token will preempt running jobs if necessary.
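The token-to-queue mapping above can be sketched as a simple lookup. This is an illustrative sketch only: the color names follow the article, but the function, queue names, and dictionary structure are hypothetical, not part of SPRUCE's actual interface.

```python
# Hypothetical sketch of the SPRUCE token-class policy described in the
# text: each token color maps to a scheduler queue and a preemption right.
TOKEN_ACTIONS = {
    "yellow": {"queue": "normal", "preempt": False},  # normal queue
    "orange": {"queue": "high", "preempt": False},    # high-priority queue
    "red":    {"queue": "high", "preempt": True},     # may preempt running jobs
}

def schedule_action(token_color):
    """Return the scheduling action for a job submitted with this token."""
    try:
        return TOKEN_ACTIONS[token_color]
    except KeyError:
        raise ValueError(f"unknown SPRUCE token class: {token_color!r}")

print(schedule_action("red"))
```

In a real deployment this decision is made inside the scheduler integration; the sketch only captures the policy the article describes.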

The researchers are working to develop additional capabilities. Currently, jobs with the least accumulated CPU time are the first to be preempted. In the future, preempted backfill jobs may be held and restarted when appropriate, rather than killed, and investigation of checkpoint-and-restart systems is ongoing.
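The current victim-selection policy can be sketched in a few lines: among running backfill jobs, preempt the one with the least accumulated CPU time, so the least completed work is lost. The `Job` structure and field names here are hypothetical illustrations, not the scheduler's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    cpu_seconds: float  # CPU time accumulated so far (hypothetical field)

def pick_preemption_victim(running_jobs):
    """Choose the backfill job to preempt first: the one with the least
    accumulated CPU time, minimizing the amount of lost work."""
    if not running_jobs:
        return None
    return min(running_jobs, key=lambda j: j.cpu_seconds)

jobs = [Job("sim-a", 5400.0), Job("viz-b", 120.0), Job("sim-c", 86000.0)]
print(pick_preemption_victim(jobs).job_id)  # viz-b has the least CPU time
```

The planned hold-and-restart capability would replace the kill step that follows victim selection, which this sketch does not model.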

Backfill jobs consist of a variety of regular user jobs, primarily parallel scientific computing and visualization applications using MPI. Users who run on the OnDemand cluster are made aware of the cluster’s mission to prioritize jobs that require immediate turnaround.

Figure 3. Star-P extends easy access to supercomputing to a much wider range of researchers.

One of the most interesting and successful applications using OnDemand is a commercial application called Star-P [5], which extends easy access to supercomputing to a much wider range of researchers. Users can code models and algorithms on their desktop computers using familiar applications such as MATLAB, Python, and R, and then run them interactively on SDSC's OnDemand cluster through the Star-P platform. This eliminates the need to re-program applications to run on parallel systems, so programming that took months can now be done in days, and simulations that took days on the desktop can now be done in minutes. Lowering the barrier to supercomputing resources lets researchers jumpstart research that otherwise would not get done.

Star-P supports researchers by allowing them to use HPC clusters transparently through a client (running in the user's desktop environment) and a server framework (running in an HPC cluster environment). For example, existing MATLAB users on a desktop PC can now achieve parallel scalability from the same MATLAB interface with a simple set of Star-P commands. This has enabled many users to achieve the tremendous speed-ups that advanced research groups otherwise attain only by laboriously reprogramming their applications with MPI.
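The client/server pattern described above can be illustrated schematically: the desktop client ships a computation to a backend and receives the result, so the user never leaves a familiar interface. This is a toy sketch only, assuming nothing about Star-P's actual protocol; the queues stand in for the network link and the worker thread stands in for the cluster-side framework.

```python
import queue
import threading

# Toy stand-ins for the client/server link: one queue carries work to the
# "cluster" side, the other carries results back to the "desktop" side.
requests, results = queue.Queue(), queue.Queue()

def hpc_server():
    """Hypothetical cluster-side framework: evaluate one submitted task."""
    func, args = requests.get()
    results.put(func(*args))

threading.Thread(target=hpc_server, daemon=True).start()

# Client side: submit a computation as if from the desktop environment
# and block until the remotely computed result comes back.
requests.put((sum, ([i * i for i in range(1000)],)))
print(results.get())  # 332833500
```

The value of the real system is that the offload is transparent: the same variables and function calls the user already writes are executed on the cluster, with data movement handled by the framework rather than by hand-written MPI.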

Researchers on SDSC's OnDemand are using Star-P in a variety of application areas, spanning science, engineering, medical, and financial disciplines. Several research groups have seen true performance breakthroughs through Star-P, which fundamentally changes the type of problems they are able to explore. A close collaboration between Interactive Supercomputing and SDSC also won the HPC Challenge at SC07.

SDSC and its academic and industrial partners, including Argonne National Laboratory and Interactive Supercomputing, are aggressively continuing to improve the cluster environment to enhance this urgent computing service. The accumulating experience at SDSC using OnDemand is playing a critical role as a testbed as the team works to further develop the urgent computing paradigm and robust infrastructure.

References
[1] San Diego Supercomputer Center (SDSC) - www.sdsc.edu/
[2] SDSC Allocations - www.sdsc.edu/us/allocations/
[3] SDSC On-Demand cluster - www.sdsc.edu/us/resources/ondemand/
[4] ShakeMovie, Caltech's Near Real-Time Simulation of So. Calif. Seismic Events Portal - http://shakemovie.caltech.edu/
[5] Star-P at Interactive Supercomputing - www.interactivesupercomputing.com/


Reference this article
Tooby, P., Ju Choi, D., Wilkins-Diehr, N. "Supercomputing On Demand: SDSC Supports Event-Driven Science," CTWatch Quarterly, Volume 4, Number 1, March 2008. http://www.ctwatch.org/quarterly/articles/2008/03/supercomputing-on-demand-sdsc-supports-event-driven-science/
