CTWatch
March 2008
Urgent Computing: Exploring Supercomputing's New Role
Suresh Marru, School of Informatics, Indiana University
Dennis Gannon, School of Informatics, Indiana University
Suman Nadella, Computation Institute, The University of Chicago
Pete Beckman, Mathematics and Computer Science Division, Argonne National Laboratory
Daniel B. Weber, Tinker Air Force Base
Keith A. Brewster, Center for Analysis and Prediction of Storms, University of Oklahoma
Kelvin K. Droegemeier, Center for Analysis and Prediction of Storms, University of Oklahoma


To summarize, LEAD places enormous demands on its infrastructure: large data transfers, real-time data streams, and huge computational needs. Arguably the most significant demand, however, is the need to meet strict deadlines. On-demand computations cannot wait in a job queue for Grid resources to become available.

However, neither can the scientific community afford to keep multimillion-dollar computational resources idle until an emergency requires them. Instead, we must develop technologies that can support urgent computation. Scientists need mechanisms to find, evaluate, select, and launch elevated-priority applications on high-performance computing resources. Such applications might reorder, preempt, or terminate existing jobs in order to access the needed cycles in time.

To this end, LEAD is collaborating with SPRUCE, the Special PRiority and Urgent Computing Environment TeraGrid Science Gateway [12]. SPRUCE provides resources quickly and efficiently to high-priority applications that must get computational power without delay.

SPRUCE

SPRUCE facilitates urgent computing by addressing five important concepts: session activation, priority policies, participation flexibility, allocation and usage policies, and verification drills.

SPRUCE uses a token-based authorization system for allocation and tracking of urgent sessions. As a raw technology, SPRUCE has no dictated priority policies; resource providers have full control and flexibility to choose possible urgency mechanisms they are comfortable with and to implement these mechanisms as the providers see fit. To build a complete solution for urgent computing, SPRUCE must be combined with allocation and activation policies, local participation policies for each resource, and procedures to support “warm-standby” drills. These application drills not only verify end-to-end correctness but also generate performance and reliability logs that can aid in resource selection.
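
The idea that each resource provider chooses its own urgency mechanisms can be pictured as a small per-resource lookup table. The following is only an illustrative sketch, not SPRUCE's actual configuration format: the resource names, the integer urgency levels, and the `mechanism_for` helper are all hypothetical, and the three mechanisms are simply the reorder, preempt, and terminate options mentioned earlier in this article.

```python
from enum import Enum
from typing import Dict, Optional


class Mechanism(Enum):
    """Ways a provider might make room for an urgent job (taken from the text)."""
    REORDER = "reorder"      # move the urgent job ahead in the queue
    PREEMPT = "preempt"      # suspend running jobs to free nodes
    TERMINATE = "terminate"  # kill running jobs to free nodes


# Hypothetical policy table: resource name -> {urgency level -> mechanism}.
# Integer urgency levels are an assumption made for this sketch.
Policy = Dict[str, Dict[int, Mechanism]]

EXAMPLE_POLICY: Policy = {
    "cluster-a": {1: Mechanism.REORDER, 2: Mechanism.PREEMPT},
    "cluster-b": {1: Mechanism.REORDER},  # this provider never preempts
}


def mechanism_for(policy: Policy, resource: str, urgency: int) -> Optional[Mechanism]:
    """Return the mechanism the provider configured for this urgency level.

    Picks the highest configured level that does not exceed the requested
    urgency. Returns None if the resource does not participate or has no
    level low enough for the request.
    """
    levels = policy.get(resource, {})
    eligible = [level for level in levels if level <= urgency]
    if not eligible:
        return None
    return levels[max(eligible)]
```

In this sketch a provider that is comfortable only with queue reordering, such as the hypothetical "cluster-b", simply leaves the stronger mechanisms out of its table, which mirrors the local control described above.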

Right-of-Way Tokens

Many possible authorization mechanisms could be used to let users initiate an urgent computing session, including digital certificates, signed files, proxy authentication, and shared-secret passwords. In time-critical situations, however, simpler is better. Complex digital authentication and authorization schemes could easily become a stumbling block to quick response. Hence, simple transferable tokens were chosen for SPRUCE. This design is based on existing emergency response systems proven in the field, such as the priority telephone access system supported by the U.S. Government Emergency Telecommunications Service in the Department of Homeland Security [13]. Users of the priority telephone access system, such as officials at hospitals, fire departments, and 911 centers, carry a wallet-sized card with an authorization number. This number can be used to place high-priority phone calls that jump to the top of the queue for both land- and cell-based traffic even if circuits are completely jammed because of a disaster.

Figure 3. SPRUCE “right-of-way” token

The SPRUCE tokens (see Figure 3) are unique 16-character strings that are issued to scientists who have permission to initiate an urgent computing session. When a token is created, several important attributes are set, such as resource list, maximum urgency, session lifetime, expiration date, and project name. A token represents a unique “session” that can include multiple jobs and that lasts for a clearly defined period. It can also be associated with a group of users, who can be added to or removed from the token at any time, providing flexible coordination.
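
To make the token attributes concrete, here is a minimal sketch of how such a token could be modeled as a data structure. It is not the SPRUCE implementation. The class name, field names, integer urgency scale, token alphabet, and the rule that a session runs from activation for its lifetime are all assumptions introduced for illustration; only the list of attributes and the 16-character length come from the description above.

```python
import secrets
import string
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set

# Assumed alphabet; the article only says tokens are 16-character strings.
TOKEN_ALPHABET = string.ascii_uppercase + string.digits


def new_token_id() -> str:
    """Generate a random 16-character token string."""
    return "".join(secrets.choice(TOKEN_ALPHABET) for _ in range(16))


@dataclass
class RightOfWayToken:
    """Hypothetical model of a right-of-way token and its session."""
    project_name: str
    resources: List[str]          # resources the token is valid on
    max_urgency: int              # assumed: higher number = more urgent
    session_lifetime: timedelta   # how long a session lasts once activated
    expires_at: datetime          # token must be activated before this time
    token_id: str = field(default_factory=new_token_id)
    users: Set[str] = field(default_factory=set)
    activated_at: Optional[datetime] = None

    def activate(self, now: datetime) -> None:
        """Start the session; a token can be activated only once."""
        if self.activated_at is not None:
            raise ValueError("token already activated")
        if now >= self.expires_at:
            raise ValueError("token expired before activation")
        self.activated_at = now

    def session_open(self, now: datetime) -> bool:
        """True while the activated session is within its lifetime."""
        if self.activated_at is None:
            return False
        return self.activated_at <= now < self.activated_at + self.session_lifetime

    def may_submit(self, user: str, resource: str, urgency: int, now: datetime) -> bool:
        """Check whether a user may submit an urgent job under this token."""
        return (
            self.session_open(now)
            and user in self.users
            and resource in self.resources
            and urgency <= self.max_urgency
        )
```

Because `users` is a plain set on the token, adding or removing a collaborator in the middle of a session is a single operation, and any number of jobs can be checked against the same token while its session is open.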


Reference this article
Marru, S., Gannon, D., Nadella, S., Beckman, P., Weber, D. B., Brewster, K. A., Droegemeier, K. K. "LEAD Cyberinfrastructure to Track Real-Time Storms Using SPRUCE Urgent Computing," CTWatch Quarterly, Volume 4, Number 1, March 2008. http://www.ctwatch.org/quarterly/articles/2008/03/lead-cyberinfrastructure-to-track-real-time-storms-using-spruce-urgent-computing/
