CTWatch
November 2007
Software Enabling Technologies for Petascale Science
Jennifer M. Schopf, University of Chicago and Argonne National Laboratory
Ann Chervenak, University of Southern California
Ian Foster, University of Chicago and Argonne National Laboratory
Dan Fraser, University of Chicago and Argonne National Laboratory
Dan Gunter, Lawrence Berkeley National Laboratory
Nick LeRoy, University of Wisconsin
Brian Tierney, Lawrence Berkeley National Laboratory

3. The CEDPS Managed Object Placement Service: MOPS

We are creating a new class of data placement services that can position data reliably across diverse systems and coordinate provisioning, movement, and registration across multiple storage systems to enable efficient and prioritized access by many users. A single logical transfer may involve multiple sources and destinations, necessitating the use of intermediate store-and-forward storage systems or the creation of optimized overlay networks such as user-level multicast networks. Concurrent independent placement operations may be prioritized and monitored in case of failures.

As a first step, we have recently released a prototype Managed Object Placement Service (MOPS), shown in Figure 1, which transforms storage into a managed resource. MOPS allows users to negotiate access to a specified quantity of storage for a specified time and with defined performance characteristics. Its design and implementation leverage GridFTP, NeST, and dCache.

GridFTP provides a flexible core architecture with a data interface component that allows different plug-ins for added functionality. It is well known for its high-speed data transfer capabilities. GridFTP gives MOPS the core functionality of fast, bulk file transfers, element 2 in our scenarios, which MOPS extends through its plug-in capability.

Figure 1. General MOPS architecture.

NeST provides guaranteed storage allocation by allowing the user and storage device to negotiate a size and duration and to specify access control lists (ACLs) for file access. In this way, a system can specify which users can access certain files or sets of files and also work with disk reservations when they are available. This feature helps address element 3, coordinated data movement, and element 4, failure reduction, by decreasing the chance of disk overflow errors.
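The negotiation described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the `Reservation` and `StorageServer` classes, their methods, and the byte and second units are hypothetical names chosen for illustration and are not NeST's actual interface. The point it shows is that a request is granted only if the device can guarantee the space for the requested duration, so a later write cannot fail from disk overflow.

```python
from dataclasses import dataclass, field


@dataclass
class Reservation:
    """Hypothetical space reservation: owner, size, expiry time, and an ACL."""
    owner: str
    size_bytes: int
    expires_at: float
    readers: set = field(default_factory=set)
    used_bytes: int = 0


class StorageServer:
    """Toy storage device that grants reservations only while capacity remains."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.reservations = []

    def _reserved(self, now):
        # Expired reservations no longer count against capacity.
        return sum(r.size_bytes for r in self.reservations if r.expires_at > now)

    def request(self, owner, size_bytes, duration_s, now, readers=()):
        """Grant the request if it fits; otherwise return None so the
        caller can renegotiate a smaller size or try another device."""
        if self._reserved(now) + size_bytes > self.capacity_bytes:
            return None
        res = Reservation(owner, size_bytes, now + duration_s,
                          set(readers) | {owner})
        self.reservations.append(res)
        return res

    def write(self, res, user, nbytes, now):
        """Accept a write only from the owner, before expiry, within the size."""
        if user != res.owner or now >= res.expires_at:
            return False
        if res.used_bytes + nbytes > res.size_bytes:
            return False
        res.used_bytes += nbytes
        return True

    def can_read(self, res, user):
        """ACL check: only users listed on the reservation may read."""
        return user in res.readers
```

In this model, a transfer client would call `request` before moving any data and start the transfer only if a reservation is returned, which is one way to turn a mid-transfer "disk full" failure into an up-front refusal.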

dCache provides methods for managing backend (tertiary) storage systems, including space management, hot spot determination, and recovery from disk or node failures. When connected to a tertiary storage system, dCache simulates unlimited direct-access storage space; data exchanges to and from the underlying tertiary storage system are performed automatically and invisibly to the user. Recent CEDPS-funded work has added consistency verification features that check whether individual transfers have completed correctly. dCache also addresses element 3, coordinated data movement, and element 4, failure reduction.
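The text does not say which mechanism is used to verify a transfer. One common approach, shown here purely as an assumption and not as dCache's implementation, is to compare a checksum of the source file with a checksum of the destination copy. The function names and the choice of SHA-256 below are illustrative.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks so that large
    files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_consistent(source_path, dest_path, algorithm="sha256"):
    """Treat a transfer as correct only if both copies hash identically."""
    return (file_checksum(source_path, algorithm)
            == file_checksum(dest_path, algorithm))
```

In a real wide-area transfer the two checksums would be computed at each end and only the digests exchanged; the sketch compares two local paths to keep the example self-contained.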

By combining these three tools behind a single user interface, MOPS lets CEDPS users work with their data in a more managed environment. In particular, it reduces failures caused by running out of disk space in the middle of a transfer, limits access to a set of files, and verifies that a transfer has completed successfully, while continuing to serve the data quickly across a wide variety of networks and back-end storage systems.

Reference this article
Schopf, J. M., Chervenak, A., Foster, I., Fraser, D., Gunter, D., LeRoy, N., Tierney, B. "End-to-End Data Solutions for Distributed Petascale Science," CTWatch Quarterly, Volume 3, Number 4, November 2007. http://www.ctwatch.org/quarterly/articles/2007/11/end-to-end-data-solutions-for-distributed-petascale-science/
