CTWatch
November 2007
Software Enabling Technologies for Petascale Science
Jennifer M. Schopf, University of Chicago and Argonne National Laboratory
Ann Chervenak, University of Southern California
Ian Foster, University of Chicago and Argonne National Laboratory
Dan Fraser, University of Chicago and Argonne National Laboratory
Dan Gunter, Lawrence Berkeley National Laboratory
Nick LeRoy, University of Wisconsin
Brian Tierney, Lawrence Berkeley National Laboratory

6. Revisiting the FLASH Example

We began this article with a discussion of the University of Chicago FLASH application experiment, in which it took three weeks at 20 MB/s to transfer less than 15% of the data produced by a three-week simulation. With MOPS managing the backend storage system, disk space could have been allocated more intelligently, allowing the FLASH group to transfer the data of particular interest more quickly and while it was still being generated. For local analysis and replication of the data, the FLASH team could take advantage of the DPS, which would register new files and distribute them according to a policy defined by the team, rather than requiring this work to be done by hand. In addition, with a centralized logging and trigger service deployed at the various sites, FLASH scientists could detect failures and debug performance problems much more easily than in the current environment. Overall, the effort required to achieve their end-to-end goals of scientific publications and publicly available datasets would be significantly reduced.
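
To make the data placement role concrete, the Python sketch below shows how a FLASH-style workflow might hand newly written checkpoint files to a placement service instead of copying and registering them by hand. It is a minimal illustration only: the names PlacementPolicy, DataPlacementService, and place are hypothetical stand-ins for the CEDPS services described here, not their actual interfaces, and the service body is a stub that merely reports what a real deployment would do.

"""Illustrative sketch: PlacementPolicy and DataPlacementService are hypothetical
names standing in for the CEDPS data placement services, not their real APIs."""

from dataclasses import dataclass
from pathlib import Path


@dataclass
class PlacementPolicy:
    """Where new FLASH output files should go, and how."""
    destinations: list              # GridFTP endpoints to replicate to
    register_replicas: bool = True  # record each new copy in a replica catalog
    min_copies: int = 2             # keep at least this many replicas of each file


class DataPlacementService:
    """Stub for a service that registers and distributes files per a policy."""

    def __init__(self, policy: PlacementPolicy):
        self.policy = policy

    def place(self, local_file: Path) -> None:
        # A real service would drive GridFTP/MOPS transfers to each destination
        # and register the resulting replicas; here we only report the plan.
        for dest in self.policy.destinations:
            print(f"would transfer {local_file.name} -> {dest}")
        if self.policy.register_replicas:
            print(f"would register replicas of {local_file.name} in the catalog")


if __name__ == "__main__":
    policy = PlacementPolicy(destinations=["gsiftp://siteA/flash/",
                                           "gsiftp://siteB/flash/"])
    dps = DataPlacementService(policy)
    # Hand each newly written checkpoint file to the service as it appears,
    # rather than transferring and cataloging it by hand after the run.
    for checkpoint in sorted(Path("flash_output").glob("*_chk_*")):
        dps.place(checkpoint)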

7. Summary

We have introduced the SciDAC Center for Enabling Distributed Petascale Science (CEDPS), which is addressing three problems critical to enabling the distributed management and analysis of petascale datasets: data placement, scalable services, and troubleshooting.

In data placement, we are developing tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment. We are constructing a managed object placement service (MOPS)—a significant enhancement to today’s GridFTP—that allows for management of the space, bandwidth, connections, and other resources needed to transfer data to and/or from a storage system. Building on this base, we are developing end-to-end data placement services that implement different data distribution and replication behaviors.
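
The minimal Python sketch below illustrates the kind of resource management such a service adds on top of a plain file transfer: a storage endpoint that accepts or refuses a transfer based on an up-front space reservation and an agreed bandwidth cap. ManagedStorageEndpoint and its methods are hypothetical names chosen for illustration; the actual MOPS interface may differ.

"""Illustrative sketch: ManagedStorageEndpoint and its methods are hypothetical
names used to show the style of resource management described, not the MOPS API."""


class ManagedStorageEndpoint:
    """Stub for a storage endpoint that manages space, bandwidth and connections."""

    def __init__(self, url: str, total_space_gb: float, max_connections: int):
        self.url = url
        self.free_gb = total_space_gb
        self.max_connections = max_connections

    def reserve_space(self, size_gb: float) -> bool:
        # Refuse a transfer up front rather than letting it fail partway through.
        if size_gb > self.free_gb:
            return False
        self.free_gb -= size_gb
        return True

    def transfer_in(self, source_url: str, size_gb: float, bandwidth_mbps: int) -> None:
        if not self.reserve_space(size_gb):
            raise RuntimeError(f"no space for {size_gb} GB at {self.url}")
        # A real managed endpoint would drive a GridFTP transfer here, throttled
        # to the agreed bandwidth and limited to max_connections parallel streams.
        print(f"transfer {source_url} -> {self.url} "
              f"({size_gb} GB, capped at {bandwidth_mbps} Mb/s)")


if __name__ == "__main__":
    endpoint = ManagedStorageEndpoint("gsiftp://siteB/flash/",
                                      total_space_gb=500.0, max_connections=8)
    endpoint.transfer_in("gsiftp://siteA/flash/run42_chk_0100",
                         size_gb=12.0, bandwidth_mbps=200)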

In troubleshooting, we are developing tools for the detection and diagnosis of failures in end-to-end data placement and distributed application hosting configurations. We are constructing an end-to-end monitoring architecture that uses instrumented services to provide detailed data for both background collection and run-time, event-driven collection. We are also constructing new analysis tools that can detect failures and performance anomalies and predict system behavior from archived monitoring data and event logs.
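
As a rough illustration of this approach, the Python sketch below parses simple name=value log events of the kind an instrumented transfer service might emit and flags failed or unusually slow transfers. The field names (ts, event, status, dur), the sample log lines, and the median-based anomaly rule are all assumptions made for the example, not the formats or algorithms defined by the project.

"""Illustrative sketch: the name=value event fields (ts, event, status, dur) are
modeled loosely on the logging style described in the text; the exact field names
and the anomaly rule here are assumptions for the example."""

import re
import statistics

# A few example log lines, as an instrumented transfer service might emit them.
LOG_LINES = [
    "ts=2007-11-01T10:00:00Z event=transfer.end status=0 dur=41.2",
    "ts=2007-11-01T10:05:00Z event=transfer.end status=0 dur=39.8",
    "ts=2007-11-01T10:11:00Z event=transfer.end status=1 dur=2.1",
    "ts=2007-11-01T10:20:00Z event=transfer.end status=0 dur=412.7",
]

FIELD = re.compile(r"(\w+)=(\S+)")


def parse(line: str) -> dict:
    """Turn a name=value log line into a dict of field names to values."""
    return dict(FIELD.findall(line))


def report(lines):
    """Flag failed transfers and transfers far slower than the typical case."""
    events = [parse(line) for line in lines]
    durations = [float(e["dur"]) for e in events if e.get("status") == "0"]
    typical = statistics.median(durations)
    for e in events:
        if e.get("status") != "0":
            print(f"{e['ts']}: FAILED ({e['event']})")
        elif float(e["dur"]) > 5 * typical:
            print(f"{e['ts']}: slow transfer, {e['dur']}s vs ~{typical}s typical")


if __name__ == "__main__":
    report(LOG_LINES)
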
Together, these tools allow scientists to interact more easily with the large datasets created by petascale computations and to complete their end-to-end analyses of those data more quickly. More details can be found at http://www.cedps.net.

Acknowledgements

This work is supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, through the SciDAC program. Work at Argonne is supported under Contract DE-AC02-06CH11357 and work at Lawrence Berkeley National Laboratory under Contract DE-AC02-05CH11231. We gratefully acknowledge the contributions of our fellow CEDPS participants Andrew Baranovski, Shishir Bharathi, John Bresnahan, Tim Freeman, Keith Jackson, Kate Keahey, Carl Kesselman, David E. Konerding, Mike Link, Miron Livny, Neill Miller, Robert Miller, Gene Oleynik, Laura Pearlman, and Robert Schuler.
