PRAGMA: Example of Grass-Roots Grid Promoting Collaborative e-Science Teams
Peter Arzberger, University of California, San Diego
Philip Papadopoulos, San Diego Supercomputer Center and University of California, San Diego
CTWatch Quarterly
February 2006

Introduction

Science is a global enterprise. Its conduct transcends geographic boundaries, disciplines, and educational levels. The routine ability to work with experts around the world, to use resources distributed across international boundaries, and to share and integrate different types of data, knowledge, and technology is becoming more realistic. The development and deployment of compatible cyberinfrastructure (a.k.a. the Grid), which links together computers, data stores, and observational equipment via networks and middleware, forms the operative IT backbone of international science teams. While large community projects that exploit the Grid exist (e.g. the Large Hadron Collider [1]), international collaboration can and most likely will also take place at the scale of smaller teams. For example, a multidisciplinary, distributed team of researchers from the University of Zurich, the University of California San Diego, and Monash University in Australia is synthesizing applications and grid middleware, using distributed computational resources from the PRAGMA testbed [2], to gain understanding of complex biochemical reactions that can impact the design of new drugs [3, 4, 5, 5a, 6]. This example and others [7, 8, 9] demonstrate the value and potential of working with the emerging cyberinfrastructure. Yet significant effort was required to bring these tools, people, and resources together. The current challenge for the Grid community is to make this demonstrated potential a reality on a routine basis.

Pacific Rim Application and Grid Middleware Assembly (PRAGMA)

Established in 2002, the Pacific Rim Application and Grid Middleware Assembly [10] is an open organization whose focus is how to practically create, support, and sustain international science and technology collaborations. Specific experiments are postulated, candidate technologies and people are identified to support these experiments, evaluation is performed in our trans-Pacific routine-use laboratory, and successful solutions are integrated into country-specific software stacks or Global Grid Forum [11] standards. The group harnesses the ingenuity of more than 100 individuals from 25 institutions to create and sustain these long-term activities. PRAGMA plays a critical role as an international conduit for personal interactions, ideas, information, and grid technology. Our multi-faceted framework for collaboration catalyzes and enables new activities because of a culture of openness to new ideas. Our pragmatic approach has led to new scientific insights [3], enhanced technology [12, 13, 14], and a fundamental sharing of experiences manifested in our routine-use laboratory.

PRAGMA began with the following observations: global science communities were emerging in increasing numbers; grid software had entered its second phase of implementation; and international networks were expanding rapidly in capacity as fundamental high-speed enablers for data and video communication. But the integration and productive use of these potential tools was “out of reach” for many scientists. To address the issue of making this technology routinely accessible, a founding set of Pacific Rim institutions began to work together to share ideas, challenges, software, and possible end-to-end solutions.

Our common-sense approach begins with prospective collaborative science-driven projects (like whole genome annotation, quantum chemistry dynamics, Australian savannah wildfire simulation, and remote control of large electron microscopes coupled with 3D tomographic reconstruction) so that both people and candidate technologies can be identified to address the scientific needs. Identification happens through people-to-people networking, progressively more sophisticated demonstrations, tutorials on software components (e.g. Gfarm, MOGAS [15], Nimrod, Rocks [16], Ninf-G [17], and others), and a consistent face-to-face workshop schedule. When enough ingredients are available to start down the pathway of using the Grid, making software grid-aware, and/or sharing data, the software is instantiated on our routine-use laboratory. This lab (described in more detail below, with its evolution and management challenges described in [18]) is where technologists from multiple organizations work together to provide a baseline infrastructure for evaluation. Successful science projects can move to larger resource pools if needed. The entire end-to-end process is possible because of an active international steering committee that continually focuses the group’s multiple efforts toward tangible results. Below we describe and illustrate these key components of PRAGMA, together with software distribution and community building.

Collaborative Science-driven Teams

PRAGMA brings multidisciplinary, multi-institutional teams together, driven by application needs. In addition to the computational chemistry application described above, another team of researchers from the US, Japan, China, and Singapore integrated a protein annotation pipeline (iGAP [19]) developed at UCSD, a distributed file system (Gfarm [20]) developed at the National Institute of Advanced Industrial Science and Technology (AIST), and a metascheduler (CSF) being extended by researchers at Jilin University in China to schedule iGAP testing [21]. This software/middleware synthesis effort has led to improvements of Gfarm. In particular, the metadata server design was changed to meet the requirements of high-throughput file creation and registration, and automatic replication of data and deployment of applications to remote sites are now fully supported for most common architectures [7]. Finally, a successful annotation of the bacterium Burkholderia mallei, a known bioterrorism agent, has been conducted with this infrastructure and the PRAGMA testbed (the annotation will be publicly available pending publication of analysis results).

A final example integrates expertise in IPv6 networking at the Cybermedia Center of Osaka University, remote control of a microscope at UCSD, use of a computational grid to build tomographic reconstructions of subcellular structures, and the development of visualization modules at the National Center for High-performance Computing (NCHC), providing an enhanced suite of tools for researchers [22]. Not only did the team benefit, but each group did as well. UCSD researchers were able to better access the machine in Japan and distributed compute resources at the three sites. Osaka researchers were able to control the machines and make codes available to their users, and NCHC colleagues were able to take the concept and knowledge of remote control of a microscope and retarget the application to sensors in the environment, creating EcoGrid [23] in Taiwan.

Each of these international science and technology teams has shared technology and experience to significantly enhance their research agendas. The structure of PRAGMA, with its culture of openness to new ideas and technologies coupled with a recurring series of focused workshops [24], provided the essential glue for these teams. Each of these accomplishments has resulted in ongoing collaborations that now span years.

Routine Use Grids

These and other examples [7] have driven the use of PRAGMA’s evolving grass-roots grid testbed. The overall goal of the PRAGMA testbed is to provide a stable platform that allows these and other application/middleware codes to be tested, and to understand how to make applications run on a routine basis without the superhuman efforts that many of these examples currently require.

The current testbed consists of resources and participants from 19 institutions in 13 countries on five continents. It is an instantiation of a useful, interoperable, and consistently available grid system that is neither dedicated to the needs of a single science domain nor funded by a single national agency. The testbed is heterogeneous in equipment and in connectivity between machines (bandwidth as well as persistence), reflecting both funding realities and the future global cyberinfrastructure.

The testbed has been grown using a minimal set of requirements. The initial software stack consisted of Globus plus local scheduling software. Additional middleware is added based on the needs of the applications. For example, Ninf-G, a remote procedure call middleware developed at AIST, became part of the testbed because it was required by two applications used in PRAGMA: a time-dependent density functional theory calculation and a Quantum Mechanical / Molecular Dynamics (QM/MD) code [7].
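For readers unfamiliar with Ninf-G, the sketch below suggests roughly what a grid-aware application looks like from the client side. Ninf-G implements the GridRPC programming model, in which a remote executable is bound to a function handle and invoked much like a local call. This is a minimal, hypothetical sketch: the configuration file, host name, and remote routine name are placeholders, and the exact calls and signatures should be checked against the Ninf-G documentation rather than taken from here.

    #include "grpc.h"   /* GridRPC client API header shipped with Ninf-G */

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t handle;
        double coord = 1.5, energy;

        /* Initialize the client from a Ninf-G configuration file (hypothetical name). */
        if (grpc_initialize("client.conf") != GRPC_NO_ERROR)
            return 1;

        /* Bind a handle to a remote routine; "qmmd/energy" and the host name are
           illustrative placeholders for a module/entry defined on the server side. */
        grpc_function_handle_init(&handle, "grid-node.example.org", "qmmd/energy");

        /* Synchronous remote call: arguments are shipped to the remote resource,
           the routine executes there, and the result is copied back. */
        grpc_call(&handle, coord, &energy);

        grpc_function_handle_destruct(&handle);
        grpc_finalize();
        return 0;
    }

In this model an application keeps its own structure and simply offloads selected routines to remote testbed resources, which is why a remote procedure call layer was the natural fit for the two applications above.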

These routine-use experiments have produced results through strong feedback between application and middleware developers. Codes have had to be improved to operate in a network environment where connections fail, in particular to be more fault tolerant. The testbed development is being driven by many application areas, allowing examination of different requirements. In addition, a richer monitoring suite (SCMSWeb) [25] developed at Kasetsart University and accounting tools (MOGAS) [26] from Nanyang Technological University were introduced to the testbed.
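One common way codes are hardened against failures of this kind (offered here only as a generic, hypothetical sketch, not a description of any particular PRAGMA code) is to retry a remote call across several testbed hosts, so that a single dropped connection does not abort an entire run:

    #include "grpc.h"   /* GridRPC client API header shipped with Ninf-G */

    /* Hypothetical helper: try the same remote routine on several testbed hosts
       until one call succeeds, instead of giving up on the first failure. */
    static int call_with_retry(char *hosts[], int nhosts, char *func,
                               double arg, double *result)
    {
        int i;
        for (i = 0; i < nhosts; i++) {
            grpc_function_handle_t h;
            grpc_error_t err;

            if (grpc_function_handle_init(&h, hosts[i], func) != GRPC_NO_ERROR)
                continue;                  /* host unreachable: try the next one */

            err = grpc_call(&h, arg, result);
            grpc_function_handle_destruct(&h);
            if (err == GRPC_NO_ERROR)
                return 0;                  /* success */
            /* the call failed mid-flight: fall through and retry elsewhere */
        }
        return -1;                         /* every host failed */
    }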

PRAGMA has demonstrated that these grids can be very useful to improve codes and to conduct meaningful and otherwise unachievable science.

Multiway Software Dissemination

As software is tested in these real-world environments and used in multiple applications, ways of disseminating that software are needed. In the case of Ninf-G, we successfully developed a procedure that allowed it to be integrated into the US NMI software stack for version 8 and subsequent releases. This was the first instance of a non-US code being introduced into that stack. Another dissemination vehicle we are using is the Rocks Rolls mechanism (i.e. RPM packages configured for automatic (re)deployment to Rocks-based clusters). PRAGMA partners have also released codes with changes, and Rocks was localized by the partners at the Korea Institute of Science and Technology Information (KISTI) to allow for easier access and use by the Korean Grid community [27]. And with broader dissemination comes broader use. In one case this led to a set of standards for remote procedure calls being proposed in GGF [28].

Finally, as the broader community produces new codes or standards, PRAGMA will adopt them. Examples include the certificate authority (CA) from NAREGI [29] and the PMA led by AIST, which are then used by the broader PRAGMA community.

Building a Community

PRAGMA itself focuses on a grass-roots approach in an effort to enable new communities to form and to assemble expertise not available at any single institution. Some global issues require the ability to rapidly form international teams. Responses to epidemics such as SARS (PRAGMA played a crucial role in helping to pull together an international team to aid Taiwan in its efforts to combat the disease) [30] and emerging threats like avian flu often require teams to assemble in hours or days. Other groups with a small set of geographically dispersed experts simply do not have the personnel resources to independently build a complete cyberinfrastructure. In this model, PRAGMA has played a leading role in catalyzing GLEON [31], the Global Lake Ecological Observatory Network, a grass-roots network of limnologists, information technology experts, and engineers with a common goal of building a scalable, persistent network of lake ecology observatories. Data from these observatories will help this community to better understand key processes such as the effects of climate and land-use change on lake function, the role of episodic events such as typhoons in resetting lake dynamics, and carbon cycling within lakes. These teams are, by nature and expertise, international in scope.

We have built a community by focusing on concrete projects to build trust and on an on-going series of semi-annual working meetings. The meetings rotate among PRAGMA member sites to engage a broader group of researchers at each site and to allow the PRAGMA community to appreciate its members’ cultural richness.

Students are an essential component of the PRAGMA community. Pacific Rim Undergraduate Experiences (PRIME) [32] provides UCSD undergraduate students with summer research experiences at four PRAGMA sites: Osaka, Monash, NCHC, and the Computer Network Information Center of the Chinese Academy of Sciences (CNIC). The students conduct research and contribute to the infrastructure. Further, they have helped expand the collaborations between scientists at the institutions. In addition, the Japanese government has awarded Osaka University funds to create Pacific Rim International UniverSities (PRIUS), a program designed to improve education for graduate students interested in grid technology by supporting a series of activities, including exchanges. Both PRIME and PRIUS build on and enhance the PRAGMA community and would not exist without PRAGMA.

Final Comments

PRAGMA is both a multifaceted organization and an experiment. It is open to institutions that wish to actively participate in projects and to contribute and share resources, middleware, applications, and expertise with other members. The value of the structure is that it allows for transfer of technology among institutions, in some cases enabling rapid start-ups by acquiring technology, and in other cases providing user feedback on technologies that have been developed. The structure also allows for a transfer of technologies between disciplines where, for example, remote control of an instrument is being carried over from neuroscience and a microscope to ecology and a sensor.

Collectively, we have built a human network that allows for new activities to begin. We have built a stable, persistent grass-roots grid testbed on which codes can be tested and science conducted. We have shared our experiences via publications and the improved codes via a variety of software dissemination vehicles, allowing the broader community to benefit. Finally, we have used the structure to build a legacy where researchers will work, collaborate, and educate internationally.

Acknowledgements
Each of the authors would like to acknowledge support from NSF INT-0314015 for our participation in PRAGMA, and OCI 0505520 for integrating Ninf-G into the NMI stack. We also acknowledge NSF INT-0407508, Calit2, and GEON for their support of PRIME. We wish to thank current and former NSF program officers for their strong partnership and encouragement to “take advantage of the geographical location of San Diego, on the Pacific Rim” and build the community.
We also note that NSF funds helped leverage resources from PRAGMA partners and their funding agencies. Without partnership involvement, we would not exist.
Finally, we wish to acknowledge the support of NIH P41 RR08605 which supports tools for the biomedical community, the Betty and Gordon Moore Foundation for the launch of GLEON, and NSF NEON 0446802 for tools in GLEON.
1 Large Hadron Collider - http://lhc.web.cern.ch/lhc/
2 Baldridge, K.K., Sudholt, W., Greenberg, J.P., Amoreira, C., Potier, Y., Altintas, I., Birnbaum, A., Abramson, D., Enticott, C., Garic, S. Cluster and Grid Infrastructure for Computational Chemistry and Biochemistry. In Parallel Computing for Bioinformatics (Invited Book Chapter), A. Y. Zomaya (Ed.), John Wiley & Sons, 2005, in press.
3 Sudholt, W., Baldridge, K. K., Abramson, D., Enticott, C., Garic, S. Parameter Scan of an Effective Group Difference Pseudopotential Using Grid Computing. New Generation Computing, Vol.22 No 2 (Special Feature Grid Systems for Life Sciences). February 2004.
4 Sudholt, W., Baldridge, K., Abramson, D., Enticott, C., Garic, S., Applying Grid Computing to the Parameter Sweep of a Group Difference Potential, The International Conference on Computational Sciences, ICCS04, Krakow Poland, June 6 - 9, 2004.
5 The chemistry codes are GAMESS, a community code for quantum mechanics calculations, and APBS, the Adaptive Poisson Boltzmann Solver (http://apbs.sourceforge.net/), used with GEMSTONE [5a], an integrated framework for connecting grid resources, and parameter sweep middleware (Nimrod) over a grid.
5a Baldridge, K. K., Bhatia, K., Greenberg, J.P., Stearn, B., Mock, S., Sudholt, W., Krishnan, S., Bowne, A., Amoreira, C., Potier, Y. GEMSTONE: Grid-Enabled Molecular Science through Online Networked Environments. Invited paper: LSGRID Proceedings, 2005, in press
6 Sudholt, W., Baldridge, K., Abramson, D., Enticott, C., Garic, S., Kondric, C., Nguyen, D. Application of Grid Computing to Parameter Sweeps and Optimizations in Molecular Modeling. Future Generation Computer Systems (Invited), 2005. 21, 27-35.
7 Abramson, D., Lynch, A., Takemiya, H., Tanimura, Y., Date, S., Nakamura, H., Jeong, K., Lee, H., Wang, C., Shih, H.L., Molina, T., Baldridge, K., Li, W., Arzberger, P. Deploying Scientific Applications on the PRAGMA Grid Testbed: Ways, Means and Lessons. CCGrid 2006 (accepted).
8 Hey, A., Trefethen, A. Cyberinfrastructure for e-Science, Science 2005 308: 817-821.
9 A 21st Century National Team Science Infrastructure - http://www.calit2.net/newsroom/release.php?id=660
10 Pacific Rim Application and Grid Middleware Assembly - http://www.pragma-grid.net
11 Global Grid Forum - http://www.ggf.org
12 Telescience Portal - https://telescience.ucsd.edu
13 Nimrod - http://www.csse.monash.edu.au/~davida/nimrod/
14 Gfarm - http://datafarm.apgrid.org/. A global parallel file system developed by AIST in collaboration with KEK, the University of Tokyo, and Titech.
15 Multiple Organization Grid Accounting System - http://www2.ntu.edu.sg/SCERN/Dec2004/art1.htm, http://pragma-goc.rocksclusters.org/softdepot/ntu_acct.html, http://ntu-cg.ntu.edu.sg/pragma/index.jsp
16 NPACI Rocks - http://www.rocksclusters.org/
17 Ninf-G - http://ninf.apgrid.org
18 Zheng, C., Abramson, D., Arzberger, P., Ayuub, S., Enticott, C., Garic, S., Katz, M., Kwak, J., Papadopoulos, P., Phatanapherom, S., Sriprayoonsakul, S., Tanaka, Y., Tanimura, Y., Tatebe, O., Uthayopas, P. The PRAGMA Testbed: Building a Multi-Application International Grid 2005. CCGrid 2006 (submitted).
19 Li, W.W., Byrnes, R.W., Hayes, J., Birnbaum, A., Reyes, V.M., Shahab, A., Mosley, C., Pekurovsky, D., Quinn, G.B., Shindyalov, I.N., Casanova, H., Ang, L., Berman, F., Arzberger, P.W., Miller, M.A., Bourne, P.E. The Encyclopedia of Life Project: Grid Software and Deployment. New Generation Computing, Vol.22 No 2 pp 127-136 (Special Feature Grid Systems for Life Sciences). February 2004. http://www.ohmsha.co.jp/ngc/ngc2202.htm
20 Tatebe, O., Ogawa, H., Kodama, Y., Kudoh, T., Sekiguchi, S., Matsuoka, S., Aida, K., Boku, T., Sato, M., Morita, Y., Kitatsuji, Y., Williams, J., Hicks, J. The Second Trans-Pacific Grid Datafarm Testbed and Experiments for SC2003. Proceedings of 2004 International Symposium on Applications and the Internet - Workshops (SAINT 2004 Workshops), 26-30 January 2004, Tokyo, Japan.
21 Wei, X., Li, W. W., Tatebe, O., Xu, G., Liang, H., Ju, J. Implementing data aware scheduling in Gfarm using LSF scheduler plugin mechanism. Proceedings of the 2005 International Conference on Grid Computing and Applications (GCA'05). Las Vegas. 2005. In press.
22 Lee, D., Lin, A.W., Hutton, T., Akiyama, T., Shimojo, S., Lin, F.P., Peltier, S., Ellisman, M.H. Global Telescience Featuring IPv6 at iGrid2002. Future Generation Computer Systems, 19(6): 1031-39. 2003.
23 EcoGrid - http://ecogrid.nchc.org.tw/
24 Recent workshops were hosted by the PRAGMA members Bioinformatics Institute and the National Grid Office, Singapore (May 2005), and the University of Hyderabad (October 2005). Future workshops will be hosted by the University of Queensland and the Australian Partnership for Advanced Computing (March 2006), Osaka University (October 2006), the National Electronics and Computer Technology Center and Kasetsart University (Spring 2007), the National Center for Supercomputing Applications (Fall 2007), and NCHC (Spring 2008).
25 Scalable Cluster Management System Web - http://www.opensce.org/components/SCMSWeb/
26 Lee, B-S., Tang, M., Zhang, J., Soon, O.Y., Zheng, C., Arzberger, P. Analysis of Jobs on a Multi-Organizational Grid Test-bed. CCGrid 2006 (accepted).
27 KROCK - http://rocks.cluster.or.kr
28 Tanimura, Y., Ikegami, T., Nakada, H., Tanaka, Y., Sekiguchi, S., Proceedings of the Workshop on Grid Applications: from Early Adopters to Mainstream Users, GGF Documents, 2005.
29 National Research Grid Initiative - http://www.naregi.org/index_e.html
30 National Science Foundation FY2003 Performance Highlights (NSF-04-011) [See page “From the Director” and page 14] http://www.nsf.gov/pubs/2004/nsf04011/
31 Global Lake Ecological Observatory Network - http://gleon.org
32 Pacific Rim Undergraduate Experiences - http://prime.ucsd.edu

URL to article: http://www.ctwatch.org/quarterly/articles/2006/02/pragma-example-of-grass-roots-grid-promoting-collaborative-e-science-teams/