How to Build an International Grid: Infrastructure, Applications and Community
Fabrizio Gagliardi, EGEE Project Director – CERN
Bob Jones, EGEE Technical Director – CERN
Owen Appleton, EGEE Communications Officer – CERN
CTWatch Quarterly
November 2005

Introduction

The Enabling Grids for E-sciencE project (EGEE) is Europe’s flagship Research Infrastructures Grid project [1] and the world’s largest Grid infrastructure of its kind. It involves more than 70 partners from 27 countries, arranged in 12 regional federations, and provides more than 16,000 CPUs, more than 160 sites and 10 petabytes of available network storage. This infrastructure supports six scientific domains and more than 20 individual applications.

Started in March 2004, EGEE has rapidly grown from a European to a global endeavour, and along the way has learned a great deal about the business of building production-quality infrastructure. The consortium behind this effort represents a significant proportion of Europe’s Grid experts, including not only academic institutions but also partners from the Research Network community and European industry. This article outlines the project’s structure and goals, its achievements and the importance of cooperation in such large-scale international efforts.

A distributed effort – project structure and goals

Figure 1: Extent of EGEE infrastructure for EGEE-II

The aim of EGEE is to leverage the pre-existing Grid efforts in Europe, whether thematic, national or regional, to build a production-quality multi-science computing Grid. As a result, the primary objective is to build the infrastructure itself, connecting computing centres across Europe (and more recently, around the globe) into a coordinated service capable of supporting 24/7 use by large scientific communities. To support this production service, the project also aims to re-engineer existing middleware components to produce a service-orientated middleware solution. Finally, the project aims to engage the maximum number of users running applications on the infrastructure through dissemination, training and user support. These tasks have been divided into different activity areas, which are tackled by different groups within the project. These groups are distributed across a number of partner institutes with relevant experience, such that the project helps to connect its partners and encourage knowledge transfer in the process of achieving its goals.

Connecting and sharing – growing a global infrastructure

Building a large, secure, stable and scalable infrastructure is perhaps the key feature of EGEE. From the start, the project benefited from the resources of the international High Energy Physics (HEP) community, leveraging these to build a Grid infrastructure for all scientific disciplines. These HEP resources come from the computing systems built to support the forthcoming Large Hadron Collider (LHC), under construction at CERN [2] in Switzerland. More specifically, EGEE formed a strategic alliance with the LCG [3] (LHC Computing Grid) project, which was independently deploying an international distributed computing infrastructure.

With an infrastructure of considerable size available from the HEP community from day one, EGEE has been able to concentrate on delivering a working infrastructure, with a main production service supported by pre-production, testing and development services, and even specialised infrastructure for dissemination and training [4]. This initial pool of resources supplied by the HEP community helped to encourage the other pilot application domain, the Biomedical Science community, to contribute its own resources and run its own production challenges, thus encouraging other domains to join the project.

It also became clear during the early part of the project that restricting this effort to Europe made little sense, given the distributed nature of many scientific communities and the large number of resources, both in terms of knowledge and hardware, in other parts of the globe. EGEE began to extend its efforts beyond its original partners early in the project, extending the infrastructure into South-Eastern Europe through the SEEGRID [5] project and into digital library applications through the DILIGENT [6] project. This successful policy of collaboration and extension has continued, with EGEE building relationships with major sister projects in areas such as the United States (OSG) and Asia (NAREGI), as well as through support for related projects (such as BalticGrid, EUChinaGrid, EELA and EUMedGrid) that extend the EGEE infrastructure to new geographical areas. Such associations are an important part of EGEE’s role as an incubator, both within Europe and beyond, actively supporting a wide range of Grid efforts, from infrastructure to application projects. Through these projects, EGEE has spread the knowledge it has accumulated in all areas of its work, from making applications Grid-compliant to managing infrastructure. This cooperative spirit is also reflected in the way that the infrastructure is managed. Operations were initially run from a single centre at CERN. To spread both the workload and the knowledge generated by managing large-scale infrastructures, responsibility now rotates around centres across Europe (with future plans for centres in the US and Asia).

Re-engineering and integration – producing modern middleware

Figure 2: Projects related to EGEE and EGEE-II

From its inception, EGEE had considerable advantages in the area of middleware. EGEE is in many ways the successor to the European DataGrid project [7], which had previously developed a well-built middleware solution, the EDG middleware. This stack had already been further developed by the LCG project into the LCG-2 middleware, providing EGEE with a working middleware stack to deploy from day one. In parallel, EGEE has also developed a new middleware solution, gLite [8], tailored to the multi-science user communities it supports. Rather than starting from scratch, gLite takes components from a large number of software sources, re-engineering some of them and integrating them into a modern, lightweight, service-orientated middleware solution. The resulting software stack provides a full set of Grid foundation services, as well as a range of higher-level services.

This modular approach combines best-of-breed elements from other middleware sources and allows outside projects interested in the middleware to install only the elements of gLite relevant to them, developing their own specialist, high-level services on top of it. The gLite stack also contains the foundations necessary for interoperability with other Grid systems, in particular in the area of security frameworks, and its development team actively participates in security working groups through the Global Grid Forum [9] and other such bodies. The gLite stack is released under a permissive, business-friendly open source license, which facilitates and encourages its use by outside groups. Beyond its use within the project, outside groups such as the DILIGENT digital library project are already running gLite on their own Grid infrastructure, and it is hoped that more groups, and eventually industry, will join them in the future.

Prototype to applications – infrastructure in action

Figure 3: EGEE-II geographical extension through related projects

EGEE began with two pilot application domains, High Energy Physics (HEP) and Biomedical Science. The HEP domain includes close collaboration with LCG to process data from the international LHC experiment communities, but also includes applications from other HEP projects such as CDF, D0, ZEUS and BaBar. In the Biomedical Science field, some 10 different applications are already running, ranging from protein sequence analysis to molecular docking studies used to look for new treatments for malaria.

In addition to these pilot domains, a number of other groups have joined the EGEE infrastructure since the project started, namely Computational Chemistry, Astrophysics, Earth Sciences and Geophysics. Such new groups can join the infrastructure through a process called the “Virtuous Cycle.” In this process, new communities become aware of the availability of the EGEE service through outreach events, personal contacts or contacts with project members in their local area, and can try the Grid through online demonstrations. Following this, they interact with local resource centres, which provide access to resources and help in porting applications to the EGEE infrastructure. This allows new applications to enter the project in an organic manner and to be paired with a nearby group able to communicate the new application’s requirements to the rest of the project. Once on the Grid, new application groups receive training in the skills they need to become a self-supporting community. Finally, the new group becomes an established user community on the EGEE infrastructure, demonstrating to other potential users the benefits of Grid technology and encouraging them in turn to get involved with EGEE.

The vibrant and extensive user community formed from users in these application domains is in many ways EGEE’s greatest achievement. No other production grid infrastructure exists of this size or with this breadth of active users. Growing since the start of the project, the number of successful jobs per day on the infrastructure had exceeded 19,000 by June 2005. EGEE is not only breaking new ground in understanding the unique challenges that running such a truly interdisciplinary infrastructure presents, but also passing this knowledge on to sister projects in other parts of the world, industry and more focused Grid projects.

Apart from the various academic scientific communities that are involved with EGEE, the project also supports an industrial application from the French firm Compagnie Générale de Géophysique (CGG), which supports the EGEODE Virtual Organisation [10], used for basic geophysics research. EGEODE benefits EGEE, CGG and the geophysics community in general by freely distributing the results of its research, as well as helping EGEE attune itself to industrial requirements and expectations for the future of Grid computing as a commercial service.

Toward a permanent research infrastructure

EGEE was originally conceived as the first two years of a four-year programme. In keeping with this vision, the consortium behind EGEE recently submitted a proposal for the second half of this programme, the EGEE-II project, to the latest EU Information Society Research Infrastructures funding call.

EGEE-II is a further elaboration of the EGEE mission, learning from the experience of the previous project and featuring a considerably expanded consortium and refocused mission. As well as increasing its consortium to over 90 partners from 32 countries, it broadens its global vision by formalising relationships with partners in the USA, Taipei and Korea. In the USA, this also includes other large-scale Grid projects such as OSG and Grid3, allowing both sides to profit from one another’s experience. Further extension of the infrastructure to the Baltic, the Mediterranean area, China and Latin America will be achieved through related projects also submitted to EU Information Society funding calls.

Since the beginning of EGEE, Grid technology has matured considerably, with a great number of projects across the globe producing interesting results. EGEE-II has been planned in light of these developments, allowing it to profit from them as well as to pass information and experience back into the community. This has led to a refocusing of the project activities, with a greater emphasis on infrastructure management and a new dedicated effort in middleware certification, integration and testing. In parallel, middleware re-engineering within the project will focus more on integrating components from outside sources, including Globus, Condor and the Virtual Data Toolkit (VDT), and from related European Grid projects.

In the applications area, EGEE-II will continue to increase the number of scientific domains and applications running on the infrastructure. This will notably include collaboration with the International Thermonuclear Experimental Reactor project (ITER) [11] on fusion applications, as well as support for any other interested partners.

Overall, the long-term goal of EGEE and EGEE-II is to establish a permanent public Grid infrastructure to support research of all types. Through the course of EGEE, it has become clear that profiting from such infrastructures requires the greatest possible level of interconnection with other similar efforts. As a result, such collaboration has increased considerably during the project, and the plans for EGEE-II were framed with it in mind. This strategy not only improves the effectiveness of the individual projects and infrastructures, but also promotes the common standards and interoperability crucial to the future of Grid technology for both academic and industrial users.

[1] EGEE is funded by the EU Information Society & Media directorate through the Sixth Framework Programme, contract number INFSO-RI-508833.
[2] The European Organization for Nuclear Research, http://www.cern.ch/
[3] http://lcg.web.cern.ch/LCG/
[4] EGEE uses GILDA, a dedicated testbed for dissemination and training provided by Italy's Istituto Nazionale di Fisica Nucleare. https://gilda.ct.infn.it/
[5] South Eastern European Grid-enabled eInfrastructure Development, http://www.see-grid.org/
[6] A DIgital Library Infrastructure on Grid ENabled Technology, http://www.diligentproject.org/
[7] http://eu-datagrid.web.cern.ch/eu-datagrid/
[8] Pronounced "gee-lite", http://www.glite.org/
[9] http://www.gridforum.org/
[10] Virtual Organisations (VOs) are systems for allowing distributed communities to work together and share resources on a Grid infrastructure.
[11] http://www.iter.org/

URL to article: http://www.ctwatch.org/quarterly/articles/2005/11/how-to-build-an-international-grid-infrastructure/