OntoGrid: A Semantic Grid Reference Architecture
Carole Goble and Sean Bechhofer, University of Manchester, UK
CTWatch Quarterly
November 2005

What is a Semantic Grid?

The Grid aims to support secure, flexible and coordinated resource sharing by providing a middleware platform for advanced distributed computing. Consequently, the Grid’s infrastructural machinery aims to allow collections of any kind of resources (computing, storage, data sets, digital libraries, scientific instruments, people and so on) to easily form Virtual Organisations that cross organisational boundaries in order to work together to solve a problem. A Grid depends on understanding the available resources, their capabilities, how to assemble them and how to best exploit them. Thus Grid middleware and the Grid applications it supports thrive on the metadata that describes resources in all their forms, the Virtual Organisations, the policies that drive them, and so on, together with the knowledge to apply that metadata intelligently.

The Semantic Grid is a recent initiative to systematically expose semantically rich information associated with Grid resources in order to build more intelligent Grid services.1 The idea is to make structured semantic descriptions real and visible first-class citizens with an associated identity and behaviour. We can then define mechanisms for their creation and management as well as protocols for their processing, exchange and customisation. We can separate these issues from both the languages used to encode the descriptions (from natural language text right through to logic-based assertions) and the structure and content of the descriptions themselves, which may vary from application to application.

In practice, work on Semantic Grids has primarily meant introducing technologies from the Semantic Web2 to the Grid. The background knowledge and vocabulary of a domain can be captured in ontologies – machine processable models of concepts, their interrelationships and their constraints; for example a model of a Virtual Organisation.3 Metadata labels Grid resources and entities with concepts, for example describing a job submission in terms of memory requirements and quality of service or a data file in terms of its logical contents. Rules and classification-based automatic inference mechanisms generate new metadata based on logical reasoning, for example describing the rules for membership of a Virtual Organisation and reasoning that a potential member’s credentials are satisfactory.
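
As a rough illustration of how these three ingredients fit together, the sketch below uses plain Python rather than any Semantic Web toolkit. The concept names, the `Candidate` class and the membership rule are all hypothetical and chosen only for this example; a real deployment would express them in an ontology language and hand them to a reasoner.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology: each concept maps to its direct parent concept.
SUBCLASS_OF = {
    "X509Certificate": "Credential",
    "InstitutionalAffiliation": "Credential",
    "Credential": "Thing",
}


def is_a(concept, ancestor):
    """Classification-style check: walk up the subclass hierarchy."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


@dataclass
class Candidate:
    name: str
    # Metadata: concept labels attached to this candidate's credentials.
    credentials: set = field(default_factory=set)


def satisfies_membership(candidate, required):
    """Illustrative rule: every required concept must be covered by
    at least one of the candidate's credentials."""
    return all(
        any(is_a(cred, req) for cred in candidate.credentials)
        for req in required
    )


alice = Candidate("alice", {"X509Certificate"})
print(satisfies_membership(alice, {"Credential"}))
```

Here the ontology supplies the vocabulary, the `credentials` set plays the role of metadata, and `satisfies_membership` stands in for the inference step that decides whether a potential member qualifies.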

In recognition of the potential importance of semantics in Grids, the Global Grid Forum standards body chartered a Semantic Grid Research Group in 2003.4 The Forum’s XML-based description languages such as the Job Submission Description Language and the Data Format Description Language, and OASIS’ Security Assertion Markup Language, all identify the role of semantics. The Forum’s recent Database Access and Integration Services Working Group specification identifies the importance of semantics in integration, metadata management and discovery. In July 2005 the Grid and Semantic Web communities came together in a week-long Schloss Dagstuhl seminar.5

In the last few years, several projects have embraced the Semantic Grid vision and pioneered applications combining the strengths of the Grid and of semantic technologies, particularly the use of ontologies for describing Grid resources and improving interoperability.6 The UK myGrid7 project uses ontologies to describe and select web-based services used in the Life Sciences; the UK Geodise project uses ontologies to guide aeronautical engineers to select and configure Matlab scripts;8 the Collaboratory for Multi-scale Chemical Science9 and CombeChem10 projects both use semantic web technologies to describe provenance metadata for chemistry experiments; the US-based Biomedical Informatics Research Network uses technologies to mediate between different databases in neuroscience;11 and the UK’s CoAKTing project uses ontologies to assist in virtual meetings between scientists.12 On the Semantic Grid road we are now moving from a phase of exploratory experimentation to one of systematic investigation, architectural design and content acquisition for a semantic infrastructure that accompanies a cyberinfrastructure (Figure 1).

OntoGrid

OntoGrid13 is an eight-partner EU FP6 project launched in October 2004 to investigate fundamental issues in Semantic Grids, bridging between the knowledge-based systems community and the Grid community. The project aims to show how knowledge technologies help deliver the next generation of Semantic Grid Computing systems and to experiment with the technological infrastructure needed for the development of knowledge-intensive, distributed open services for the Semantic Grid. The Semantic Grid should not only provide a general semantic-based computational infrastructure, but also a rich collection of knowledge services and knowledge-based services. Thus OntoGrid systematically brings together knowledge services (like ontology services, metadata stores and reasoning engines) with Grid services (such as workflow management, Virtual Organisation formation, debugging, resource brokering and data integration) adapted to semantic descriptions when they are available. This semantics-based approach to the Grid goes hand-in-hand with the exploitation of techniques from intelligent software agents and peer-to-peer (P2P) computing. OntoGrid mixes in techniques from agent computing for negotiation and coordination and peer-to-peer for distributed discovery (Figure 2).

OntoGrid is paving the way to Semantic Grids by investigating questions such as: Are semantic web technologies scalable? What’s the impact of a semantic approach to legacy grids? How do we minimize the impact? What are the minimum knowledge services needed? What should be their capabilities? How do we harvest and tend the semantic content? Is there content that is common for all Grids and how much is application specific? How, when and where does a semantic approach add value to a “traditional” Grid approach? What is an architectural framework for a Semantic Grid?

To keep our feet on the ground, the project is developing an architectural framework based on the emerging Open Grid Service Architecture (OGSA)14 and designed against two case studies from our applications in international insurance settlement and satellite data management: a Virtual Organisation Management System (VOMS) and intelligent debugging. Our first experiment is on a Semantically Aware VOMS, due in mid 2006. Most of the work has focused on a Reference Architecture for the Semantic Grid.

A principled approach to Semantic-OGSA

Currently the Semantic Grid lacks a Reference Architecture or any kind of systematic framework for designing Semantic Grid components or applications. OGSA aims to define a core set of capabilities and behaviours for Grid systems.14 OntoGrid extends OGSA by explicitly defining a lightweight mechanism that will allow for the explicit use of semantics and defining the associated knowledge services to support a spectrum of service capabilities. Semantic-OGSA (S-OGSA) is guided by seven design principles identified by the project (Figure 3):

  1. Parsimony: the architectural framework should be as lightweight as necessary, minimise the impact on legacy Grid infrastructure and tooling, and not dictate the definition of the contents of the descriptions – these will be application or middleware dependent.
  2. Extensibility: rather than define a complete and generic architecture, define an extensible and customisable one. Generality is the enemy of applicability.
  3. Uniformity: Semantic Grids are Grids, so all knowledge services are OGSA-compliant Grid services, and semantic descriptions have a lifetime and a life cycle just like other Grid entities. As metadata stores and ontology services are just special kinds of data services, we have adopted the OGSA-Data Access and Integration specification15 for their deployment and can potentially exploit other data grid capabilities.
  4. Diversity: a dynamic ecosystem of Grid services ranging over a spectrum of semantic capabilities will coexist at any one time. Semantic capability may be possible for some Grid resources all of the time, or for all Grid resources some of the time, but not for all resources all of the time.
  5. Multiform + Multiplicity: the same semantic description may be captured in many representational forms (text, logic, ontology, rule) and any resource’s property may have many different descriptions.
  6. Enlightenment: services should have a straightforward migration path that enables them to become knowledgeable and minimise the cost of doing so.
  7. Conceptual: S-OGSA is a reference architecture. Thus it should apply equally to different Grid middleware platforms such as the Globus Toolkit,16 the EU EGEE gLite platform,17 the UK Open Middleware Infrastructure Institute Release,18 or regular Web Services.

These principles pervade OntoGrid development and our thinking.

Models, Capabilities and Mechanisms

S-OGSA has three main aspects: the model (the elements that it is composed of and their interrelationships), the capabilities (the services needed to deal with such components) and the mechanisms (the elements that will enable communication when deploying the architecture in an application).

S-OGSA Model. Although there is no standardized overall model of the Grid and its basic concepts, there is a vocabulary associated with OGSA, and there are project-specific models3,19 and capability-focused models like the Common Information Model (CIM)20 from the Distributed Management Task Force and the Job Submission Description Language21 from the Global Grid Forum. S-OGSA introduces the notion of semantics into the model of the Grid by defining Grid Entities, Knowledge Entities (e.g. ontologies, rules, text) and Semantic Bindings between the two, through which a Grid Entity becomes a Semantic Grid Entity. Semantic Bindings are (possibly temporary) metadata assertions on Grid Entities and are themselves Grid resources with their own identity, manageability features and metadata.
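
One way to picture the three kinds of entity is as a small data model. The following is only a sketch under our own assumptions: the class names mirror the terms used above, but the fields (`kind`, `form`, `expires_at`) and the helper `semantic_view` are hypothetical and are not taken from the S-OGSA specification.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class GridEntity:
    entity_id: str
    kind: str  # e.g. "job", "data file", "service"


@dataclass(frozen=True)
class KnowledgeEntity:
    entity_id: str
    form: str      # e.g. "ontology", "rule", "text"
    content: str   # the description itself, in whatever form is used


@dataclass
class SemanticBinding:
    """A metadata assertion linking a Grid Entity to a Knowledge Entity.

    The binding has its own identity and an optional expiry time,
    reflecting that bindings may be temporary.
    """
    grid_entity: GridEntity
    knowledge_entity: KnowledgeEntity
    expires_at: Optional[float] = None  # None means no expiry
    binding_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def is_valid(self, now):
        return self.expires_at is None or now < self.expires_at


def semantic_view(entity, bindings, now):
    """Knowledge Entities currently bound to the given Grid Entity."""
    return [
        b.knowledge_entity
        for b in bindings
        if b.grid_entity == entity and b.is_valid(now)
    ]
```

In this picture a Grid Entity counts as a Semantic Grid Entity for as long as `semantic_view` returns at least one Knowledge Entity for it.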

S-OGSA Capabilities. S-OGSA is a mixed economy of semantically enabled and semantically disabled services. We extend the set of capabilities that Grid middleware should provide to include Semantic Provisioning Services and Semantically Aware Grid Services (Figure 4).

Semantic Provisioning Services dynamically provision an application with semantic grid entities in the same way a data grid provisions an application with data. The services support the creation, storage, update, removal and access of different forms of Knowledge Entities and Semantic Bindings. Ontology services store and provide access to the conceptual models representing knowledge; reasoning services support computational reasoning with those conceptual models; metadata services store and provide access to semantic bindings; and annotation services generate metadata from different types of information sources, like databases, services and provenance data. These four build on the past work of members of the consortium: a knowledge parser for extracting information from online sources;22 a metadata store;23 and a suite of ontologies and supporting tools to generate semantic descriptions for Grid Services.24
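
The life cycle operations listed above (creation, storage, update, removal and access) can be sketched as a minimal in-memory metadata service. This is an assumption-laden toy, not the OntoGrid metadata store: the class name `MetadataService`, its method names and the identifier scheme are all invented for illustration.

```python
import itertools


class MetadataService:
    """In-memory stand-in for a metadata service holding semantic bindings."""

    def __init__(self):
        self._bindings = {}
        self._ids = itertools.count(1)

    def create(self, resource_id, concept):
        """Create and store a binding; return its own identifier."""
        binding_id = f"binding-{next(self._ids)}"
        self._bindings[binding_id] = {"resource": resource_id, "concept": concept}
        return binding_id

    def access(self, binding_id):
        """Return a copy so callers cannot mutate the stored binding."""
        return dict(self._bindings[binding_id])

    def update(self, binding_id, concept):
        self._bindings[binding_id]["concept"] = concept

    def remove(self, binding_id):
        del self._bindings[binding_id]

    def bindings_for(self, resource_id):
        """Identifiers of all bindings attached to one resource."""
        return sorted(
            bid for bid, b in self._bindings.items()
            if b["resource"] == resource_id
        )
```

A real provisioning service would expose these operations as Grid service interfaces and manage binding lifetimes; the sketch only shows the shape of the operations.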

Semantically Aware Grid Services exploit knowledge technologies to deliver their functionality, for example metadata aware authentication of a given identity by a Virtual Organisation manager service or execution of a search request over entries in a semantically enhanced resource catalogue. Sharing this knowledge brings flexibility to components and increases interoperability. OntoGrid is working on a principled re-factoring strategy for legacy Grid Services to quantify the impact on current Grids.
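
To make the catalogue example concrete, here is a hedged sketch of a search that uses semantic annotations where they exist and falls back to keyword matching where they do not, in keeping with the mixed economy of services described earlier. The ontology fragment, the catalogue entry layout and the function names are hypothetical.

```python
# Hypothetical ontology fragment: concept -> direct sub-concepts.
SUBCONCEPTS = {
    "StorageResource": ["DiskArray", "TapeArchive"],
    "DiskArray": [],
    "TapeArchive": [],
}


def expand(concept):
    """Return the concept together with all of its transitive sub-concepts."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBCONCEPTS.get(current, []))
    return found


def search(catalogue, concept):
    """Match annotated entries by concept subsumption; match entries
    without a semantic annotation by plain keyword search instead."""
    wanted = expand(concept)
    hits = []
    for entry in catalogue:
        annotation = entry.get("concept")
        if annotation is not None:
            if annotation in wanted:
                hits.append(entry["name"])
        elif concept.lower() in entry["description"].lower():
            hits.append(entry["name"])
    return hits
```

A request for `StorageResource` would then also find entries annotated as `TapeArchive`, which a purely syntactic match on the annotation would miss.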

S-OGSA Mechanisms. The model and capabilities are platform independent. To demonstrate the approach in practice, we map the conceptual design to a specific software platform, namely the Globus Toolkit 4, by mapping the semantic bindings to resource properties defined using the Web Service Resource Framework and incorporating S-OGSA entities into the Resource Model of the Common Information Model.
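
Since resource properties in the Web Service Resource Framework are exposed as elements of an XML document, the mapping can be pictured roughly as follows. This sketch does not use the Globus Toolkit and does not reproduce any real S-OGSA or WSRF schema: the namespace `urn:example:s-ogsa` and the element names `ResourceProperties` and `SemanticBinding` are placeholders we made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace, used only for this illustration.
SOGSA_NS = "urn:example:s-ogsa"


def binding_to_resource_property(root, binding_id, concept_uri):
    """Append one semantic binding as a child element of the document root."""
    prop = ET.SubElement(root, f"{{{SOGSA_NS}}}SemanticBinding")
    prop.set("id", binding_id)
    prop.text = concept_uri
    return prop


def build_properties_document(bindings):
    """Serialise a mapping of binding id -> concept URI as an XML string."""
    root = ET.Element(f"{{{SOGSA_NS}}}ResourceProperties")
    for binding_id, concept_uri in bindings.items():
        binding_to_resource_property(root, binding_id, concept_uri)
    return ET.tostring(root, encoding="unicode")


print(build_properties_document({"b1": "urn:example:onto#TapeArchive"}))
```

The point of the sketch is only that a binding, once serialised this way, can be queried and managed with the same machinery as any other property of the resource.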

Semantic Grid Challenges

Grid Services currently deal with semantic information in ad hoc and hidden ways, providing poor mechanisms for sharing and openly processing knowledge. This makes the knowledge hard to share, and hard to interpret by services other than the originators. Often the schemas in use are fixed, which makes them inflexible. Much of the metadata is hard-coded and buried in code libraries, type systems, or grid applications. This makes it hard to adapt and configure. Finally, understanding and know-how is frequently tacit, embedded in best practice and experience rather than explicitly recorded. This makes sharing, customisation and adaptation difficult, and dependent on scarce human effort. The Semantic Grid aims to provision a semantic infrastructure alongside the Grid infrastructure to improve sharing, enable unanticipated reuse of resources, support interoperability and enable more flexible and configurable middleware.

OntoGrid is a step towards the Semantic Grid. There are many challenges to explore. Many are technical: architectural or theoretical foundations, the maturity of semantic and grid technologies, their appropriateness for the required tasks, their scalability, the separation of grid-level and application-specific semantics, and ensuring that combining semantic infrastructure with Grid computing infrastructure makes things easier, not harder. Others are operational: gathering and maintaining the semantic content, reliance on unavailable tooling, and convincingly showing the added value of semantics when the return on investment may come downstream, be long term and benefit developers other than the originators. Some are sociological and political: the interplay between the Semantic and the Grid communities, the inter-factional battles within those communities and the legal, security and privacy implications of clearly exposed metadata and automated reasoning.

Acknowledgements

The OntoGrid Consortium: Universidad Politécnica de Madrid, Spain (Co-ordinator); The University of Manchester, UK; The University of Liverpool, UK; Technical University of Crete (TUC), Greece; Intelligent Software Components, Spain; Y’all B.V., The Netherlands; Deimos Space, S.L., Spain; Boyd International, B.V., The Netherlands. This work is supported by the EU FP6 OntoGrid project (STREP 511513) funded by the Grid-based Systems for solving complex problems.

Glossary
BIRN Biomedical Informatics Research Network. An NIH initiative supporting distributed collaborations in biomedical science. http://www.nbirn.net/
CIM Common Information Model. A common definition of management information for systems, networks, applications and services. http://www.dmtf.org/standards/cim/
CMCS Collaboratory for Multi-scale Chemical Science. Project supporting collaboration through the use of adaptive infrastructure. http://cmcs.ca.sandia.gov/
DFDL Data Format Description Language. A language for describing the structure of binary and character encoded (ASCII/Unicode) files and data streams. http://forge.gridforum.org/projects/dfdl-wg/
EGEE Enabling Grids for E-SciencE. EU funded project building grid infrastructure for scientists. http://public.eu-egee.org/
GGF Global Grid Forum. The community of users, developers, and vendors leading the global standardization effort for grid computing. http://www.ggf.org/
gLite A lightweight middleware framework from the EGEE project. http://glite.web.cern.ch/glite/
GT(4) Globus Toolkit (4). An open source software toolkit used for building Grid systems and applications. Developed by the Globus Alliance. http://www.globus.org/toolkit/
JSDL Job Submission Description Language. https://forge.gridforum.org/projects/jsdl-wg/
Matlab A language and environment supporting computationally intensive tasks. http://www.mathworks.com/
OGSA Open Grid Services Architecture. A set of core capabilities and behaviours that address key concerns in Grid systems. http://www.globus.org/ogsa/
OGSA-DAI OGSA Data Access and Integration. Middleware to assist with access and integration of data from separate data sources via the grid. http://www.ogsadai.org/
OMII Open Middleware Infrastructure Institute. An EPSRC funded initiative providing reliable, interoperable and open-source Grid middleware. http://www.omii.ac.uk/
P2P Peer to Peer. Architectures which allow autonomous peers to interoperate in a decentralized, distributed manner for fulfilling individual and/or common goals.
SAML Security Assertion Markup Language. A language for exchanging authentication and authorization data between security domains. http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=security
S-OGSA Semantic OGSA.
VO Virtual Organisation. Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources.
VOMS Virtual Organisation Management Service. A service managing a VO.
WSRF Web Services Resource Framework. A framework defining conventions for modelling and accessing stateful resources using Web services. http://www.globus.org/wsrf/
1 C. Goble, D. De Roure, N. Shadbolt and A. Fernandes, "Enhancing Services and Applications with Knowledge and Semantics," in The Grid 2: Blueprint for a New Computing Infrastructure, Second Edition, eds. Ian Foster and Carl Kesselman, Morgan Kaufmann, November 2003.
2 J. Hendler "Science and the Semantic Web," Science 299: 520-521, 2003.
3 L. Pouchard, L. Cinquini, B. Drach, et al., "An Ontology for Scientific Information in a Grid Environment: the Earth System Grid," CCGrid 2003 (Symposium on Cluster Computing and the Grid), Tokyo, Japan, May 12-15, 2003.
4 http://www.semanticgrid.org/
5 http://www.dagstuhl.de/05271/
6 D. De Roure, Y. Gil, J. Hendler "Guest Editors' Introduction: E-Science," IEEE Intelligent Systems 19(1), Jan/Feb 2004: 24-25.
7 C. Wroe, C. Goble, M. Greenwood, P. Lord, S. Miles, J. Papay, T. Payne, L. Moreau. "Automating Experiments Using Semantic Data on a Bioinformatics Grid," IEEE Intelligent Systems special issue on e-Science Jan/Feb 2004.
8 L. Chen, N.R. Shadbolt, C.A. Goble, F. Tao, S.J. Cox, C. Puleston, P.R. "Towards a Knowledge-based Approach to Semantic Service Composition" 2nd International Semantic Web Conference, 20-24 October, 2003, Sanibel Island, Florida, USA.
9 J. D. Myers, C. Pancerella, C. Lansing, K. L. Schuchardt, B. D. "Multi-Scale Science: Supporting Emerging Practice with Semantically Derived Provenance," Proceedings of the Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data at Sanibel Island, Florida on October 20, 2003.
10 H. Fu, J. G. Frey, "Semantic description and tracking of analysis of chemical data," Second International Workshop on the Knowledge Grid and Grid Intelligence, Beijing, China, 2004, 140-149.
11 S. Bowers and B. Ludäscher, "An Ontology-Driven Framework for Data Transformation in Scientific Workflows," International Workshop on Data Integration in the Life Sciences (DILS '04), March 25-26, 2004, Leipzig, Germany, LNCS 2994.
12 M. Bachler, S. Shum, Y. Chen-Burger, J. Dalton, D. De Roure, M. Eisenstadt, J. Frey, J. Komzak, D. Michaelides, K. Page, S. Potter, N. Shadbolt, A. Tate, "Collaboration in the Semantic Grid: a Basis for e-Learning," Grid Learning Services (GLS 2004) at the 7th International Conference on Intelligent Tutoring Systems Workshop (ITS 2004), (Maceio, Brazil, 2004), 1-12.
13 http://www.ontogrid.net/
14 I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Djaoui, A. Grimshaw, B. Horn, F. Maciel, F. Siebenlist, R. Subramaniam, J. Treadwell, J. Von Reich, "The Open Grid Services Architecture," http://www.ggf.org/documents/GFD.30, 2005.
15 OGSA Data Access and Integration. Middleware to assist with access and integration of data from separate data sources via the grid.
16 http://www.globus.org/
17 http://public.eu-egee.org/
18 http://www.omii.ac.uk/
19 N. Sharman, N. Alpdemir, J. Ferris, M. Greenwood, P. Li and C. Wroe, "The myGrid Information Model," Proceedings of UK e-science All Hands Meeting, 2004, available from http://www.mygrid.org.uk/
20 Common Information Model (CIM) A common definition of management information for systems, networks, applications and services. http://www.dmtf.org/standards/cim/
21 Job Submission Description Language http://forge.gridforum.org/projects/jsdl-wg/
22 Knowledge Parser http://www.isoco.com/en/innovation/applications/kp.html
23 Z. Kaoudi, I. Miliaraki, S. Skiadopoulos, M. Magiridou, E. Liarou, S. Idreos, and M. Koubarakis, "Specification and Design of Ontology Services and Semantic Grid Services on top of Self-organized P2P Networks." OntoGrid Deliverable D4.1, 2005.
24 C. Goble, A. Gómez-Pérez, R. González-Cabero, M. S. Pérez, "ODESGS Framework, Knowledge-based markup for Semantic Grid Services," Proceedings of the Third International Conference on Knowledge Capture (K-CAP 2005), Banff, Canada, 2005, 199-200.

URL to article: http://www.ctwatch.org/quarterly/articles/2005/11/ontogrid-a-semantic-grid-reference-architecture/