Differences between revisions 8 and 9
Revision 8 as of 2008-08-28 10:11:33
Size: 8295
Editor: abr
Comment:
Revision 9 as of 2008-08-28 10:44:57
Size: 8297
Editor: abr
Comment:

An OWL Lite ontology for Fedora 3.0 data objects based on the CMA

Fedora is a store for digital objects. Exactly how they are stored is not important for now. What is important is that the Fedora digital objects have RDF relations to each other, i.e. the Fedora digital object repository can be modelled as an RDF graph.

There is one critical difference between Fedora digital objects and a normal RDF graph: each Fedora object contains its own local bit of the graph. You cannot change the number or nature of the relations from object A without editing object A.

An ontology for the Fedora repository should preserve this invariant. The ontology must be spread out over the entire repository, and you should not be able to break the ontology in one place by changing it somewhere else. In other words, if you have two separate sets of digital objects, each described exactly by its own ontology when it is the only set of objects in the repository, then the combined ontology should describe the combined sets of objects. This leads to the first restriction on the ontology system:

1. The ontology must not make statements that are global for the entire repository, except for the declaration of the existence of a class or property

Fedora provides a "class" of objects called content models. These try to represent the classes of data objects and, if specified, contain the description of the data objects. They are the natural location for the local ontology bits. But here we reach the first problem: the content models are the classes of data objects, but they are also objects themselves. Describing such a duality of existence requires OWL Full, or something with similar expressive power. Since such expressive languages are difficult for automated systems to reason about, we chose to use a more restricted version, called OWL Lite. This imposes the second restriction on the ontology system:

2. The ontology only describes the data objects. Content models are regarded as classes, never as objects

An ontology can be used for many purposes other than validation. If it is used as a guideline for performing changes that still leave the objects valid, an ontology with implicit relations poses some risks. As the implicit relations are extrapolated from the complete ontology and the existing relations, they might change if some other part of the repository changes. This leads to the third restriction:

3. The ontology must be complete, so that every local bit provides the complete description of its local area.

Fedora RDF relations

Fedora does not allow the full RDF specification to be used in the repository. What it basically allows is that each object can have relations to other objects, without any qualifiers on these. There are a number of noteworthy issues with the way Fedora works with RDF. The first is that Fedora objects do not declare an "rdf:type" relation. Instead they use a "fedora-model:hasModel" relation to a content model. Unfortunately, OWL Lite regards the relations as "owl:ObjectProperty" instances and "rdf:type" as an "rdf:Property", so you cannot make "fedora-model:hasModel" an "rdfs:subPropertyOf" "rdf:type". By the letter of the law, then, the second restriction is not possible without OWL Full. But if this is all that keeps us from using OWL Lite for the ontology, there are hackish ways around it:

4. In data objects, all "fedora-model:hasModel" relations are to be regarded as "rdf:type" relations with the same subject and target
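
As a minimal sketch of restriction 4, assuming the relations of a data object are available as plain (subject, predicate, target) tuples, the substitution could look like the following. The function name and the tuple representation are hypothetical, and the full URI used for "fedora-model:hasModel" is an assumption about the Fedora 3 model namespace.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed expansion of fedora-model:hasModel in Fedora 3.
HAS_MODEL = "info:fedora/fedora-system:def/model#hasModel"


def as_typed_view(triples):
    """Return a copy of the triples in which every hasModel relation
    is replaced by an rdf:type relation with the same subject and target."""
    view = set()
    for subject, predicate, target in triples:
        if predicate == HAS_MODEL:
            view.add((subject, RDF_TYPE, target))
        else:
            view.add((subject, predicate, target))
    return view
```

The stored objects are left untouched; only the view handed to an OWL Lite reasoner or validator would carry the "rdf:type" relations.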

In Fedora, there is no requirement that the relations from an object actually refer to another object. For the ontology, this is problematic with regard to the third restriction. If there is to be no implicit definition of relations, all relations must be defined in a content model. Objects belonging to a non-existent content model would implicitly define the content model class, and as this is disallowed, all "fedora-model:hasModel" relations must point to existing content models, which define themselves as classes. This boils down to the following requirements:

5. All objects must have content models, and all fedora-model:hasModel relations must point to real content models

6. All content models must contain OWL that defines them as classes and lists all the allowed relations for their subscribing objects
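
Requirements 5 and 6 can be checked mechanically. The sketch below assumes a simplified, hypothetical in-memory picture of the repository: a dict from each data object to its "fedora-model:hasModel" targets, and a dict from each content model to the class URIs declared in its OWL. It only covers the "defines itself as a class" half of requirement 6.

```python
def find_model_violations(data_objects, content_models):
    """List (uri, problem) pairs for violations of requirements 5 and 6.

    data_objects:   dict of object URI -> list of hasModel target URIs
    content_models: dict of content model URI -> set of class URIs
                    declared in that content model's OWL
    """
    violations = []
    for uri, models in sorted(data_objects.items()):
        # Requirement 5: every object must have a content model ...
        if not models:
            violations.append((uri, "no content model"))
        # ... and every hasModel relation must point to a real one.
        for model in models:
            if model not in content_models:
                violations.append((uri, "missing content model " + model))
    for uri, declared in sorted(content_models.items()):
        # Requirement 6 (partly): the content model declares itself as a class.
        if uri not in declared:
            violations.append((uri, "does not declare itself as a class"))
    return violations
```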

An OWL ontology

Just as Fedora only allows the "rdf:Description" tag in each object, we have chosen to similarly restrict which OWL tags can reside in a content model. In fact, there are just two allowed top-level elements, "owl:Class" and "owl:ObjectProperty".

Each content model must contain exactly one "owl:Class" element, about the content model itself. In this element the ordinary OWL syntax can be used to place restrictions on the relations. The allowed restrictions are:

  • minCardinality (0-1)
  • maxCardinality (0-1)
  • cardinality (0-1)
  • someValuesFrom
  • allValuesFrom

All relations allowed from a data object should be defined in at least one of its content models, in the form of "owl:ObjectProperty". No additional

Describing an ontology for the entire repository of objects is only possible in OWL Full. The reason is that the content models represent the classes of objects, but are at the same time objects themselves. This duality of existence is only possible in OWL Full.

To keep the ontology expressible in OWL Lite, we have decided on a fundamental restriction: the ontology only describes the data objects. In doing so, we do not need to regard the content models as objects.

Each data object contains the RELS-EXT datastream, containing the description of the data object. Each content model likewise contains such a description. In addition, it contains an OWL datastream. The OWL datastream is a subset of true OWL Lite, defined by the following rules.

  • There must be an <owl:Class> element, rdf:about the content model itself. This is the only <owl:Class> element allowed

  • The <owl:Class> element can contain property restrictions. The allowed restrictions are:

    • minCardinality (0-1)
    • maxCardinality (0-1)
    • cardinality (0-1)
    • someValuesFrom
    • allValuesFrom
  • The allowed relations from the object should be defined as <owl:ObjectProperty>. No attributes, such as range or domain, should be specified. The domain is always the objects of the current content model. The range is specified by the "allValuesFrom" restriction.

Such a datastream could look like this.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="info:fedora/doms:CM_B">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
        <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_A"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:ObjectProperty rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
</rdf:RDF>
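
As a rough illustration of how such a datastream might be read, the sketch below parses it with Python's standard xml.etree.ElementTree module and extracts the class, the declared properties and the "allValuesFrom" restrictions. It assumes the datastream is wrapped in an <rdf:RDF> root element with the usual namespace declarations, and it writes the restriction target as "info:fedora/doms:CM_A", the same URI form as the class. The function name is hypothetical, and cardinality restrictions are not handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF = "{%s}" % NS["rdf"]  # prefix for rdf:about / rdf:resource attributes

OWL_DATASTREAM = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="info:fedora/doms:CM_B">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
        <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_A"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:ObjectProperty rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
</rdf:RDF>
"""


def read_content_model(xml_text):
    """Return (class URI, set of property URIs, {property: allValuesFrom target})."""
    root = ET.fromstring(xml_text)
    classes = root.findall("owl:Class", NS)
    if len(classes) != 1:
        raise ValueError("expected exactly one owl:Class element")
    class_uri = classes[0].get(RDF + "about")
    properties = {p.get(RDF + "about")
                  for p in root.findall("owl:ObjectProperty", NS)}
    all_values_from = {}
    for restriction in classes[0].findall("rdfs:subClassOf/owl:Restriction", NS):
        on_property = restriction.find("owl:onProperty", NS)
        target = restriction.find("owl:allValuesFrom", NS)
        if on_property is not None and target is not None:
            all_values_from[on_property.get(RDF + "resource")] = target.get(RDF + "resource")
    return class_uri, properties, all_values_from
```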

Implications

Having an ontology for the data objects has one immediate implication: all relations become invalid unless they are defined in the ontology. As the ontology for a specific data object resides in the attached content model(s), the relation must be defined in these.
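
This rule can be sketched as a simple set difference. The function name and data layout below are hypothetical: relations are (predicate, target) pairs, and each content model maps to the set of properties it declares as "owl:ObjectProperty".

```python
def undefined_relations(object_relations, object_models, model_properties):
    """Return the predicates used by a data object that none of its
    content models define.

    object_relations: iterable of (predicate, target) pairs
    object_models:    iterable of content model URIs of the object
    model_properties: dict of content model URI -> set of property URIs
    """
    allowed = set()
    for model in object_models:
        allowed |= model_properties.get(model, set())
    used = {predicate for predicate, _ in object_relations}
    return sorted(used - allowed)
```

An empty result means every relation of the object is covered by at least one of its content models.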

Explicitly forbidding range and domain on a property has some odd implications for the ontology. Not having a range means that the relation can point to any object in the repository, unless the class imposes some restrictions on it. But these restrictions apply only to the relation in that class. Another class could use the same relation with some other restrictions. When a property has multiple ranges, the meaning is that the target must be an instance of all the ranges. So, if two relations use the same name and both define ranges, the repository would suddenly no longer be valid. It is easier to make the ranges a class-specific restriction than to make sure there are no name collisions between relations. For domain the problem is the same. Multiple domains mean that the source must be an instance of multiple content models. And again, a recently introduced relation might break its name twin somewhere else in the repository.
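
The difference between a global range and a class-specific restriction can be illustrated with a small sketch. The content model names (CM_Book, CM_Page, CM_Album, CM_Track) and both functions are made up for the example; they model one relation name that two unrelated content models happen to share.

```python
def valid_with_global_ranges(target_models, ranges):
    """Global range semantics: the target must be an instance of
    every range declared for the property, wherever it was declared."""
    return all(r in target_models for r in ranges)


def valid_with_local_restriction(source_model, target_models, all_values_from):
    """Class-specific semantics: only the allValuesFrom restriction of
    the source object's own content model applies."""
    required = all_values_from.get(source_model)
    return required is None or required in target_models


# Two content models use the same relation name with different targets.
global_ranges = {"CM_Page", "CM_Track"}
local_restrictions = {"CM_Book": "CM_Page", "CM_Album": "CM_Track"}

# A book pointing at a page: invalid under global ranges, valid locally.
book_to_page_global = valid_with_global_ranges({"CM_Page"}, global_ranges)
book_to_page_local = valid_with_local_restriction(
    "CM_Book", {"CM_Page"}, local_restrictions)
```

Under global ranges, adding the second declaration invalidates objects that were valid before, which is exactly the cross-repository breakage the first restriction is meant to rule out.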

FedoraOntology (last edited 2010-03-17 13:09:09 by localhost)