An OWL Lite ontology for Fedora 3.0 data objects, based on the Content Model Architecture (CMA)

Fedora is a store for digital objects. Exactly how they are stored is not important for now. What is important is that Fedora digital objects have RDF relations to each other, i.e. the Fedora digital object repository can be modelled as an RDF graph.

There is one critical difference between Fedora digital objects and a normal RDF graph: each Fedora object contains its own local part of the graph. You cannot change the number or nature of the relations from object A without editing object A.

An ontology for the Fedora repository should preserve this invariant. The ontology must be spread out over the entire repository, and you should not be able to break the ontology in one place by changing it somewhere else. In other words, if you have two separate sets of digital objects, and each set is described exactly by its own ontology when it is the only set in the repository, then the combined ontology should describe the combined sets of objects. This leads to the first restriction on the ontology system:

1. The ontology must not make statements that are global for the entire repository, except for the declaration of the existence of a class or property

Fedora provides a "class" of objects called content models. These represent the classes of data objects and, if specified, contain the description of those data objects. They are the natural place to put the local parts of the ontology. But here we reach the first problem: content models are the classes of data objects, but they are also objects themselves. Describing this duality requires OWL Full, or something with similar expressive power. Since such expressive languages are difficult for automated systems to reason about, we chose to use a more restricted language, OWL Lite. This imposes the second restriction on the ontology system:

2. The ontology only describes the data objects. Content models are regarded as classes, never as objects

An ontology can be used for many purposes other than validation. If it is used as a guideline for making changes that leave the objects valid, an ontology with implicit relations poses some risks. Because implicit relations are inferred from the complete ontology and the existing relations, they might change when some other part of the repository changes. This leads to the third restriction:

3. The ontology must be complete, so that every local part provides the complete description of its local area.

Fedora RDF relations

Fedora does not allow the full RDF specification to be used in the repository. What it basically allows is that each object can have relations to other objects, without any qualifiers on them. There are a number of noteworthy issues with the way Fedora works with RDF. The first is that Fedora objects do not declare an rdf:type relation. Instead they use a fedora-model:hasModel relation to a content model. Unfortunately, OWL Lite regards the relations as instances of "owl:ObjectProperty", and "rdf:type" as an "rdf:Property", so you cannot make "fedora-model:hasModel" an "rdfs:subPropertyOf" "rdf:type". By the letter of the law, then, the second restriction cannot be met without OWL Full. But if this is all that keeps us from using OWL Lite for the ontology, there are hackish ways around it:

4. In data objects, all "fedora-model:hasModel" relations are to be regarded as "rdf:type" relations from the same subject to the same content model.

Fedora stores the relations of an object in a restricted form of RDF/XML, which could look like this.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doms="http://doms.statsbiblioteket.dk/relations/default/0/1/#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/doms:ContentModel_DOMS">
    <fedora-model:hasModel rdf:resource="info:fedora/fedora-system:ContentModel-3.0"/>
    <doms:isPartOfCollection rdf:resource="info:fedora/doms:DOMS_Base_Collection"/>
    <doms:hasLicense rdf:resource="info:fedora/doms:Open_License" />
  </rdf:Description>
</rdf:RDF>

The document begins with an rdf:RDF element. The only element allowed inside it is rdf:Description, whose rdf:about must refer to the object itself. Inside the Description element, the relations from this object are defined as elements with an rdf:resource attribute that designates the target. A relation is defined implicitly, simply by giving an object that relation to another object.

As you can see, this is a far cry from the full RDF standard, but it is sufficient to express the interrelations between objects. What Fedora really lacks is a way to express an ontology for these interrelated digital objects.
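
Restriction 4 can be made concrete with a small sketch. The object doms:Object_1 is a hypothetical data object, and doms:CM_B is used here only as an example content model. The comment shows the statement as Fedora stores it. The rdf:type element shows how the ontology is assumed to read that statement.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!-- Stored in Fedora (hypothetical data object) as:
         <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_B"/>
       Under restriction 4 the ontology reads it as the statement below. -->
  <rdf:Description rdf:about="info:fedora/doms:Object_1">
    <rdf:type rdf:resource="info:fedora/doms:CM_B"/>
  </rdf:Description>
</rdf:RDF>
```

Nothing is rewritten in the repository. The substitution is only a convention for how the stored relations are interpreted when they are checked against the ontology.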

An OWL ontology

TODO: Split in OWLCLASS and OWLPROPERTIES

Describing an ontology for the entire repository of objects is only possible in OWL Full. The reason is that the content models represent the classes of objects, but are at the same time objects themselves. This duality can only be expressed in OWL Full.

To keep the ontology expressible in OWL Lite, we have decided on a fundamental restriction: the ontology only describes the data objects. That way, we do not need to regard the content models as objects.

Each data object contains a RELS-EXT datastream holding the description of the data object. Each content model likewise contains such a description. In addition, it contains an OWL datastream. The OWL datastream is written in a subset of true OWL Lite, defined by the following rules.

There must be one <owl:Class> element, with rdf:about set to the content model itself. This is the only <owl:Class> element allowed.
The <owl:Class> element can contain property restrictions. The allowed restrictions are:
- minCardinality (0 or 1)
- maxCardinality (0 or 1)
- cardinality (0 or 1)
- someValuesFrom
- allValuesFrom
The allowed relations from the object should be declared as <owl:ObjectProperty> elements. No characteristics such as range or domain should be specified. The domain is always the objects of the current content model. The range is specified by the "allValuesFrom" restriction.

Such a datastream could look like this.

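<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- The single class allowed: the content model itself -->
  <owl:Class rdf:about="info:fedora/doms:CM_B">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
        <!-- Class-specific range: every hasA target must be a CM_A object -->
        <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_A"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <!-- The relation is declared without domain or range -->
  <owl:ObjectProperty rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
</rdf:RDF>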
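
The cardinality restrictions from the list above follow the same pattern. The sketch below is an illustration only, not taken from an actual content model: the relation hasFile and the content model doms:CM_File are hypothetical names. It requires every object of doms:CM_B to have exactly one hasFile relation, and that relation must point to an object of doms:CM_File.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="info:fedora/doms:CM_B">
    <!-- Exactly one hasFile relation per object (OWL Lite allows only 0 or 1) -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasFile"/>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- Every hasFile target must be an object of the hypothetical CM_File -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasFile"/>
        <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_File"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <!-- Hypothetical relation, declared without domain or range -->
  <owl:ObjectProperty rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasFile"/>
</rdf:RDF>
```

Each restriction sits in its own rdfs:subClassOf element, so restrictions on the same relation can be combined without declaring a second <owl:Class>.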

Implications

Having an ontology for the data objects has one immediate implication: every relation is invalid unless it is defined in the ontology. As the ontology for a specific data object resides in its attached content model(s), the relation must be defined there.
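
As an illustration, assume that the OWL datastream of doms:CM_B declares only the hasA relation, as in the example datastream above. The data objects doms:Object_1, doms:Object_2 and doms:Object_3 and the relation hasB are hypothetical names. Under that assumption, the relations of doms:Object_1 below would be judged as the comments describe.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:rel="http://www.statsbiblioteket.dk/doms-relations/#">
  <rdf:Description rdf:about="info:fedora/doms:Object_1">
    <!-- Read as rdf:type under restriction 4 -->
    <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_B"/>
    <!-- Declared in the OWL datastream of CM_B: allowed,
         assuming doms:Object_2 has the content model CM_A -->
    <rel:hasA rdf:resource="info:fedora/doms:Object_2"/>
    <!-- Not declared in any content model of this object: invalid -->
    <rel:hasB rdf:resource="info:fedora/doms:Object_3"/>
  </rdf:Description>
</rdf:RDF>
```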

Explicitly forbidding range and domain on a property has some odd implications for the ontology. Without a range, a relation can point to any object in the repository, unless the class imposes restrictions on it. But these restrictions apply only to the relation as used in that class; another class could use the same relation with different restrictions. When a property has multiple ranges, the meaning is that the target must be an instance of all of them. So if two relations used the same name and both defined ranges, the repository would suddenly no longer be valid. It is easier to make the range a class-specific restriction than to make sure there are no name collisions between relations. For domain the problem is the same: multiple domains mean that the source must be an instance of multiple content models. Again, a newly introduced relation might break its name twin somewhere else in the repository.
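
The following sketch shows two content models sharing a relation name without interfering with each other. The content models doms:CM_C and doms:CM_D are hypothetical. The two classes are shown in one document only for brevity. Under the rules above, each <owl:Class> would live in the OWL datastream of its own content model.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- From the OWL datastream of CM_B: hasA must point to CM_A objects -->
  <owl:Class rdf:about="info:fedora/doms:CM_B">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
        <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_A"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <!-- From the OWL datastream of the hypothetical CM_C:
       the same relation name, but targets must be CM_D objects -->
  <owl:Class rdf:about="info:fedora/doms:CM_C">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
        <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_D"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <!-- No rdfs:range here: two global ranges would require every
       hasA target to be an instance of both CM_A and CM_D -->
  <owl:ObjectProperty rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
</rdf:RDF>
```

Because each restriction is scoped to its own class, adding the CM_C datastream to the repository does not change what is valid for objects of CM_B.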