Differences between revisions 12 and 13
Revision 12 as of 2008-08-28 14:08:53
Size: 16509
Editor: abr
Comment:
Revision 13 as of 2008-09-09 14:09:56
Size: 18841
Editor: abr
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= A Owl lite ontology for Fedora 3.0 data objects based on the CMA =

Fedora is a store for digital object. The exact way they are stored is not important for now. What is important is that the fedora digital objects have rdf relations to each other. Ie. the fedora digital object repository can be modelled as a rdf graph.

There is one critical difference between fedora digital objects and a normal rdf graph: Each fedora object contain its own local bit of the graph. You cannot change the number or nature of the relations from Object A, without editing object A.

An ontology for the fedora repository should preserve this invariant. The ontology must be spread out over the entire repository, and you should not be able to break the ontology in one place by changing it somewhere else. Or, in other words, if you have two seperate sets of digital objects in the repository, that each were described exactly by their respective ontologies when being the only set of objects in the repository, the combined ontology should describe the combined sets of objects. This leads to the first restriction on the ontology system:
= An OWL LITE ontology for Fedora 3.0 data objects based on the CMA =

Fedora is a store for digital objects. The exact way they are stored is not important for this discussion. What is important is that the Fedora digital objects have RDF relations to each other. I.e. the Fedora digital object repository can be modelled as an RDF graph.

There is one critical difference between Fedora digital objects and a normal RDF graph: Each Fedora object contains its own local bit of the graph. You cannot change the number or nature of the relations from object A, without editing object A.

An ontology for the Fedora repository should preserve this characteristic. The ontology must be spread out over the entire repository, and you should not be able to break the ontology in one place by changing it somewhere else. In other words, if you have two separate Fedora repositories, each described by its own ontology, and you transfer them to the same repository, the combined ontology should describe the combined sets of objects. This leads to the first property of the ontology system:
Line 11: Line 11:
Fedora provide a "class" of objects called content models. These try to represent the classes of data objects, and if specified, contain the description of the data objects. These are the natural location to place the local ontology bits. But now we reach the first problems. The content models are both the classes of data objects, but they are also objects themselves. In order to describe such duality of existence, the language needed is OWL Full, or something with similar expressive power. Since such expressive languages are difficult to reason about by automated systems, we chose to use a more restricted version, called OWL lite. This imposes the second restriction on the ontology system.
Fedora provides a "class" of objects called Content Models. These try to represent the classes of data objects and, if specified, contain the description of the data objects. They are the natural place to put the local ontology bits. But here we reach the first problem: the Content Models are the classes of data objects, but they are also objects themselves. To describe this dual nature, the language needed is OWL FULL, or something with similar expressive power. Since such expressive languages are difficult for automated systems to reason about, we chose to use a more restricted version, called OWL LITE. This imposes the second property of the ontology system.
Line 15: Line 15:
An ontology could be used for many purposes, other than validation. If it is used as a guideline for performing changes that still leaves the objects valid, having an ontology with implicit relations pose some risks. As the implicit relations are extrapolated from the complete ontology and the existing relations, the implicit relations might change if some other part of the repository changes. This leads to the third restriction:

''' 3. The ontology must be complete, so that every local bit provides the complete description of its local area. '''


== Fedora rdf relations ==

Fedora does not allow for the full rdf specification to be used in the repository. What it basicly allows is that each object can have relations to other objects, without any qualifiers on these. There are a number of note-worthy issues about the way fedora works with rdf. The first is that fedora objects does not declare a rdf:type relation. Instead they use a fedora-model:hasModel relation to a content model. Unfortunately, OWL lite regards the relations as "owl:ObjectProperty"'s, and "rdf:type" as a "rdf:Property". So, you cannot make "fedora-model:hasModel" a "rdfs:subPropertyOf" "rdf:type". So, by the letter of the law, the 2. restriction is not possible, without OWL full. But if this is all that restricts us from using OWL lite for the ontology, there are hackish ways around it.

''' 4. In data objects, all "fedora-model:hasModel" relations are to be regarded as "rdf:type" relations to the same subject '''

In fedora, there is no requirement that the relations from an object actually refers to another object. For the ontology, this is problematic, in regards to the 3. requirement. If there is to be no implicit definition of relations, all relations must be defined in a content model. Objects belonging to a non-existing content model will implicitly define the content model class, and as this is disallowed, all "fedora-model:hasModel" relations must point to existing content models, which define themselves as classes. This boils down to the following requirements:

''' 5. All objects must have content models, and all fedora-model:hasModel relations must point to real content models '''

''' 6. All content models must contain OWL that define themselves as classes, and list all the allowed relations for their subscribing objects '''


== An OWL ontology ==

Just like Fedora only allows the "rdf:Description" tag in each object, we have chosen to similarly restrict what owl tags there can reside in a content model. In fact, there are just two allowed header elements, "owl:Class" and "owl:ObjectProperty".

Each content model must contain one and just one "owl:Class" element, about the content model itself. In this element the ordinary owl syntax can be used to place restrictions on the relations. The allowed restrictions are:
An ontology with implicit rules, properties or classes can lead to problems. When part of the ontology is derived from the whole ontology, the effects of changes to the ontology become difficult to predict. In particular, removing or introducing a class could affect the nature of other classes. In effect, someone wanting to use the ontology must know the entire ontology in order to extrapolate anything implicit, which conflicts with property 1. To make this explicit, the third property is introduced:

''' 3. The ontology must be locally complete, so that every local bit provides the complete description of its local area. '''


== Fedora RDF relations ==

Fedora does not allow the full RDF specification to be used in the repository. What it basically allows is that each object can have properties relating it to other objects (called relations), and literal properties. There can be no qualifiers on the properties.

There are a number of noteworthy issues about the way Fedora works with RDF. The first is that Fedora objects do not declare an RDF:type property. Instead they use a Fedora-model:hasModel property to relate to a Content model. Unfortunately, OWL LITE regards the relations as "OWL:ObjectProperty"s, and "RDF:type" as an "RDF:Property". As they are different, you must use OWL FULL to define:
{{{
Fedora-model:hasModel RDFs:subPropertyOf RDF:type
}}}
So, in OWL LITE, Content Models cannot be regarded as classes, in violation of property 2. But as this is all that prevents us from using OWL LITE for the ontology, there are hackish ways around it. This is what property 4 provides.

''' 4. In data objects, all "Fedora-model:hasModel" relations are to be regarded as "RDF:type" relations to the same subject '''
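
As an illustration of property 4, the sketch below shows a RELS-EXT statement and, inside a comment, the type statement it is meant to be read as. The names doms:Object_X and doms:CM_X are made up for this example, and the prefixes are written in lowercase and bound to the standard RDF namespace and the Fedora model namespace so that the fragment is self-contained.

{{{
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
    <!-- What the data object actually stores in RELS-EXT
         (Object_X and CM_X are hypothetical names) -->
    <rdf:Description rdf:about="info:fedora/doms:Object_X">
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_X"/>
    </rdf:Description>

    <!-- How the ontology is meant to read it (not stored anywhere):
         <rdf:Description rdf:about="info:fedora/doms:Object_X">
             <rdf:type rdf:resource="info:fedora/doms:CM_X"/>
         </rdf:Description>
    -->
</rdf:RDF>
}}}

The rewriting is a convention applied by whoever consumes the ontology; nothing in Fedora or OWL LITE performs it automatically.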

In Fedora, it is not required that the relations from an object actually refer to another object. For an ontology this is problematic with regard to property 3. As the Content Model defines the class of an object, using a nonexistent Content Model means that the class is implicitly defined. Likewise, if there exists a relationship between two objects which is not defined in the ontology, the relationship is implicitly defined. As there is to be no implicit definition of properties or relations, all of them must be defined in a Content Model. This leads to properties 5 and 6.

''' 5. All Fedora-model:hasModel relations must point to real Content models '''

''' 6. All allowed relations must be defined in the ontology '''



== Defining ontologies by OWL LITE ==


When properties 2 and 6 are expressed in OWL, they become property 7.

''' 7. All Content models must contain OWL that defines them as classes and lists all the allowed relations for their subscribing objects '''

A Fedora object consists of a number of datastreams. One datastream, RELS-EXT, has been reserved for the Fedora RDF statements. We choose to reserve another datastream, ONTOLOGY, to contain the ontology definitions.

Just as Fedora only allows the "RDF:Description" tag in each object, we have chosen to similarly restrict which OWL tags can reside in a Content model. In fact, there are just three allowed elements inside the rdf:RDF tag: "OWL:Class", "OWL:ObjectProperty" and "OWL:DatatypeProperty".
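
A minimal sketch of what such an ONTOLOGY datastream could look like is given below. Only the three element types come from the text; the Content model doms:CM_X and the properties #hasY and #title are hypothetical names, and the prefixes are written in lowercase and bound to the standard RDF and OWL namespaces.

{{{
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
    <!-- The class element, about the Content model itself
         (CM_X is a hypothetical name) -->
    <owl:Class rdf:about="info:fedora/doms:CM_X"/>

    <!-- A relation from subscribing objects to other objects (hypothetical) -->
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasY"/>

    <!-- A literal-valued property of subscribing objects (hypothetical) -->
    <owl:DatatypeProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#title"/>
</rdf:RDF>
}}}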

Each Content model must contain exactly one "OWL:Class" element, about the Content model itself. In this element the ordinary OWL syntax can be used to place restrictions on the properties. The allowed restrictions are:
Line 43: Line 57:
You are not allowed to use the "rdfs:subClassOf" property to make the class a subclass of another content model.

All relations allowed from from a data object should be defined in at least one of it's content models, in the form of "owl:ObjectPropery". This is important, so I will say it again: The complete list of allowed relations in a data object is the set of ObjectProperties defined by it's content models. A relation can be declared in multiple content models, but if a content model place restrictions on a relation, it must declare the property itself.
The reason behind this requirement is just requirement 3. Even through all the declaration of ObjectProperties are global for the repository, and thereby allowed for all objects in the repository, the demand is that each data object should be described by just the local content models, ie. those it relates to through the "fedora-model:hasModel" relation.

Looking at requirement 1. in the contect of "owl:ObjectProperty", it becomes clear that range and domain are not allowed. This is unfortunately required. Since neither OWL nor Fedora provide a way to ensure that the same relation is not defined twice, it is entirely possible for two unrelated content models in the repository to define the same property. Each part of the repository will be valid viewed locally, but when regarding the repository as a whole, the two different definitions will be combined. Having two domains for a property mean that the source must be of both types, not either, and likewise for range, and the repository as a whole will be invalid. To prevent the risk of such errors the use of domain and range are disallowed.
You are not allowed to use the "RDFs:subClassOf" property to make the class a subclass of another Content model.

All relations allowed from a data object should be defined in at least one of its Content models, in the form of an "OWL:ObjectProperty". This is important enough to restate: the complete list of allowed relations in a data object is the set of ObjectProperties defined by its Content models. A relation can be declared in multiple Content models, but if a Content model places restrictions on a relation, it must declare the property itself.
The reason behind this requirement is property 3. Even though all declarations of ObjectProperties are global for the repository, and thereby allowed for all objects in the repository, the demand is that each data object should be described by just its local Content models, i.e. those it relates to through the "Fedora-model:hasModel" relation.

Fedora will also allow an object to have literal properties. Such properties are defined by the "OWL:DatatypeProperty" tag.
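
As a sketch of how a literal property and its declaration fit together, consider the fragment below. It assumes that the doms prefix is bound to the doms-relations namespace used elsewhere on this page; doms:Object_X and the #title property are made-up names. The two parts are shown in one document only for brevity: the first would live in the RELS-EXT datastream of the data object and the second in the ONTOLOGY datastream of its Content model.

{{{
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:doms="http://www.statsbiblioteket.dk/doms-relations/#">
    <!-- In the data object: a property whose value is a literal
         (Object_X and title are hypothetical names) -->
    <rdf:Description rdf:about="info:fedora/doms:Object_X">
        <doms:title>An example title</doms:title>
    </rdf:Description>

    <!-- In the Content model: the matching declaration -->
    <owl:DatatypeProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#title"/>
</rdf:RDF>
}}}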

Looking at property 1 in the context of "OWL:ObjectProperty", it becomes clear that range and domain cannot be allowed. This is unfortunate but necessary. Since neither OWL nor Fedora provides a way to ensure that the same relation is not defined twice, it is entirely possible for two unrelated Content models in the repository to define the same property. Each part of the repository will be valid when viewed locally, but when the repository is regarded as a whole, the two definitions are combined. Having two domains for a property means that the source must be of both types, not either, and likewise for range, so the repository as a whole will be invalid. To prevent such errors, the use of domain and range is disallowed.

''' 8. "rdfs:range" and "rdfs:domain" are not allowed on any properties.'''

Instead of "rdfs:range", one should use the "allValuesFrom" restriction. This restriction defines a range for the property, but only in the given class. As such, the restriction has no global effect. "rdfs:domain" is simply not necessary. Property 3 implies that the ONTOLOGY in a Content Model should describe the local area, i.e. the objects subscribing to that Content Model. As a result, the domain, so to speak, of a property will always be the Content Model in which it was defined. Again, this has no global effect: the same property defined somewhere else will have some other Content Model as its domain.
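
The difference can be sketched as follows. The disallowed global form is shown inside a comment, followed by the local form that replaces it. The Content models doms:CM_X and doms:CM_Y and the property #hasY are hypothetical; the prefixes are written in lowercase and bound to the standard RDF, RDFS and OWL namespaces.

{{{
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
    <!-- Disallowed by property 8: a global range on the property.
         <owl:ObjectProperty
                 rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasY">
             <rdfs:range rdf:resource="info:fedora/doms:CM_Y"/>
         </owl:ObjectProperty>
    -->

    <!-- Allowed: the same constraint, applying to objects of CM_X only -->
    <owl:Class rdf:about="info:fedora/doms:CM_X">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasY"/>
                <owl:allValuesFrom rdf:resource="info:fedora/doms:CM_Y"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasY"/>
</rdf:RDF>
}}}

If another Content model declares #hasY with a different "allValuesFrom" target, the two restrictions apply to different classes and do not combine.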
Line 54: Line 74:
    <rdf:Description rdf:about="info:fedora/doms:Object_A1">
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_A"/>
        <doms:hasB rdf:resource="info:fedora/doms:Object_B1"/>
    </rdf:Description>
    <RDF:Description RDF:about="info:Fedora/doms:Object_A1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_A"/>
        <doms:hasB RDF:resource="info:Fedora/doms:Object_B1"/>
    </RDF:Description>
Line 60: Line 80:
    <owl:Class rdf:about="info:fedora/doms:CM_A">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <owl:cardinality
                        rdf:datatype=
    <OWL:Class RDF:about="info:Fedora/doms:CM_A">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:cardinality
                        RDF:datatype=
Line 69: Line 89:
                </owl:cardinality>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <owl:allValuesFrom rdf:resource="info:fedora:/doms:CM_B"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>

}}}
A object A1 and a Content model CM_A is defined. There is one allowed relation for A1, the #hasB relation. Two restrictions are placed on this relation. There must be one, and just one such relation in A1, and it must refer to an object of class/content model CM_B. In fact, A1 has one such relation, and it refers to the object B1, which follows below.
                </OWL:cardinality>
            </OWL:Restriction>
        </RDFs:subClassOf>
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_B"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>

}}}
An object A1 and a Content model CM_A are defined. There is one allowed relation for A1, the #hasB relation. Two restrictions are placed on this relation: there must be exactly one such relation in A1, and it must refer to an object of class/Content model CM_B. A1 does have one such relation, and it refers to the object B1, which follows below.
Line 88: Line 108:
    <rdf:Description rdf:about="info:fedora/doms:Object_B1">
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_B"/>
    </rdf:Description>
    <RDF:Description RDF:about="info:Fedora/doms:Object_B1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_B"/>
    </RDF:Description>
Line 94: Line 114:
    <owl:Class rdf:about="info:fedora/doms:CM_B">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <owl:allValuesFrom rdf:resource="info:fedora:/doms:CM_A"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
}}}
Here is the the object B1, and it's content model CM_B. There is one allowed relation from a B1, the #hasA relation. There is just one restriction on this relation, that it must refer to something of class/content model CM_A. No cardinality restriction is defined, so B1 does not need to have the relation, and in fact, it does not have it.
    <OWL:Class RDF:about="info:Fedora/doms:CM_B">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_A"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
}}}
Here is the object B1 and its Content model CM_B. There is one allowed relation from B1, the #hasA relation. There is just one restriction on this relation: it must refer to something of class/Content model CM_A. No cardinality restriction is defined, so B1 does not need to have the relation, and in fact it does not have it.
Line 111: Line 131:
While this model does not require it, it does lend itself readily to a inheritance system for content models. In this section, such an anmendment will be outlined.

There are two apparant ways to specify inheritance. It could be added as "rdfs:subClassOf" (or a subproperty of this) tag inside the "owl:Class" element in the OWL datastream or it could be added as a relation in the RELS-EXT datastream for the content models. Both ways have their advantages. There is really no advantage in using the OWL datastream over the RELS-EXT datastream. When the relation is defined in RELS-EXT, it is indexed by Fedora, and you can make queries about it.

Inheritance of content models does have some implications for data objects. In the standard fedora worldview, a object does not belong to content model, unless it has a "fedora-model:hasModel" relation to it. We want to maintain this invariant. So, in addition to having "rdfs:subClassOf" between the content models, a data object must have "fedora-model:hasModel" relations to ALL the content models inherited by it's primary content models.

The OWL schema in a content model will now not reside solely in the OWL datastream. As the information about any parent classes is placed in the RELS-EXT datastream, it now have to be included in the ontology. But this provide a new problem; a owl class is not allowed to have relations to other objects, like an owl individual. So, only the "rdfs:subClassOf" relation should be extracted from the RELS-EXT datastream, the other relations should be disregarded.

The problem about "rdf:type" not being extensible is also relevant for "rdf:subClassOf". There are two ways to model the inheritance relations in RELS-EXT. Firstly, you can use "rdfs:subClassOf" directly. Alternatively, you can define some other property to have the same semantic meaning, just like we did with "fedora-model:hasModel" and "rdf:type".

=== An example of inheritance ===
In the following "doms:extendsModel" have been used in place of "rdfs:subClassOf" but carries the same meaning.
While this model does not require it, it does lend itself readily to an inheritance system for Content models. In this section, such an amendment will be outlined.

There are two apparent ways to specify inheritance. It could be added as an "RDFs:subClassOf" tag (or a subproperty of this) inside the "OWL:Class" element in the ONTOLOGY datastream, or it could be added as a relation in the RELS-EXT datastream of the Content models. Both ways will work, but there is really no advantage in using the ONTOLOGY datastream (to specify the inheritance) over the RELS-EXT datastream. When the relation is defined in RELS-EXT, it is indexed by Fedora, and you can make triple-store queries about it.

The problem of "RDF:type" not being extensible also applies to "RDFs:subClassOf". There are two ways to model the inheritance relations in RELS-EXT. Firstly, you can use "RDFs:subClassOf" directly. Alternatively, you can define some other property to have the same semantic meaning, just like we did with "Fedora-model:hasModel" and "RDF:type". We have chosen the latter option, and reserved a property "doms:extendsModel" to represent Content Model inheritance.

Inheritance of Content models does have some implications for data objects. In the standard Fedora worldview, an object does not belong to a Content model unless it has a "Fedora-model:hasModel" relation to it. For now, we want to maintain this invariant. So, in addition to having "doms:extendsModel" between the Content models, a data object must have "Fedora-model:hasModel" relations to ALL the Content models inherited by its primary Content models.

The OWL schema in a Content model will now not reside solely in the ONTOLOGY datastream. As the information about any parent classes is placed in the RELS-EXT datastream, it now has to be included in the ontology. But this creates a new problem: an OWL class is not allowed to have relations to other objects, unlike an OWL individual. So only the "doms:extendsModel" and "fedora-model:hasModel" relations should be extracted from the RELS-EXT datastream; the other relations should be disregarded.


=== An example using inheritance ===

To aid understanding, an example using inheritance has been included below.
Line 127: Line 149:
    <rdf:Description rdf:about="info:fedora/doms:Object_A1">
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_A"/>
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_C"/>

        <doms:hasB rdf:resource="info:fedora/doms:Object_B1"/>
    </rdf:Description>
    <RDF:Description RDF:about="info:Fedora/doms:Object_A1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_A"/>
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_C"/>

        <doms:hasB RDF:resource="info:Fedora/doms:Object_B1"/>
    </RDF:Description>
Line 135: Line 157:
    <rdf:Description rdf:about="info:fedora/doms:Object_A2">
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_A"/>
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_C"/>

        <doms:hasB rdf:resource="info:fedora/doms:Object_B1"/>
    </rdf:Description>
    <RDF:Description RDF:about="info:Fedora/doms:Object_A2">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_A"/>
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_C"/>

        <doms:hasB RDF:resource="info:Fedora/doms:Object_B1"/>
    </RDF:Description>
}}}
Two data objects, A1 and A2, have been declared. They both have the CM_A and CM_C Content models.

{{{
    <!--RELS-EXT from Object_C-->
    <RDF:Description RDF:about="info:Fedora/doms:CM_C">
        <Fedora-model:hasModel
                RDF:resource="info:Fedora/Fedora-system:ContentModel-3.0"/>
    </RDF:Description>


    <!--OWL from CM_C-->
    <OWL:Class RDF:about="info:Fedora/doms:CM_C">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_B"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
}}}
Here the CM_C Content model has been declared. It extends no other Content model. It declares the relation #hasB, which is required because it imposes a restriction on the relation. The restriction is that the relation must refer to objects of Content model CM_B. As seen above, the A1 and A2 objects both refer to an object B1, which may or may not be of class CM_B.


{{{
Line 143: Line 192:
    <rdf:Description rdf:about="info:fedora/doms:CM_A">
        <hasModel rdf:resource="info:fedora/fedora-system:ContentModel-3.0"
                  xmlns="info:fedora/fedora-system:def/model#"/>
        <doms:extendsModel rdf:resource="info:fedora/doms:CM_C" />
    </rdf:Description>
    <RDF:Description RDF:about="info:Fedora/doms:CM_A">
        <hasModel RDF:resource="info:Fedora/Fedora-system:ContentModel-3.0"
                  xmlns="info:Fedora/Fedora-system:def/model#"/>
        <doms:extendsModel RDF:resource="info:Fedora/doms:CM_C" />
    </RDF:Description>
Line 150: Line 199:
    <owl:Class rdf:about="info:fedora/doms:CM_A">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <owl:cardinality
                        rdf:datatype=
    <OWL:Class RDF:about="info:Fedora/doms:CM_A">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:cardinality
                        RDF:datatype=
Line 159: Line 208:
                </owl:cardinality>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                </OWL:cardinality>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
Line 167: Line 216:
Two data objects, A1 and A2 have been declared. They both have the CM_A and the CM_C content model. CM_A declares itself as a subtype of CM_C. CM_A declares a relation #hasB and define a cardinality of 1.

{{{
    <!--RELS-EXT from Object_C-->
    <rdf:Description rdf:about="info:fedora/doms:CM_C">
        <fedora-model:hasModel
                rdf:resource="info:fedora/fedora-system:ContentModel-3.0"/>
    </rdf:Description>


    <!--OWL from CM_C-->
    <owl:Class rdf:about="info:fedora/doms:CM_C">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <owl:allValuesFrom rdf:resource="info:fedora:/doms:CM_B"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
}}}
Here the CM_C content model has been declares. It derives from no other content model. It too defines the relation #hasB, which is required as it imposes a restriction on the relation. The restriction is that the relation must refer to objects from content model CM_B. As could be seen before the A1 and A2 objects both refer to a B1 object, which follows below.
CM_A declares itself as a subtype of CM_C. CM_A redeclares the relation #hasB and defines a cardinality of 1. So objects of CM_A must have exactly one #hasB relation, and it must point to an object of class CM_B. Objects of class CM_C only can have 0-* such relations, but these must still all point to objects of class CM_B.
Line 194: Line 221:
    <rdf:Description rdf:about="info:fedora/doms:Object_B1">
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_B"/>
        <fedora-model:hasModel rdf:resource="info:fedora/doms:CM_C"/>

        <doms:hasA rdf:resource="info:fedora/doms:Object_A1"/>
        <doms:hasB rdf:resource="info:fedora/doms:Object_B1"/>
    </rdf:Description>
}}}
As can be seen, object B1 is of content models CM_B and CM_C. As such they relations in A1 and A2 are valid. B1 has two relations. Whether or not they are valid should be answered by looking at the content models CM_B and CM_C, as they describe B1.
    <RDF:Description RDF:about="info:Fedora/doms:Object_B1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_B"/>
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_C"/>

        <doms:hasA RDF:resource="info:Fedora/doms:Object_A1"/>
    </RDF:Description>
}}}
As can be seen, object B1 has the Content models CM_B and CM_C. The #hasB relations in A1 and A2 are therefore valid, as B1 is of class CM_B. B1 has one relation of its own. Whether or not it is valid should be answered by looking at the Content models CM_B and CM_C, as they describe B1.
Line 206: Line 232:
    <rdf:Description rdf:about="info:fedora/doms:CM_B">
        <fedora-model:hasModel
                rdf:resource="info:fedora/fedora-system:ContentModel-3.0"/>
        <doms:extendsModel rdf:resource="info:fedora/doms:CM_C" />
    </rdf:Description>
    <RDF:Description RDF:about="info:Fedora/doms:CM_B">
        <Fedora-model:hasModel
                RDF:resource="info:Fedora/Fedora-system:ContentModel-3.0"/>
        <doms:extendsModel RDF:resource="info:Fedora/doms:CM_C" />
    </RDF:Description>
Line 213: Line 239:
    <owl:Class rdf:about="info:fedora/doms:CM_B">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <owl:allValuesFrom rdf:resource="info:fedora:/doms:CM_A"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
}}}
Here is the definition of CM_B. As can be seen, it derives from CM_C. Since B1 is of CM_C, it may have #hasB relations to objects of CM_B. It has such a relation to itself. Since it is of CM_B, it may have #hasA relations to objects of CM_A. It has such a relation to A1. Note that there are no required relations for B1, nor is there a max number of relations, but it can only have #hasA and #hasB relations.
    <OWL:Class RDF:about="info:Fedora/doms:CM_B">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <OWL:minCardinality
                        RDF:datatype=
                                "http://www.w3.org/2001/XMLSchema#integer">
                    1
                </OWL:minCardinality>
            </OWL:Restriction>
        </RDFs:subClassOf>
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_A"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
}}}
Here is the definition of CM_B. As can be seen, it extends CM_C. Objects of class CM_B must have one or more #hasA relations, and they must all refer to objects of class CM_A.

Since B1 is of CM_C, it may have zero or more #hasB relations to objects of CM_B. It has zero such relations, which is allowed. It has one #hasA relation, to A1, which is enough to fulfill the restrictions.

An OWL LITE ontology for Fedora 3.0 data objects based on the CMA

Fedora is a store for digital objects. The exact way they are stored is not important for this discussion. What is important is that the Fedora digital objects have RDF relations to each other. I.e. the Fedora digital object repository can be modelled as an RDF graph.

There is one critical difference between Fedora digital objects and a normal RDF graph: Each Fedora object contains its own local bit of the graph. You cannot change the number or nature of the relations from object A, without editing object A.

An ontology for the Fedora repository should preserve this characteristic. The ontology must be spread out over the entire repository, and you should not be able to break the ontology in one place by changing it somewhere else. In other words, if you have two separate Fedora repositories, each described by its own ontology, and you transfer their contents to the same repository, the combined ontology should describe the combined sets of objects. This leads to the first property of the ontology system:

1. The ontology must not make statements that are global for the entire repository, except for the declaration of the existence of a class or property

Fedora provides a "class" of objects called Content Models. These represent the classes of data objects and, if specified, contain the description of the data objects. They are the natural place to put the local ontology bits. But now we reach the first problem. The Content models are the classes of data objects, but they are also objects themselves. Describing this dual nature requires OWL FULL, or something with similar expressive power. Since such expressive languages are difficult for automated systems to reason about, we chose to use a more restricted version, called OWL LITE. This imposes the second property on the ontology system.

2. The ontology only describes the data objects. Content models are regarded as classes, never as objects

An ontology with implicit rules, properties or classes, could lead to some potential problems. When part of the ontology is derived from the whole ontology, the effects of changes to the ontology can become difficult to predict. Especially the removal or introduction of a class could affect the nature of other classes. In effect, this means that someone wanting to use the ontology must know the entire ontology, in order to extrapolate anything implicit, which is in conflict with property 1. To make this explicit, the third property is introduced:

3. The ontology must be locally complete, so that every local bit provides the complete description of its local area.

Fedora RDF relations

Fedora does not allow the full RDF specification to be used in the repository. What it basically allows is that each object can have properties relating it to other objects (called relations), and literal properties. There can be no qualifiers on the properties.

There are a number of noteworthy issues about the way Fedora works with RDF. The first is that Fedora objects do not declare an RDF:type property. Instead they use a Fedora-model:hasModel property to relate to a Content model. Unfortunately, OWL LITE regards the relations as instances of "OWL:ObjectProperty", and "RDF:type" as an "RDF:Property". As they are different, you must use OWL FULL to define:

Fedora-model:hasModel RDFs:subPropertyOf RDF:type

So, in OWL LITE, Content Models cannot be regarded as classes, in violation of property 2. But as this is all that prevents us from using OWL LITE for the ontology, there are hackish ways around it. Thus property 4 is defined.

4. In data objects, all "Fedora-model:hasModel" relations are to be regarded as "RDF:type" relations with the same subject and object
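Property 4 can be applied as a preprocessing step before the triples are handed to an OWL LITE reasoner. The following is a minimal sketch, assuming the RELS-EXT statements have already been extracted into (subject, predicate, object) tuples. The function name `with_type_triples` is hypothetical and not part of Fedora; the two URIs are the standard RDF and Fedora model namespaces.

```python
# Standard URIs for rdf:type and fedora-model:hasModel.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
HAS_MODEL = "info:fedora/fedora-system:def/model#hasModel"


def with_type_triples(triples):
    """Return the input triples plus one rdf:type triple per hasModel triple.

    Hypothetical helper: the original hasModel triples are kept, so the
    result can still be used for ordinary Fedora queries.
    """
    result = list(triples)
    for subject, predicate, obj in triples:
        if predicate == HAS_MODEL:
            # Same subject and object, only the predicate changes.
            result.append((subject, RDF_TYPE, obj))
    return result
```

Keeping the original triples alongside the derived ones means nothing is lost if the reasoner output is later compared with the repository content.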

In Fedora, it is not required that the relations from an object actually refer to another object. For an ontology, this is problematic with regard to property 3. As the Content Model defines the class of an object, using a nonexistent Content model means that the class is implicitly defined. Likewise, if there exists a relationship between two objects which is not defined in the ontology, the relationship is implicitly defined. As there is to be no implicit definition of properties/relations, all of them must be defined in a Content model. This imposes properties 5 and 6.

5. All Fedora-model:hasModel relations must point to real Content models

6. All allowed relations must be defined in the ontology
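Properties 5 and 6 lend themselves to a mechanical check. The sketch below is only an illustration under stated assumptions: the repository is represented as two plain dictionaries (a hypothetical in-memory layout, not a Fedora API), one mapping each data object to its Content models and relations, and one mapping each existing Content model to the set of properties it declares.

```python
def check_properties_5_and_6(objects, content_models):
    """Return a list of human-readable problems (empty list means no problems).

    objects:        {pid: {"models": [model_pid, ...],
                           "relations": [(predicate, target_pid), ...]}}
    content_models: {model_pid: set of property names declared by that model}
    Both layouts are assumptions made for this sketch.
    """
    problems = []
    for pid in sorted(objects):
        obj = objects[pid]
        declared = set()
        for model in obj["models"]:
            if model not in content_models:
                # Property 5: hasModel must point to a real Content model.
                problems.append(f"{pid}: hasModel points to missing model {model}")
            else:
                declared |= content_models[model]
        for predicate, _target in obj["relations"]:
            if predicate not in declared:
                # Property 6: every relation used must be declared.
                problems.append(f"{pid}: relation {predicate} is not declared")
    return problems
```

Note that a relation only counts as declared if one of the object's own Content models declares it, which mirrors the locality demanded by property 3.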

Defining ontologies by OWL LITE

When properties 2 and 6 are expressed in OWL, they become property 7.

7. All Content models must contain OWL that defines them as classes and lists all the allowed relations for their subscribing objects

A Fedora object consists of a number of datastreams. One datastream, RELS-EXT, has been reserved for the Fedora RDF statements. We choose to reserve another datastream, ONTOLOGY, to contain the ontology definitions.

Just as Fedora only allows the "RDF:Description" tag in each object, we have chosen to similarly restrict which OWL tags can reside in a Content model. In fact, there are just three allowed elements inside the rdf:RDF tag: "OWL:Class", "OWL:ObjectProperty" and "OWL:DatatypeProperty".

Each Content model must contain one and just one "OWL:Class" element, about the Content model itself. In this element the ordinary OWL syntax can be used to place restrictions on the Properties. The allowed restrictions are:

  • minCardinality (0-1)
  • maxCardinality (0-1)
  • cardinality (0-1)
  • someValuesFrom
  • allValuesFrom

You are not allowed to use the "RDFs:subClassOf" property to make the class a subclass of another Content model.

All relations allowed from a data object should be defined in at least one of its Content models, in the form of an "OWL:ObjectProperty". This is important, so I will say it again: The complete list of allowed relations in a data object is the set of ObjectProperties defined by its Content models. A relation can be declared in multiple Content models, but if a Content model places restrictions on a relation, it must declare the property itself. The reason behind this requirement is property 3. Even though all declarations of ObjectProperties are global for the repository, and thereby allowed for all objects in the repository, the demand is that each data object should be described by just the local Content models, i.e. those it relates to through the "Fedora-model:hasModel" relation.

Fedora will also allow an object to have literal properties. Such properties are defined by the "OWL:DatatypeProperty" tag.

Looking at property 1 in the context of "OWL:ObjectProperty", it becomes clear that range and domain are not allowed. Unfortunately, this restriction is necessary. Since neither OWL nor Fedora provides a way to ensure that the same relation is not defined twice, it is entirely possible for two unrelated Content models in the repository to define the same property. Each part of the repository will be valid when viewed locally, but when regarding the repository as a whole, the two different definitions will be combined. Having two domains for a property means that the source must be of both types, not either, and likewise for range, so the repository as a whole will be invalid. To prevent the risk of such errors, the use of domain and range is disallowed.

8. "rdfs:range" and "rdfs:domain" are not allowed on any properties.

Instead of "rdfs:range", one should use the "allValuesFrom" restriction. This restriction defines a range for the property, but only in the given class. As such, the restriction will have no global effect. "rdfs:domain" is simply not necessary. Property 3 implies that the ONTOLOGY in a Content Model should describe the local area, i.e. the objects subscribing to that Content model. The result is that the domain, so to speak, of a property will always be the Content Model in which it was defined. But again, this restriction will have no global effect: the same property defined somewhere else will have some other Content Model as its domain.
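Property 8 can be checked directly on the contents of an ONTOLOGY datastream. The sketch below uses only the Python standard library and assumes the datastream is well-formed RDF/XML with the standard RDFS namespace declared; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# ElementTree spells qualified names as "{namespace}localname".
FORBIDDEN = {f"{{{RDFS}}}domain", f"{{{RDFS}}}range"}


def find_domain_and_range(ontology_xml):
    """Return the tags of all rdfs:domain / rdfs:range elements found."""
    root = ET.fromstring(ontology_xml)
    return [element.tag for element in root.iter() if element.tag in FORBIDDEN]


# Hypothetical ONTOLOGY content that violates property 8.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty
      rdf:about="http://www.statsbiblioteket.dk/doms-relations/#hasB">
    <rdfs:range rdf:resource="info:fedora/doms:CM_B"/>
  </owl:ObjectProperty>
</rdf:RDF>"""

if __name__ == "__main__":
    print(find_domain_and_range(SAMPLE))
```

An empty result means the datastream contains no global domain or range statements; any range needed should then be expressed as an "allValuesFrom" restriction inside the class.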

Example of a simple ontology

    <!--RELS-EXT from Object_A1-->
    <RDF:Description RDF:about="info:Fedora/doms:Object_A1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_A"/>
        <doms:hasB RDF:resource="info:Fedora/doms:Object_B1"/>
    </RDF:Description>

    <!--OWL-SCHEMA from CM_A-->
    <OWL:Class RDF:about="info:Fedora/doms:CM_A">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:cardinality
                        RDF:datatype=
                                "http://www.w3.org/2001/XMLSchema#integer">
                    1
                </OWL:cardinality>
            </OWL:Restriction>
        </RDFs:subClassOf>
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_B"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>

An object A1 and a Content model CM_A are defined. There is one allowed relation for A1, the #hasB relation. Two restrictions are placed on this relation: there must be exactly one such relation in A1, and it must refer to an object of class/Content model CM_B. In fact, A1 has one such relation, and it refers to the object B1, which follows below.

    <!--RELS-EXT from Object_B1-->
    <RDF:Description RDF:about="info:Fedora/doms:Object_B1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_B"/>
    </RDF:Description>


    <!--OWL-SCHEMA from CM_B-->
    <OWL:Class RDF:about="info:Fedora/doms:CM_B">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_A"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>

Here is the object B1 and its Content model CM_B. There is one allowed relation from B1, the #hasA relation. There is just one restriction on this relation: it must refer to something of class/Content model CM_A. No cardinality restriction is defined, so B1 does not need to have the relation, and in fact, it does not have it.
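The reasoning in this example can be replayed with a small validator. This is a sketch under stated assumptions, not the DOMS implementation: the OWL restrictions are transcribed by hand into dictionaries (hypothetical keys `min`, `max` and `all_values_from`), short names stand in for the full URIs, and property 5 is assumed to hold so every model of an object can be looked up.

```python
def validate(pid, objects, models):
    """Return a list of problems for one data object (empty means valid)."""
    obj = objects[pid]
    problems = []
    allowed = set()
    for model in obj["models"]:
        for prop, rule in models[model].items():
            allowed.add(prop)
            targets = [t for p, t in obj["relations"] if p == prop]
            # Cardinality restrictions; a missing "min" means zero.
            if len(targets) < rule.get("min", 0):
                problems.append(f"{pid}: too few {prop} relations for {model}")
            if "max" in rule and len(targets) > rule["max"]:
                problems.append(f"{pid}: too many {prop} relations for {model}")
            # allValuesFrom: every target must subscribe to the wanted model.
            wanted = rule.get("all_values_from")
            for target in targets:
                target_models = objects.get(target, {}).get("models", [])
                if wanted is not None and wanted not in target_models:
                    problems.append(f"{pid}: {prop} target {target} is not of {wanted}")
    # Relations not declared by any of the object's models are not allowed.
    for prop, _target in obj["relations"]:
        if prop not in allowed:
            problems.append(f"{pid}: relation {prop} is not declared")
    return problems


# The simple example from this section, transcribed by hand.
OBJECTS = {
    "Object_A1": {"models": ["CM_A"], "relations": [("hasB", "Object_B1")]},
    "Object_B1": {"models": ["CM_B"], "relations": []},
}
MODELS = {
    "CM_A": {"hasB": {"min": 1, "max": 1, "all_values_from": "CM_B"}},
    "CM_B": {"hasA": {"all_values_from": "CM_A"}},
}
```

With this data, `validate("Object_A1", OBJECTS, MODELS)` and `validate("Object_B1", OBJECTS, MODELS)` both return an empty list, matching the conclusion above that both objects are valid.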

Content Model inheritance

While this model does not require it, it does lend itself readily to an inheritance system for Content models. In this section, such an amendment will be outlined.

There are two apparent ways to specify inheritance. It could be added as an "RDFs:subClassOf" tag (or a subproperty of it) inside the "OWL:Class" element in the ONTOLOGY datastream, or it could be added as a relation in the RELS-EXT datastream of the Content models. Both ways will work, but there is no real advantage in using the ONTOLOGY datastream over the RELS-EXT datastream to specify the inheritance. When the relation is defined in RELS-EXT, it is indexed by Fedora, and you can make triple-store queries about it.

The problem of "RDF:type" not being extensible is also relevant for "RDFs:subClassOf". There are two ways to model the inheritance relations in RELS-EXT. Firstly, you can use "RDFs:subClassOf" directly. Alternatively, you can define some other property to have the same semantic meaning, just as we did with "Fedora-model:hasModel" and "RDF:type". We have chosen the latter option, and reserved a property, "doms:extendsModel", to represent Content Model inheritance.

Inheritance of Content models does have some implications for data objects. In the standard Fedora worldview, an object does not belong to a Content model unless it has a "Fedora-model:hasModel" relation to it. For now, we want to maintain this invariant. So, in addition to having "doms:extendsModel" between the Content models, a data object must have "Fedora-model:hasModel" relations to ALL the Content models inherited by its primary Content models.
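The requirement that a data object lists every inherited Content model amounts to a transitive closure over "doms:extendsModel". A minimal sketch, assuming the extendsModel relations have been collected into a dictionary from each Content model to its direct parents (a hypothetical layout; both function names are made up for this illustration):

```python
def inherited_models(model, extends):
    """Return all ancestors of `model` reachable through extendsModel."""
    seen = set()
    stack = [model]
    while stack:
        current = stack.pop()
        for parent in extends.get(current, ()):
            if parent not in seen:  # also guards against accidental cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def missing_models(declared, extends):
    """Return the Content models an object should declare but does not."""
    required = set(declared)
    for model in declared:
        required |= inherited_models(model, extends)
    return required - set(declared)
```

For instance, with `extends = {"CM_A": ["CM_C"], "CM_B": ["CM_C"]}`, an object that declares only CM_A is missing CM_C, while an object that declares both CM_A and CM_C is missing nothing.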

The OWL schema of a Content model will now not reside solely in the ONTOLOGY datastream. As the information about any parent classes is placed in the RELS-EXT datastream, it now has to be included in the ontology. But this poses a new problem: an OWL class is not allowed to have relations to other objects, unlike an OWL individual. So, only the "doms:extendsModel" and the "fedora-model:hasModel" relations should be extracted from the RELS-EXT datastream; the other relations should be disregarded.
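The extraction step described here is a simple filter. The sketch below assumes the RELS-EXT relations of a Content model are available as (predicate, target) pairs using prefixed names; the function name and the `doms:someOtherRelation` entry in the usage note are hypothetical.

```python
# The only RELS-EXT predicates carried over into the ontology.
KEPT_PREDICATES = {"doms:extendsModel", "fedora-model:hasModel"}


def relations_for_ontology(rels_ext):
    """Keep only the relations that may be merged into the OWL class."""
    return [(predicate, target)
            for predicate, target in rels_ext
            if predicate in KEPT_PREDICATES]
```

Given the pairs `("fedora-model:hasModel", "ContentModel-3.0")`, `("doms:extendsModel", "CM_C")` and `("doms:someOtherRelation", "Object_X")`, only the first two survive, in their original order.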

An example using inheritance

To aid understanding, an example using inheritance has been included below.

    <!--RELS-EXT from Object_A1-->
    <RDF:Description RDF:about="info:Fedora/doms:Object_A1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_A"/>
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_C"/>

        <doms:hasB RDF:resource="info:Fedora/doms:Object_B1"/>
    </RDF:Description>

    <!--RELS-EXT from Object_A2-->
    <RDF:Description RDF:about="info:Fedora/doms:Object_A2">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_A"/>
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_C"/>

        <doms:hasB RDF:resource="info:Fedora/doms:Object_B1"/>
    </RDF:Description>

Two data objects, A1 and A2 have been declared. They both have the CM_A and the CM_C Content model.

    <!--RELS-EXT from CM_C-->
    <RDF:Description RDF:about="info:Fedora/doms:CM_C">
        <Fedora-model:hasModel
                RDF:resource="info:Fedora/Fedora-system:ContentModel-3.0"/>
    </RDF:Description>


    <!--OWL from CM_C-->
    <OWL:Class RDF:about="info:Fedora/doms:CM_C">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_B"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>

Here the CM_C Content model has been declared. It extends no other Content model. It defines the relation #hasB, which is required as it imposes a restriction on the relation. The restriction is that the relation must refer to objects from Content model CM_B. As could be seen before the A1 and A2 objects both refer to a B1 object, which may or may not be of the class CM_B.

    <!--RELS-EXT from CM_A-->
    <RDF:Description RDF:about="info:Fedora/doms:CM_A">
        <hasModel RDF:resource="info:Fedora/Fedora-system:ContentModel-3.0"
                  xmlns="info:Fedora/Fedora-system:def/model#"/>
        <doms:extendsModel RDF:resource="info:Fedora/doms:CM_C" />
    </RDF:Description>

    <!--OWL from CM_A-->
    <OWL:Class RDF:about="info:Fedora/doms:CM_A">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>
                <OWL:cardinality
                        RDF:datatype=
                                "http://www.w3.org/2001/XMLSchema#integer">
                    1
                </OWL:cardinality>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasB"/>

CM_A declares itself as a subtype of CM_C. CM_A redeclares the relation #hasB and defines a cardinality of 1. So, objects of CM_A must have exactly one #hasB relation, and it must point to an object of class CM_B. Objects of class CM_C alone can have 0-* such relations, but they must still all point to objects of class CM_B.

    <!--RELS-EXT from Object_B1-->
    <RDF:Description RDF:about="info:Fedora/doms:Object_B1">
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_B"/>
        <Fedora-model:hasModel RDF:resource="info:Fedora/doms:CM_C"/>

        <doms:hasA RDF:resource="info:Fedora/doms:Object_A1"/>
    </RDF:Description>

As can be seen, object B1 is of Content models CM_B and CM_C. As such, the relations in A1 and A2 are valid, as B1 is of class CM_B. B1 has one relation of its own, #hasA. Whether or not it is valid should be answered by looking at the Content models CM_B and CM_C, as they describe B1.

    <!--RELS-EXT from CM_B-->
    <RDF:Description RDF:about="info:Fedora/doms:CM_B">
        <Fedora-model:hasModel
                RDF:resource="info:Fedora/Fedora-system:ContentModel-3.0"/>
        <doms:extendsModel RDF:resource="info:Fedora/doms:CM_C" />
    </RDF:Description>

    <!--OWL from CM_B-->
    <OWL:Class RDF:about="info:Fedora/doms:CM_B">
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <OWL:minCardinality
                        RDF:datatype=
                                "http://www.w3.org/2001/XMLSchema#integer">
                    1
                </OWL:minCardinality>
            </OWL:Restriction>
        </RDFs:subClassOf>
        <RDFs:subClassOf>
            <OWL:Restriction>
                <OWL:onProperty
                        RDF:resource="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>
                <OWL:allValuesFrom RDF:resource="info:Fedora/doms:CM_A"/>
            </OWL:Restriction>
        </RDFs:subClassOf>
    </OWL:Class>
    <OWL:ObjectProperty
            RDF:about="http://www.statsbiblioteket.dk/doms-relations/#hasA"/>

Here is the definition of CM_B. As can be seen, it extends CM_C. Objects of class CM_B must have one or more #hasA relations, and they must all refer to objects of class CM_A.

Since B1 is of CM_C, it may have zero or more #hasB relations to objects of CM_B. It has zero such relations, which is allowed. It has one #hasA relation, to A1, which is enough to fulfill the restrictions.

FedoraOntology (last edited 2010-03-17 13:09:09 by localhost)