DOMS Data Model

In the following sections, these shorthands refer to the following namespaces:

A definition: a data model describes the content of a collection, and a content model describes the content of a data object. A data model is therefore a set of content models that together describe the collection.

ImageLink(http://merkur/viewvc/trunk/docs/datamodel/fig/DOMSBaseCollection.png?root=doms&view=co,alt=DOMS base collection,width=256)

The DOMS data model describes how the type system underlying DOMS is realised in Fedora 3. The figure above serves as a guide through the following sections.

DOMS Content Models in general

Fedora provides a repository for digital objects. In principle, every object in the repository can be unique, but Fedora provides a way of specifying that an object has a given type. Unfortunately, Fedora's type definitions, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements.

For our purposes, there are two kinds of digital objects in Fedora: Content Model objects and data objects.

The Content Model object, as used in DOMS, describes the mandatory and permitted content of objects of its type. It contains the information necessary to verify whether a given object is indeed of this type. For more detail, see FedoraOntology and FedoraTypeChecking.

A data object can specify the Content Model describing its contents via a fedora-model:hasModel relation. In DOMS, this relation is required. A data object is said to "subscribe" to a Content Model. Content Model inheritance, as specified in FedoraOntology, is used.

The special Content Model object "doms:ContentModel_DOMS" is the root object. Every Content Model must have a "doms-relations:extendsModel" relation to this object, either directly or through a chain of other Content Models. The complete description of a data object is the set of descriptions in the Content Model specified with "fedora-model:hasModel", together with those in every Content Model reachable from it by following "doms-relations:extendsModel" relations.
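As an illustration, the RELS-EXT datastream of a data object might declare its subscription as sketched below. The PID doms:ExampleImage_1 is hypothetical. The fedora-model prefix is bound to Fedora 3's standard model namespace. Because doms:ContentModel_Image extends doms:ContentModel_DOMS, this object's complete description would be the union of the descriptions in both models. Only the hasModel relation is shown here.

```xml
<!-- Sketch: RELS-EXT of a hypothetical data object (PID doms:ExampleImage_1) -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/doms:ExampleImage_1">
    <!-- Subscribes the object to ContentModel_Image; DOMS requires this relation -->
    <fedora-model:hasModel rdf:resource="info:fedora/doms:ContentModel_Image"/>
    <!-- Further relations required by the inherited ContentModel_DOMS are omitted -->
  </rdf:Description>
</rdf:RDF>
```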

A Content Model can "extend" more than one other Content Model. Content Models are never overridden: a subscribing object must be valid with regard to every Content Model in the inheritance hierarchy.
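The following is a minimal sketch of the RELS-EXT of a content model that extends two others. The model doms:ContentModel_IllustratedText is hypothetical. The doms-relations prefix is assumed to be bound to http://doms.statsbiblioteket.dk/relations/default/0/1/#, the namespace used as xml:base in the ONTOLOGY datastreams; verify this against the relation URIs DOMS actually uses. In Fedora 3, the hasModel relation to fedora-system:ContentModel-3.0 marks an object as a content model. An object subscribing to this model would have to satisfy it, doms:ContentModel_Image, doms:ContentModel_Text and doms:ContentModel_DOMS.

```xml
<!-- Sketch: RELS-EXT of a hypothetical content model with two parents.
     The doms-relations namespace below is an assumption. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:doms-relations="http://doms.statsbiblioteket.dk/relations/default/0/1/#">
  <rdf:Description rdf:about="info:fedora/doms:ContentModel_IllustratedText">
    <!-- Fedora 3 identifies content model objects by this model -->
    <fedora-model:hasModel rdf:resource="info:fedora/fedora-system:ContentModel-3.0"/>
    <!-- No overriding: subscribers must satisfy both parents and their ancestors -->
    <doms-relations:extendsModel rdf:resource="info:fedora/doms:ContentModel_Image"/>
    <doms-relations:extendsModel rdf:resource="info:fedora/doms:ContentModel_Text"/>
  </rdf:Description>
</rdf:RDF>
```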

Content Models have two datastreams of particular interest: ONTOLOGY and DS-COMPOSITE. The ONTOLOGY datastream defines the allowed relations in subscribing objects, and the DS-COMPOSITE datastream defines the required datastreams and any restrictions they must adhere to.

Schema Objects

XXXXX Many of the schemas used in DOMS need to be referenced many times. To avoid duplication, we have created objects that contain only schemas and that subscribe to the Content Model "doms:ContentModel_Schema". The describing datastream in a schema binding may contain the schema directly, or it may contain the URL of the datastream that does. Either way, the difference should be invisible to programs accessing the datastream through the API.

DOMS Base Collection

DOMS ships with a base collection of objects representing the base types. These objects are described in the following sections.

doms:ContentModel_DOMS

ContentModel_DOMS is the root of the content model inheritance tree. All content models derive from this model. As all data objects in DOMS must have a content model, all data objects must adhere to the restrictions defined in this content model.

The ONTOLOGY datastream.

<rdf:RDF
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://doms.statsbiblioteket.dk/relations/default/0/1/#">

    <owl:Class rdf:about="info:fedora/doms:ContentModel_DOMS">

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="isPartOfCollection"/>
                <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</owl:minCardinality>
            </owl:Restriction>
        </rdfs:subClassOf>

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="isPartOfCollection"/>
                <owl:allValuesFrom rdf:resource="info:fedora/doms:ContentModel_Collection"/>
            </owl:Restriction>
        </rdfs:subClassOf>

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="hasLicense"/>
                <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</owl:cardinality>
            </owl:Restriction>
        </rdfs:subClassOf>

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="hasLicense"/>
                <owl:allValuesFrom rdf:resource="info:fedora/doms:ContentModel_License"/>
            </owl:Restriction>
        </rdfs:subClassOf>

    </owl:Class>

    <owl:ObjectProperty rdf:about="isPartOfCollection"/>

    <owl:ObjectProperty rdf:about="hasLicense"/>
</rdf:RDF>

In human-readable form, this says that every subscribing object must have exactly one "doms-relations:hasLicense" relation, and that relation must point to an object of "doms:ContentModel_License". The object must also have one or more "doms-relations:isPartOfCollection" relations, all pointing to objects of "doms:ContentModel_Collection".
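The sketch below shows a RELS-EXT that would satisfy these restrictions. All PIDs are hypothetical. doms:ExampleLicense_1 is assumed to subscribe to doms:ContentModel_License, and the two collection objects are assumed to subscribe to doms:ContentModel_Collection. As before, the doms-relations namespace binding is an assumption taken from the ontology's xml:base.

```xml
<!-- Sketch: relations required by ContentModel_DOMS. PIDs and the
     doms-relations namespace binding are assumptions. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:doms-relations="http://doms.statsbiblioteket.dk/relations/default/0/1/#">
  <rdf:Description rdf:about="info:fedora/doms:ExampleObject_1">
    <fedora-model:hasModel rdf:resource="info:fedora/doms:ContentModel_Image"/>
    <!-- Exactly one license, pointing to an object of ContentModel_License -->
    <doms-relations:hasLicense rdf:resource="info:fedora/doms:ExampleLicense_1"/>
    <!-- At least one collection; every target must be an object of ContentModel_Collection -->
    <doms-relations:isPartOfCollection rdf:resource="info:fedora/doms:ExampleCollection_1"/>
    <doms-relations:isPartOfCollection rdf:resource="info:fedora/doms:ExampleCollection_2"/>
  </rdf:Description>
</rdf:RDF>
```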

DS-COMPOSITE datastream.

<dsCompositeModel
        xmlns="info:fedora/fedora-system:def/dsCompositeModel#"
        xmlns:schema="http://doms.statsbiblioteket.dk/types/dscompositeschema/0/1/#">

    <dsTypeModel ID="DC">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="DC_SCHEMA"/>
        </extensions>
    </dsTypeModel>
    <dsTypeModel ID="POLICY">
        <form MIME="text/xml"/>
    </dsTypeModel>
    <dsTypeModel ID="STATE">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="STATE_SCHEMA"/>
        </extensions>
    </dsTypeModel>
    <dsTypeModel ID="RELS-EXT"/>
</dsCompositeModel>

In human-readable form, this says that every subscribing object must have the datastreams "DC", "POLICY", "STATE" and "RELS-EXT". The "DC" stream must conform to the schema in the datastream "DC_SCHEMA" of "doms:ContentModel_DOMS". The "STATE" stream must conform to the schema in the datastream "STATE_SCHEMA" of "doms:ContentModel_DOMS". For the use of the "STATE" datastream, see FedoraTransactionsReplacement. The use of the "POLICY" stream is defined in FedoraLicensePolicies. The "AUDIT" stream is detailed in DomsAuditTrail.

The DC_SCHEMA stream is an ordinary XML schema (REF to OAI Dublin Core, XXXXX).
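Assuming DC_SCHEMA is the OAI Dublin Core (oai_dc) schema, as the reference to OAI Dublin Core suggests, a conforming DC datastream could look like the sketch below. Fedora 3 also uses this oai_dc format for its default DC datastream. The title and identifier values are hypothetical.

```xml
<!-- Sketch: a DC datastream in oai_dc format; element values are illustrative -->
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>Example image</dc:title>
  <dc:identifier>doms:ExampleImage_1</dc:identifier>
</oai_dc:dc>
```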

doms:ContentModel_File

Extends ContentModel_DOMS

In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from concrete manifestations such as "jpeg" and "mp3". The metadata about an image is relevant regardless of how the image is manifested, so it should not reside alongside the technical metadata about a particular manifestation. To support this separation, we have introduced File objects.

A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. Every File object must have a Content Model that extends ContentModel_File.

The ONTOLOGY datastream from ContentModel_File

<rdf:RDF
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://doms.statsbiblioteket.dk/relations/default/0/1/#">

    <owl:Class rdf:about="info:fedora/doms:ContentModel_File">

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="hasOriginal"/>
                <owl:allValuesFrom rdf:resource="info:fedora/doms:ContentModel_File"/>
            </owl:Restriction>
        </rdfs:subClassOf>

    </owl:Class>

    <owl:ObjectProperty rdf:about="hasOriginal"/>

</rdf:RDF>

In human-readable form, this states that objects of ContentModel_File may have "doms-relations:hasOriginal" relations, and only to other objects of "doms:ContentModel_File". If file A is the result of migrating file B, and both are in DOMS, the data object for file A has a "doms-relations:hasOriginal" relation to the data object for file B.
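The migration case can be sketched as the RELS-EXT of file A's data object. The PIDs doms:File_A and doms:File_B are hypothetical, and doms:File_B is assumed to subscribe to a model extending ContentModel_File. The choice of ContentModel_ImagePreservationFile is only an example, and the doms-relations namespace binding is the same assumption as above.

```xml
<!-- Sketch: hypothetical File object A, migrated from File object B.
     PIDs and the doms-relations namespace binding are assumptions. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:doms-relations="http://doms.statsbiblioteket.dk/relations/default/0/1/#">
  <rdf:Description rdf:about="info:fedora/doms:File_A">
    <fedora-model:hasModel rdf:resource="info:fedora/doms:ContentModel_ImagePreservationFile"/>
    <!-- The target must itself be an object of ContentModel_File -->
    <doms-relations:hasOriginal rdf:resource="info:fedora/doms:File_B"/>
    <!-- hasLicense and isPartOfCollection, required by ContentModel_DOMS, are omitted -->
  </rdf:Description>
</rdf:RDF>
```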

The DS-COMPOSITE datastream

<dsCompositeModel
        xmlns="info:fedora/fedora-system:def/dsCompositeModel#"
        xmlns:schema="http://doms.statsbiblioteket.dk/types/dscompositeschema/0/1/#">

    <dsTypeModel ID="CHARACTERISATION">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="CHARACTERISATION_SCHEMA"/>
        </extensions>
    </dsTypeModel>

    <dsTypeModel ID="CONTENTS"/>

    <dsTypeModel ID="ORIGIN">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="ORIGIN_SCHEMA"/>
        </extensions>
    </dsTypeModel>

    <dsTypeModel ID="PRONOMID">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="PRONOMID_SCHEMA"/>
        </extensions>
    </dsTypeModel>

</dsCompositeModel>

This specifies that data objects of doms:ContentModel_File must have the datastreams "CHARACTERISATION", "CONTENTS", "ORIGIN" and "PRONOMID". Each of these deserves some description.

CONTENTS

"CONTENTS" is where the actual file is. It will always have the the "E" controlgroup, meaning that the file is externally referenced. The reference must allways be to a File in Bitstorage, and only this datastream is ever allowed to reference files.

If you get the datastream through the standard API, you get the contents of the file, not the link.
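In Fedora 3's FOXML 1.1 format, an externally referenced datastream is declared with CONTROL_GROUP="E" and a contentLocation of type URL. The sketch below shows how a CONTENTS datastream might be written. The Bitstorage URL, label and MIME type are hypothetical, and the scheme DOMS uses for Bitstorage URLs is not covered here.

```xml
<!-- Sketch: a CONTENTS datastream in FOXML 1.1; URL, label and MIME type are hypothetical -->
<foxml:datastream xmlns:foxml="info:fedora/fedora-system:def/foxml#"
                  ID="CONTENTS" STATE="A" CONTROL_GROUP="E" VERSIONABLE="false">
  <foxml:datastreamVersion ID="CONTENTS.0" LABEL="Preservation file" MIMETYPE="image/tiff">
    <!-- For control group E, Fedora fetches this URL and returns the bytes, not the link -->
    <foxml:contentLocation TYPE="URL" REF="http://bitstorage.example.org/files/example.tif"/>
  </foxml:datastreamVersion>
</foxml:datastream>
```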

PRONOMID

To perform proper digital preservation, we need to record the exact format of each file. The National Archives, Great Britain, have developed the PRONOM registry, http://www.nationalarchives.gov.uk/pronom. For each file format, or version thereof, in the registry, they provide a signature that enables their tool (DROID) to identify files of that format.

We selected PRONOM because it is currently able to identify all the relevant preservation file formats used in DOMS, and it was the closest we could find to a universally accepted standard.

The schema from the PRONOMID_SCHEMA datastream.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            targetNamespace="http://doms.statsbiblioteket.dk/properties/pronomID/0/1/#"
            xmlns="http://doms.statsbiblioteket.dk/properties/pronomID/0/1/#"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

    <xsd:element name="pronomID" type="extpropertiesType"/>

    <xsd:complexType name="extpropertiesType">
        <xsd:attribute name="value" type="xsd:string"/>
    </xsd:complexType>
</xsd:schema>
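A PRONOMID datastream that is valid against this schema consists of a single pronomID element in the schema's target namespace. Its value attribute holds the PRONOM unique identifier (PUID) of the file's format. The PUID fmt/353, which PRONOM assigns to TIFF, is used here only for illustration.

```xml
<!-- Sketch: PRONOMID datastream; fmt/353 (TIFF) is an illustrative PUID -->
<pronomID xmlns="http://doms.statsbiblioteket.dk/properties/pronomID/0/1/#"
          value="fmt/353"/>
```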

ORIGIN

CHARACTERISATION

Requirements for objects described by ContentModel_File

The characterisation datastream could look like this.

<?xml version="1.0" encoding="UTF-8"?>
<char:characterisation xsi:schemaLocation="http://doms.statsbiblioteket.dk/types/characterisation/0/1/# http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/characterisation/characterisation-0-1.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:char="http://doms.statsbiblioteket.dk/types/characterisation/0/1/#">
  <char:characterisationRun>
    <char:tool>JHove</char:tool>
    <char:output>
      <!-- ... JHove output ... -->
    </char:output>
  </char:characterisationRun>
</char:characterisation>

ContentModel_ImagePreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_ImagePreservationFile

ContentModel_TextPreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_TextPreservationFile

ContentModel_VideoPreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_VideoPreservationFile

ContentModel_AudioPreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_AudioPreservationFile

ContentModel_Image

Extends ContentModel_DOMS

The following variables are used:

Requirements for objects described by ContentModel_Image

ContentModel_Audio

Extends ContentModel_DOMS

The following variables are used:

Requirements for objects described by ContentModel_Audio

ContentModel_Video

Extends ContentModel_DOMS

The following variables are used:

Requirements for objects described by ContentModel_Video

ContentModel_Text

Extends ContentModel_DOMS

The following variables are used:

Requirements for objects described by ContentModel_Text

ContentModel_License

Extends ContentModel_DOMS

Requirements for objects described by ContentModel_License

ContentModel_Schema

Extends ContentModel_DOMS

Requirements for objects described by ContentModel_Schema

ContentModel_Collection

Extends ContentModel_DOMS

Requirements for objects described by ContentModel_Collection: None

Technical Metadata

A file object should contain technical metadata. In this context it refers

In addition, if a file was migrated from another file that exists in DOMS, its File object must have a "doms-relations:hasOriginal" relation to the File object of that file.