Differences between revisions 52 and 90 (spanning 38 versions)
Revision 52 as of 2008-10-01 08:12:54
Size: 12647
Editor: abr
Comment:
Revision 90 as of 2010-03-17 13:13:00
Size: 5322
Editor: localhost
Comment: converted to 1.6 markup
Line 1: Line 1:
= Doms Data Model = = DON'T PANIC =
Line 3: Line 3:
In the following these shorthands will refer to the following namespaces
 * rdf - http://www.w3.org/1999/02/22-rdf-syntax-ns#
 * rdfs - http://www.w3.org/2000/01/rdf-schema#
 * owl - http://www.w3.org/2002/07/owl#
 * fedora-model - info:fedora/fedora-system:def/model#
 * doms-relations - http://doms.statsbiblioteket.dk/relations/default/0/1/#
 * doms - Standard prefix for PIDs. It is not a namespace
'''A definition: A ''datamodel'' describes the content of a collection. A ''content model'' describes the content of a data object. So, a datamodel is a set of content models that together describe the collection.'''
Line 11: Line 5:
'''A definition: A datamodel describes the content of a collection. A content model describes the content of a data object. So, a data model is a set of content models that together describe the collection.''' The DOMS datamodel describes how the Type system underlying DOMS is realized in Fedora 3.

The DOMS datamodel is, in its entirety, a complex system. To make it easier to understand, its components are detailed in separate documents. Firstly, it consists of a number of extensions to the Fedora system. Secondly, it consists of a number of predefined objects, which make use of those extensions. Thirdly, it consists of a number of policies for how certain tasks are achieved. Fourthly, it consists of a number of APIs.

Fedora and DOMS are big on namespaces. To ease writing the documentation, a namespace document, DomsNameSpacesAndSchemas, has been written. All namespaces should be defined there, and all shorthands refer to the namespaces defined therein.
Line 14: Line 12:


[[ImageLink(http://merkur/viewvc/trunk/docs/datamodel/fig/DOMSBaseCollection.png?root=doms&view=co,alt=DOMS base collection,width=256)]]

The DOMS datamodel describes how the Type system underlying DOMS is realised in Fedora 3. The figure above will serve as a guide through the following sections.




== DOMS Content Models in general ==
== Content Models in general ==
Line 31: Line 20:
The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify if the given object is indeed of this type. For more detail on this, see FedoraOntology and FedoraTypeChecking. The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify if the given object is indeed of this type. For more detail on this, see FedoraOntology and FedoraTypeChecking.
Line 40: Line 29:
== Schema Objects == Two datastreams in Content Models are of particular interest: ONTOLOGY and DS-COMPOSITE. The ONTOLOGY datastream defines the allowed relations in subscribing objects, and the DS-COMPOSITE datastream defines the required datastreams and any restrictions they must adhere to.
Line 43: Line 32:
XXXXX
Many of the schemas used in DOMS need to be referenced many times. To avoid duplication, we have made objects containing only schemas, subscribing to the Content Model "doms:ContentModel_Schema". The describing datastream in a schema binding might contain the schema directly, or it might contain the URL of the datastream that does. Either way, this should be invisible to programs accessing the datastream through the API.
== Fedora extensions ==
 * FedoraOntology - Detailing the Ontology language
 * FedoraTypeChecking - Detailing the extensions to the DS-COMPOSITE stream
 * FedoraViewBlobs - Detailing the View system, which allows you to view many objects as one
 * FedoraObjectTemplates - Detailing the prototype system for content models
 * FedoraImportExport - Detailing use of Import and Export of objects using content models
 * FedoraLicensePolicies - Detailing the License system, and how it interacts with the Search system
 * FedoraState - Detailing the use of object states to allow controlling validity, availability and deletions.
Line 46: Line 41:
== Predefined objects ==
 * DomsPredefinedObjects - The predefined content models and other objects in the doms system
Line 47: Line 44:
== DOMS Base Collection == == Doms policies ==
 * DomsFileHandling - Detailing how we expect to handle file objects in Fedora.
 * DomsCollections - Detailing the use of collection objects in DOMS
 * DomsAuditTrail - Detailing how DOMS logs changes and new versions
Line 49: Line 49:
DOMS ships with a base collection of objects representing the base types. The objects in the base collection are detailed in the following.

Shorthand:
 * myObject.myDatastream means the Datastream myDatastream in the Object myObject.
 * $variable introduces a variable.


=== ContentModel_DOMS ===
ContentModel_DOMS is the root of the content model inheritance tree. All content models derive from this model. As all data objects in DOMS must have a content model, all data objects must adhere to the restrictions defined in this content model.


The ONTOLOGY datastream.
{{{
<rdf:RDF
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://doms.statsbiblioteket.dk/relations/default/0/1/#">

    <owl:Class rdf:about="info:fedora/doms:ContentModel_DOMS">

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="isPartOfCollection"/>
                <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</owl:minCardinality>
            </owl:Restriction>
        </rdfs:subClassOf>

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="isPartOfCollection"/>
                <owl:allValuesFrom rdf:resource="info:fedora/doms:ContentModel_Collection"/>
            </owl:Restriction>
        </rdfs:subClassOf>

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="hasLicense"/>
                <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</owl:cardinality>
            </owl:Restriction>
        </rdfs:subClassOf>

        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="hasLicense"/>
                <owl:allValuesFrom rdf:resource="info:fedora/doms:ContentModel_License"/>
            </owl:Restriction>
        </rdfs:subClassOf>

    </owl:Class>

    <owl:ObjectProperty rdf:about="isPartOfCollection"/>

    <owl:ObjectProperty rdf:about="hasLicense"/>
</rdf:RDF>
}}}

In human-readable form, this says that every subscribing object must have exactly one "doms-relations:hasLicense" relation, pointing to an object of "doms:ContentModel_License". It must also have one or more "doms-relations:isPartOfCollection" relations, pointing to objects of "doms:ContentModel_Collection".
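
As a rough illustration of how such cardinality restrictions could be read and checked, here is a minimal Python sketch. It is not the DOMS implementation: the function names are invented for this example, the ontology is a shortened copy of the one above (the rdf:datatype attributes are dropped for brevity), and the allValuesFrom type restrictions are deliberately not checked.

```python
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Shortened copy of the ONTOLOGY datastream shown above.
ONTOLOGY = """
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="info:fedora/doms:ContentModel_DOMS">
    <rdfs:subClassOf><owl:Restriction>
      <owl:onProperty rdf:resource="isPartOfCollection"/>
      <owl:minCardinality>1</owl:minCardinality>
    </owl:Restriction></rdfs:subClassOf>
    <rdfs:subClassOf><owl:Restriction>
      <owl:onProperty rdf:resource="hasLicense"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction></rdfs:subClassOf>
    <rdfs:subClassOf><owl:Restriction>
      <owl:onProperty rdf:resource="hasLicense"/>
      <owl:allValuesFrom rdf:resource="info:fedora/doms:ContentModel_License"/>
    </owl:Restriction></rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""


def cardinality_bounds(ontology_xml):
    """Map each restricted property to (min, max); max is None if unbounded."""
    bounds = {}
    for restriction in ET.fromstring(ontology_xml).iter(OWL + "Restriction"):
        on_property = restriction.find(OWL + "onProperty")
        if on_property is None:
            continue
        prop = on_property.get(RDF + "resource")
        low, high = bounds.get(prop, (0, None))
        for child in restriction:
            if child.tag == OWL + "minCardinality":
                low = int(child.text)
            elif child.tag == OWL + "maxCardinality":
                high = int(child.text)
            elif child.tag == OWL + "cardinality":
                low = high = int(child.text)
        bounds[prop] = (low, high)
    return bounds


def violated_properties(relation_counts, bounds):
    """Return the properties whose relation count falls outside its bounds."""
    problems = []
    for prop, (low, high) in bounds.items():
        count = relation_counts.get(prop, 0)
        if count < low or (high is not None and count > high):
            problems.append(prop)
    return problems
```

Here relation_counts is assumed to be a plain dictionary counting how many relations of each kind an object's RELS-EXT holds, for example {"hasLicense": 1, "isPartOfCollection": 2}.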


DS-COMPOSITE datastream.
{{{
<dsCompositeModel
        xmlns="info:fedora/fedora-system:def/dsCompositeModel#"
        xmlns:schema="http://doms.statsbiblioteket.dk/types/dscompositeschema/0/1/#">

    <dsTypeModel ID="DC">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="DC_SCHEMA"/>
        </extensions>
    </dsTypeModel>
    <dsTypeModel ID="POLICY">
        <form MIME="text/xml"/>
    </dsTypeModel>
    <dsTypeModel ID="STATE">
        <form MIME="text/xml"/>
        <extensions name="DOMS">
            <schema:schema type="xsd" datastream="STATE_SCHEMA"/>
        </extensions>
    </dsTypeModel>
    <dsTypeModel ID="RELS-EXT"/>
</dsCompositeModel>
}}}
In human-readable form, this says that all subscribing objects must have the datastreams "DC", "POLICY", "STATE" and "RELS-EXT". The "DC" stream must conform to the schema defined in the datastream "DC_SCHEMA" in "doms:ContentModel_DOMS". The "STATE" stream must conform to the schema in the datastream "STATE_SCHEMA" in "doms:ContentModel_DOMS". To understand the use of the "STATE" datastream, see FedoraTransactionsReplacement. The use of the "POLICY" stream is defined in FedoraLicensePolicies. The "AUDIT" stream is detailed in DomsAuditTrail.
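
The following Python sketch shows one way the required datastreams and their schema bindings could be read out of a DS-COMPOSITE stream like the one above. It is only an illustration under stated assumptions: the function names are hypothetical, and the list of datastreams present in an object is assumed to come from elsewhere (for example a Fedora API call that is not shown here).

```python
import xml.etree.ElementTree as ET

DSC = "{info:fedora/fedora-system:def/dsCompositeModel#}"
SCHEMA = "{http://doms.statsbiblioteket.dk/types/dscompositeschema/0/1/#}"

# Compact copy of the DS-COMPOSITE datastream shown above.
DS_COMPOSITE = """
<dsCompositeModel
    xmlns="info:fedora/fedora-system:def/dsCompositeModel#"
    xmlns:schema="http://doms.statsbiblioteket.dk/types/dscompositeschema/0/1/#">
  <dsTypeModel ID="DC">
    <form MIME="text/xml"/>
    <extensions name="DOMS">
      <schema:schema type="xsd" datastream="DC_SCHEMA"/>
    </extensions>
  </dsTypeModel>
  <dsTypeModel ID="POLICY"><form MIME="text/xml"/></dsTypeModel>
  <dsTypeModel ID="STATE">
    <form MIME="text/xml"/>
    <extensions name="DOMS">
      <schema:schema type="xsd" datastream="STATE_SCHEMA"/>
    </extensions>
  </dsTypeModel>
  <dsTypeModel ID="RELS-EXT"/>
</dsCompositeModel>
"""


def required_datastreams(ds_composite_xml):
    """Map each required datastream ID to its schema datastream (or None)."""
    required = {}
    for type_model in ET.fromstring(ds_composite_xml).iter(DSC + "dsTypeModel"):
        # The DOMS extension element, if any, names the schema datastream.
        schema = next(type_model.iter(SCHEMA + "schema"), None)
        required[type_model.get("ID")] = (
            schema.get("datastream") if schema is not None else None
        )
    return required


def missing_datastreams(present_ids, required):
    """Return the required datastream IDs that the object does not have."""
    return sorted(set(required) - set(present_ids))
```

Validating a datastream against the schema it is bound to would be a further step and is left out of this sketch.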

The DC_SCHEMA stream is just an xml schema and looks like this -- REF to OAIDUBLINCORE XXXXX Regard this
{{{

}}}
== API documentation ==
 * Refer to the overall [[Documentation]] page
Line 143: Line 55:
== Working with the Data model ==
DOMS contains a number of content models. These are meant to serve as the basic building blocks for data models for new collections. A datamodel is, of course, not restricted to using only these content models; it can, and should, define its own. All new content models should extend doms:!ContentModel_DOMS, all objects that need to reference files outside Fedora should have a content model that derives from doms:!ContentModel_File, and so on. The content models that provide extra meaning are optional to use, and should at least be extended for the relevant collection.
Line 144: Line 58:

=== ContentModel_File ===
Extends ContentModel_DOMS


In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects.

A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. All File objects must have a Content Model that extends !ContentModel_File.


The following variables are used:
 * $OrigFile: An object with a Content Model that derives from !ContentModel_File;

Requirements for objects described by !ContentModel_File
 * ObjectProperties
  * External Properties
   * http://doms.statsbiblioteket.dk/extproperties/#pronomID : The pronom ID of the file
 * Datastreams
  * RELS-EXT
   * (optional) doms:hasOriginal -> $OrigFile
  * CHARACTERISATION: The output of the characterisation tools. Schema attachment:Characterisation.xsd
  * CONTENTS: Datastream containing the file
   * !ContentLocation URL = The file in Bitstorage
  * ORIGIN: Metadata about the creation of the file, in the Premis [http://www.loc.gov/standards/premis/v1/Event-v1-1.xsd schema]


The characterisation datastream could look like this.
{{{
<?xml version="1.0" encoding="UTF-8"?>
<char:characterisation xsi:schemaLocation="http://doms.statsbiblioteket.dk/types/characterisation/0/1/# http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/characterisation/characterisation-0-1.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:char="http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/#"
  xmlns:jhove="">
  <char:characterisationRun>
    <char:tool>JHove</char:tool>
    <char:output>
      <jhove:...>
      </jhove:...>
    </char:output>
  </char:characterisationRun>
</char:characterisation>
}}}


=== ContentModel_ImagePreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_ImagePreservationFile
 * Datastreams
  * PRONOMID: must be "fmt/10" (tiff version 6)



=== ContentModel_TextPreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_TextPreservationFile
 * Datastreams
  * PRONOMID: must be "x-fmt/16" (Utf8) or "fmt/95" (pdf-a)


=== ContentModel_VideoPreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_VideoPreservationFile
 * Datastreams
  * PRONOMID: must be "x-fmt/385" (mpeg1) or "x-fmt/386" (mpeg2)


=== ContentModel_AudioPreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_AudioPreservationFile
 * Datastreams
  * PRONOMID: must be "fmt/2" (Bwf version 1) or "fmt/6" (wav)
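
The PRONOM ID requirements of the four preservation file content models above can be summarised as a lookup table. The Python sketch below is only an illustration; the table and function names are invented for this example, and how the PRONOM ID is obtained from an object is not shown.

```python
# Allowed PRONOM IDs per preservation file Content Model, as listed above.
ALLOWED_PRONOM_IDS = {
    "doms:ContentModel_ImagePreservationFile": {"fmt/10"},
    "doms:ContentModel_TextPreservationFile": {"x-fmt/16", "fmt/95"},
    "doms:ContentModel_VideoPreservationFile": {"x-fmt/385", "x-fmt/386"},
    "doms:ContentModel_AudioPreservationFile": {"fmt/2", "fmt/6"},
}


def pronom_id_allowed(content_model, pronom_id):
    """Check a file's PRONOM ID against the content model's allowed IDs."""
    allowed = ALLOWED_PRONOM_IDS.get(content_model)
    # Content models without an entry place no restriction in this sketch.
    return allowed is None or pronom_id in allowed
```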




=== ContentModel_Image ===
Extends ContentModel_DOMS

The following variables are used:
 * $ImageFile: An object with the Content Model !ContentModel_ImagePreservationFile

Requirements for objects described by !ContentModel_Image
 * Datastreams
  * RELS-EXT
   * doms:hasPreservationFile -> $ImageFile



=== ContentModel_Audio ===
Extends ContentModel_DOMS

The following variables are used:
 * $AudioFile: An object with the Content Model !ContentModel_AudioPreservationFile;


Requirements for objects described by !ContentModel_Audio
 * Datastreams
  * RELS-EXT:
   * doms:hasPreservationFile -> $AudioFile

=== ContentModel_Video ===
Extends ContentModel_DOMS

The following variables are used:
 * $VideoFile: An object with the Content Model !ContentModel_VideoPreservationFile;


Requirements for objects described by !ContentModel_Video
 * Datastreams
  * RELS-EXT:
   * doms:hasPreservationFile -> $VideoFile


=== ContentModel_Text ===
Extends ContentModel_DOMS

The following variables are used:
 * $TextFile: An object with the Content Model !ContentModel_TextPreservationFile;

Requirements for objects described by !ContentModel_Text
 * Datastreams
  * RELS-EXT:
   * doms:hasPreservationFile -> $TextFile


=== ContentModel_License ===
Extends ContentModel_DOMS

Requirements for objects described by !ContentModel_License
 * Datastreams
  * LICENCE: XACML describing the license. [http://www.oasis-open.org/committees/download.php/915/cs-xacml-schema-policy-01.xsd Schema]
  * DC: The DC datastream (probably the description field) is used to describe the human readable version of the license

=== ContentModel_Schema ===
Extends ContentModel_DOMS

Requirements for objects described by !ContentModel_Schema
 * Datastreams
  * SCHEMA: The xsd schema inlined.



=== ContentModel_Collection ===
Extends ContentModel_DOMS

Requirements for objects described by !ContentModel_Collection: None


== Technical Metadata ==
A file object should contain technical metadata. In this context, that means:
 * A datastream with the output of the characterisation tools used on this file upon ingest, called CHARACTERISATION
 * A datastream with the metadata about the origins of the file, called ORIGIN

In addition, it must have a relation "hasOriginal" if it was migrated from another file that exists in DOMS.
Most data models are structured around some real-world concept, like a CD, modelled as a digital object. This object will be described by a content model that is entirely collection-specific, only extending doms:!ContentModel_DOMS. It will probably have relations to other digital objects, like tracks. These will be described by a content model that extends doms:!ContentModel_Audio. Each of these tracks must then reference an audio preservation file object, or some subtype of this. This is the best practice for constructing data models.

DON'T PANIC

A definition: A datamodel describes the content of a collection. A content model describes the content of a data object. So, a datamodel is a set of content models that together describe the collection.

The DOMS datamodel describes how the Type system underlying DOMS is realized in Fedora 3.

The DOMS datamodel is, in its entirety, a complex system. To make it easier to understand, its components are detailed in separate documents. Firstly, it consists of a number of extensions to the Fedora system. Secondly, it consists of a number of predefined objects, which make use of those extensions. Thirdly, it consists of a number of policies for how certain tasks are achieved. Fourthly, it consists of a number of APIs.

Fedora and DOMS are big on namespaces. To ease writing the documentation, a namespace document, DomsNameSpacesAndSchemas, has been written. All namespaces should be defined there, and all shorthands refer to the namespaces defined therein.

Content Models in general

Fedora provides a repository for digital objects. All objects in the repository can, in principle, be unique, but Fedora provides a way of specifying that an object has a given type. Unfortunately, the type-definitions in Fedora, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements.

For our purposes, there are two kinds of digital objects in Fedora:

  • Data objects
  • Content Model objects

The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify if the given object is indeed of this type. For more detail on this, see FedoraOntology and FedoraTypeChecking.

A data object can specify the Content Model describing its contents via a fedora-model:hasModel relation, and in DOMS we require this relation to be present. A data object is said to "subscribe" to a Content Model. Content Model inheritance, as specified in FedoraOntology, is used.
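
As an illustration, the Python sketch below reads the fedora-model:hasModel relations from a RELS-EXT datastream. It is a minimal sketch, not DOMS code: the function name and the PID doms:example_track_1 are invented for this example, and the RELS-EXT content is assumed to follow the usual Fedora 3 RDF/XML layout.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FEDORA_MODEL_NS = "info:fedora/fedora-system:def/model#"
PID_URI_PREFIX = "info:fedora/"

# Hypothetical RELS-EXT content; the object PID is made up for illustration.
RELS_EXT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/doms:example_track_1">
    <fedora-model:hasModel rdf:resource="info:fedora/doms:ContentModel_Audio"/>
  </rdf:Description>
</rdf:RDF>
"""


def subscribed_models(rels_ext_xml):
    """Return the PIDs of the Content Models named by fedora-model:hasModel."""
    models = []
    root = ET.fromstring(rels_ext_xml)
    for relation in root.iter("{%s}hasModel" % FEDORA_MODEL_NS):
        resource = relation.get("{%s}resource" % RDF_NS)
        if resource is None:
            continue
        # Strip the URI prefix so that only the plain PID remains.
        if resource.startswith(PID_URI_PREFIX):
            resource = resource[len(PID_URI_PREFIX):]
        models.append(resource)
    return models
```

Since DOMS requires the relation to be present, an empty result from subscribed_models would indicate an invalid data object.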

The special Content Model object "doms:ContentModel_DOMS" is the root object. All Content Models must have a "doms-relations:extendsModel" relation to this object, possibly through a number of other Content Models. The complete description of a data object is defined as the set of the descriptions in the Content Model specified with "fedora-model:hasModel" and in all Content Models that can be reached from it by following "doms-relations:extendsModel" relations.

A Content Model can "extend" more than one other Content Model. There is no overriding of Content Models: a subscribing object must be valid with regard to all the Content Models in the inheritance tree.
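
The set of Content Models that make up the complete description of a data object is the transitive closure of the extendsModel relations, starting from the hasModel targets. The Python sketch below illustrates this; the function name is hypothetical, and the extendsModel graph is given as a plain dictionary rather than being read from Fedora.

```python
# Illustrative extendsModel graph: model PID -> PIDs of the models it extends.
EXTENDS = {
    "doms:ContentModel_DOMS": [],
    "doms:ContentModel_File": ["doms:ContentModel_DOMS"],
    "doms:ContentModel_AudioPreservationFile": ["doms:ContentModel_File"],
}


def full_description(has_model, extends):
    """Return every Content Model reachable from the hasModel targets."""
    seen = set()
    stack = list(has_model)
    while stack:
        model = stack.pop()
        if model in seen:
            continue  # already handled; also guards against cycles
        seen.add(model)
        stack.extend(extends.get(model, []))
    return seen
```

Because there is no overriding, a validator would then check the object against each model in the returned set, and the object is valid only if every check passes.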

Two datastreams in Content Models are of particular interest: ONTOLOGY and DS-COMPOSITE. The ONTOLOGY datastream defines the allowed relations in subscribing objects, and the DS-COMPOSITE datastream defines the required datastreams and any restrictions they must adhere to.

Fedora extensions

Predefined objects

Doms policies

API documentation

Working with the Data model

DOMS contains a number of content models. These are meant to serve as the basic building blocks for data models for new collections. A datamodel is, of course, not restricted to using only these content models; it can, and should, define its own. All new content models should extend doms:ContentModel_DOMS, all objects that need to reference files outside Fedora should have a content model that derives from doms:ContentModel_File, and so on. The content models that provide extra meaning are optional to use, and should at least be extended for the relevant collection.

Most data models are structured around some real-world concept, like a CD, modelled as a digital object. This object will be described by a content model that is entirely collection-specific, only extending doms:ContentModel_DOMS. It will probably have relations to other digital objects, like tracks. These will be described by a content model that extends doms:ContentModel_Audio. Each of these tracks must then reference an audio preservation file object, or some subtype of this. This is the best practice for constructing data models.
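
To make the CD example concrete, the Python sketch below describes such a data model as a dictionary of extendsModel relations and checks that every content model ultimately extends doms:ContentModel_DOMS. The "example_cd:" PIDs and the function names are invented for this illustration; only doms:ContentModel_DOMS and doms:ContentModel_Audio come from the text above.

```python
ROOT = "doms:ContentModel_DOMS"

# Hypothetical data model for a CD collection: model PID -> models it extends.
CD_DATA_MODEL = {
    "example_cd:ContentModel_CD": [ROOT],
    "example_cd:ContentModel_Track": ["doms:ContentModel_Audio"],
    "doms:ContentModel_Audio": [ROOT],
    ROOT: [],
}


def reaches_root(model, extends, _seen=None):
    """True if the model is the root or extends it, directly or indirectly."""
    seen = set() if _seen is None else _seen
    if model == ROOT:
        return True
    if model in seen:
        return False  # already explored without success, or part of a cycle
    seen.add(model)
    return any(
        reaches_root(parent, extends, seen) for parent in extends.get(model, [])
    )


def models_not_extending_root(extends):
    """List the content models that never reach doms:ContentModel_DOMS."""
    return sorted(m for m in extends if not reaches_root(m, extends))
```

An empty list from models_not_extending_root(CD_DATA_MODEL) means the data model follows the rule that all content models extend the root model.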

DataModel (last edited 2010-03-17 13:13:00 by localhost)