DOMS Data model


ImageLink(http://merkur/viewvc/trunk/docs/datamodel/fig/DOMSBaseCollection.png?root=doms&view=co,alt=DOMS base collection,width=1024)

The DOMS data model describes how the type system underlying DOMS is realised in Fedora 3. The figure above will serve as a guide through the following sections.

DOMS Content Models

Fedora provides a repository for digital objects. All objects in the repository can, in principle, be unique, but Fedora provides a way of specifying that an object has a given type. Unfortunately, the type-definitions in Fedora, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements.

For our purposes, there are two kinds of digital objects in Fedora: Content Model objects and data objects.

The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify whether a given object is indeed of this type.

A data object can specify the Content Model describing its contents via a fedora-model:hasModel relation. It can only have one such relation, and in DOMS we require it to be present. A data object is said to "subscribe" to a Content Model.

Inheritance

A type system that does not allow for inheritance is of limited use. Nevertheless, the Fedora Content Models do not provide this functionality. To compensate, we have built our own inheritance system for Content Models.

The special Content Model object "ContentModel_DOMS" is the root object. All Content Models must have an "extendsModel" relation to this object, possibly through a number of other Content Models. The complete description of a data object is defined as the set of the descriptions in the Content Model specified with "hasModel" and all Content Models that can be reached from this, by following "extendsModel" relations.

A Content Model can "extend" more than one other Content Model, so questions can arise about the shape of the inheritance tree and about which Content Models override each other. The rule is as follows: starting from the Content Model named by "hasModel", perform a breadth-first search of the inheritance tree. For each datastream described by the Content Models, use only the first description found. All the others count as overridden.
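The override rule above can be sketched in a few lines of Python. This is a minimal illustration, not DOMS code: the in-memory dictionaries, the model names other than ContentModel_DOMS, and the datastream names are all hypothetical, and the sketch assumes that sibling "extendsModel" relations are visited in the order they are listed, which the text above does not specify.

```python
from collections import deque


def resolve_descriptions(start, extends, descriptions):
    """Map each datastream name to the (model, description) found first by BFS."""
    resolved = {}
    seen = {start}
    queue = deque([start])
    while queue:
        model = queue.popleft()
        for name, description in descriptions.get(model, {}).items():
            # Keep only the first description found; later ones are overridden.
            resolved.setdefault(name, (model, description))
        for parent in extends.get(model, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return resolved


# Hypothetical hierarchy: Example extends ParentA and ParentB,
# which both extend the root ContentModel_DOMS.
extends = {
    "ContentModel_Example": ["ContentModel_ParentA", "ContentModel_ParentB"],
    "ContentModel_ParentA": ["ContentModel_DOMS"],
    "ContentModel_ParentB": ["ContentModel_DOMS"],
}
descriptions = {
    "ContentModel_Example": {"TECHMD": "example schema"},
    "ContentModel_ParentA": {"DESCRIPTION": "schema from A"},
    "ContentModel_ParentB": {"TECHMD": "schema from B", "DESCRIPTION": "schema from B"},
    "ContentModel_DOMS": {"DC": "root schema", "DESCRIPTION": "root schema"},
}

resolved = resolve_descriptions("ContentModel_Example", extends, descriptions)
for name in sorted(resolved):
    print(name, "is described by", resolved[name][0])
```

Here TECHMD is taken from the starting model, DESCRIPTION from the first parent reached, and DC from the root, because no model closer to the start describes it.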

Using a Content Model

A Content Model contains a plethora of information in many separate datastreams. To interface properly with the DOMS system, one must know how to interpret this information.

Important datastreams:

DS-COMPOSITE-MODEL

Fedora-controlled structural datastream. Lists the datastreams required in the subscribing data objects, together with their MIME types, which will almost always be text/xml.

As this datastream is Fedora-controlled, it does not respect our inheritance system. Therefore, it only mentions the datastreams that this particular Content Model requires to be present. In order to construct the complete list of required datastreams for a subscribing object, one must follow the "extendsModel" relations and concatenate the lists from these Content Models. Lastly, remove duplicates. Since this datastream only names required datastreams, there will be no issue with inheritance and overriding.

An example of the contents of a DS-COMPOSITE-MODEL datastream can be seen below.

                <dsCompositeModel
                        xmlns="info:fedora/fedora-system:def/dsCompositeModel#">
                    <dsTypeModel ID="DC">
                        <form MIME="text/xml"/>
                    </dsTypeModel>
                    <dsTypeModel ID="RELS-EXT">
                        <form MIME="application/rdf+xml"/>
                    </dsTypeModel>
                    <dsTypeModel ID="POLICY">
                        <form MIME="text/xml"/>
                    </dsTypeModel>
                    <dsTypeModel ID="AUDIT">
                        <form MIME="text/xml"/>
                    </dsTypeModel>
                </dsCompositeModel>
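The procedure for building the complete list of required datastreams (follow "extendsModel", concatenate, remove duplicates) could be sketched as below. The sketch assumes the DS-COMPOSITE-MODEL contents and the "extendsModel" relations have already been fetched into dictionaries; the helper names, ContentModel_Example and the datastream EXAMPLE_MD are hypothetical.

```python
import xml.etree.ElementTree as ET

DS_NS = "{info:fedora/fedora-system:def/dsCompositeModel#}"


def composite(*ids):
    """Build a small DS-COMPOSITE-MODEL document (helper for this example only)."""
    body = "".join(
        '<dsTypeModel ID="%s"><form MIME="text/xml"/></dsTypeModel>' % ds_id
        for ds_id in ids
    )
    return (
        '<dsCompositeModel xmlns="info:fedora/fedora-system:def/dsCompositeModel#">'
        + body
        + "</dsCompositeModel>"
    )


def required_datastreams(start, extends, composite_xml):
    """Collect the datastream IDs required by `start` and every model it extends."""
    required, visited, pending = [], set(), [start]
    while pending:
        model = pending.pop(0)
        if model in visited:
            continue
        visited.add(model)
        root = ET.fromstring(composite_xml[model])
        for type_model in root.findall(DS_NS + "dsTypeModel"):
            ds_id = type_model.get("ID")
            if ds_id not in required:  # remove duplicates
                required.append(ds_id)
        pending.extend(extends.get(model, []))
    return required


# Hypothetical data: one Content Model extending the root model.
extends = {"ContentModel_Example": ["ContentModel_DOMS"]}
composite_xml = {
    "ContentModel_Example": composite("DC", "EXAMPLE_MD"),
    "ContentModel_DOMS": composite("DC", "RELS-EXT", "POLICY", "AUDIT"),
}

required = required_datastreams("ContentModel_Example", extends, composite_xml)
print(required)
```

The traversal order does not matter here, since the result is only a set of required names; the list merely keeps the order of first appearance.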

SCHEMABINDINGS

DS-COMPOSITE-MODEL lists the required datastreams, but makes no statements about their contents. SCHEMABINDINGS does, albeit indirectly. For each required datastream in a subscribing object, it gives the name of a datastream in this Content Model containing a sort of schema for the contents. By "sort of schema" we mean something that can be used to validate the contents and that also provides enough information to construct a human-understandable interface to the contents.

At the moment there are two such schema languages, [http://en.wikipedia.org/wiki/XML_Schema_(W3C) "xsd"] and [http://en.wikipedia.org/wiki/Web_Ontology_Language "owl"] lite. Owl lite is only ever used to express requirements about the RELS-EXT datastream. All other datastreams must be describable by xsd schemas.

There is a third option for schema language, which breaks somewhat with the above. As the reader knows, a Fedora object, no matter the type, can be said to consist of three fundamental components.

While the relations are implemented as a datastream, the properties are not, but otherwise behave in much the same way. We wanted to be able to store information in the properties, and we therefore needed a way to describe their contents. Properties are in fact two things: the internal properties, managed and used by Fedora, and the external properties, which are stored and handled by Fedora but not used by it. It is the external properties that we want to access and store information in. By introducing the datastream name "EXTPROPERTIES", which is understood to refer to the external properties rather than to a proper datastream, SCHEMABINDINGS can link the external properties to a schema.

We added a third possible schema language, "propschema", which will only ever be used if the described datastream is EXTPROPERTIES. The describing datastream will then consist of a number of rules, written in the language defined in the datastream SCHEMA in the object doms:Extprop_Schema. A non-authoritative copy of this schema can be seen below.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            targetNamespace="http://doms.statsbiblioteket.dk/properties/extproperties/0/1/#"
            xmlns="http://doms.statsbiblioteket.dk/properties/extproperties/0/1/#"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

    <xsd:element name="extproperties" type="extpropertiesType"/>

    <xsd:complexType name="extpropertiesType">
        <xsd:sequence>
            <xsd:element name="property" type="propertyType" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="propertyType">
        <xsd:choice maxOccurs="1" minOccurs="1">
            <xsd:element name="legal-values" type="valuesType"/>
            <xsd:element name="all-values"/>
        </xsd:choice>
        <xsd:attribute name="name" type="xsd:string" use="required"/>
    </xsd:complexType>

    <xsd:complexType name="valuesType">
        <xsd:sequence>
            <xsd:element name="value" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
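To make the rule language more concrete, the following sketch parses a small rules document that conforms to the schema above and checks a set of external properties against it. The property names and values are invented for illustration. The interpretation of the rules is also an assumption: "legal-values" is read as a closed list of allowed values, "all-values" as "any value is accepted", and properties without a rule are reported as violations.

```python
import xml.etree.ElementTree as ET

EXT_NS = "{http://doms.statsbiblioteket.dk/properties/extproperties/0/1/#}"

# Hypothetical rules document; the property names are invented.
RULES_XML = """
<extproperties xmlns="http://doms.statsbiblioteket.dk/properties/extproperties/0/1/#">
  <property name="state">
    <legal-values>
      <value>draft</value>
      <value>published</value>
    </legal-values>
  </property>
  <property name="note">
    <all-values/>
  </property>
</extproperties>
"""


def load_rules(xml_text):
    """Return {property name: set of legal values, or None for all-values}."""
    rules = {}
    for prop in ET.fromstring(xml_text).findall(EXT_NS + "property"):
        legal = prop.find(EXT_NS + "legal-values")
        if legal is None:
            rules[prop.get("name")] = None  # all-values: anything goes
        else:
            rules[prop.get("name")] = {v.text for v in legal.findall(EXT_NS + "value")}
    return rules


def violations(rules, properties):
    """List the names of properties that break the rules."""
    bad = []
    for name, value in properties.items():
        if name not in rules:
            bad.append(name)  # assumption: undeclared properties are rejected
        elif rules[name] is not None and value not in rules[name]:
            bad.append(name)
    return bad


rules = load_rules(RULES_XML)
print(violations(rules, {"state": "draft", "note": "any text"}))
print(violations(rules, {"state": "deleted"}))
```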

VIEW

Datastreams in ContentModel_DOMS

Predefined Content Models

Shorthand:

ContentModel_DOMS

The following variables are used:

Requirements for objects described by ContentModel_DOMS

ContentModel_Image

The following variables are used:

Requirements for objects described by ContentModel_Image

ContentModel_Audio

The following variables are used:

Requirements for objects described by ContentModel_Audio

ContentModel_Video

The following variables are used:

Requirements for objects described by ContentModel_Video

ContentModel_Text

The following variables are used:

Requirements for objects described by ContentModel_Text

ContentModel_License

Requirements for objects described by ContentModel_License

ContentModel_Collection

Requirements for objects described by ContentModel_Collection: None

ContentModel_File

The following variables are used:

Requirements for objects described by ContentModel_File

The characterisation datastream could look like this.

<?xml version="1.0" encoding="UTF-8"?>
<char:characterisation xsi:schemaLocation="http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/# http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/characterisation/characterisation-0-1.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:char="http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/#"
  xmlns:jhove="">
  <char:characterisationRun>
    <char:tool>JHove</char:tool>
    <char:output>
      <jhove:...>
      </jhove:...>
    </char:output>
  </char:characterisationRun>
</char:characterisation>

All the predefined subtypes of File bring no new requirements.

Content Model implementation

Datastreams in ContentModel_DOMS

A Content Model in DOMS must have a number of datastreams in addition to those required by the Content Model definition in Fedora.

The following variables are used:

Requirements for Content Model objects (except ContentModel_DOMS, which does not have an "extends" relation)

Schemabindings

The schemabinding datastream links datastreams in the described data objects with datastreams (containing schemas) in this object.

<?xml version="1.0" encoding="UTF-8"?>
<b:bindings xmlns:b="http://developer.statsbiblioteket.dk/DOMS/types/Schemabinding/0/1/#">
  <b:binding schema="xsd">
    <b:from name="datastream_name_in_data_object"/>
    <b:to name="datastream_with_validator_schema_in_this_object"/>
  </b:binding>
</b:bindings>
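A client could read such a datastream into a lookup table from data object datastream to (schema language, schema datastream). The sketch below uses only the element and attribute names visible in the example above. The datastream names in the sample are hypothetical, and it is an assumption that the schema attribute takes the values "owl" and "propschema" for the other two schema languages.

```python
import xml.etree.ElementTree as ET

B_NS = "{http://developer.statsbiblioteket.dk/DOMS/types/Schemabinding/0/1/#}"

# Hypothetical bindings; all datastream names are invented for illustration.
BINDINGS_XML = """
<b:bindings xmlns:b="http://developer.statsbiblioteket.dk/DOMS/types/Schemabinding/0/1/#">
  <b:binding schema="xsd">
    <b:from name="CHARACTERISATION"/>
    <b:to name="CHARACTERISATION_SCHEMA"/>
  </b:binding>
  <b:binding schema="owl">
    <b:from name="RELS-EXT"/>
    <b:to name="ONTOLOGY"/>
  </b:binding>
  <b:binding schema="propschema">
    <b:from name="EXTPROPERTIES"/>
    <b:to name="EXTPROPERTIES_RULES"/>
  </b:binding>
</b:bindings>
"""


def parse_bindings(xml_text):
    """Return {datastream in data object: (schema language, schema datastream)}."""
    bindings = {}
    for binding in ET.fromstring(xml_text).findall(B_NS + "binding"):
        source = binding.find(B_NS + "from").get("name")
        target = binding.find(B_NS + "to").get("name")
        bindings[source] = (binding.get("schema"), target)
    return bindings


bindings = parse_bindings(BINDINGS_XML)
language, schema_datastream = bindings["RELS-EXT"]
print(language, schema_datastream)
```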

DS-COMPOSITE-MODEL

<dsCompositeModel xmlns="info:fedora/fedora-system:def/dsCompositeModel#">
  <dsTypeModel ID="DC">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="RELS-EXT">
    <form MIME="application/rdf+xml"/>
  </dsTypeModel>
  <dsTypeModel ID="POLICY">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="AUDIT">
    <form MIME="text/xml"/>
  </dsTypeModel>
</dsCompositeModel>

View

The VIEW datastream contains XML of the form

<?xml version="1.0" encoding="UTF-8"?>
<view:views  xmlns:view="http://developer.statsbiblioteket.dk/DOMS/types/view/0/1/#">
  <view:view name="GUI" mainobject="true">
    <view:relations>
      <doms:hasFile xmlns:doms="http://developer.statsbiblioteket.dk/DOMS/relations/default/0/1/#"/>
    </view:relations>
    <view:inverse-relations>
      <doms:isPartOfCollection xmlns:doms="http://developer.statsbiblioteket.dk/DOMS/relations/default/0/1/#"/>
    </view:inverse-relations>
    <view:datastreams>
      <view:datastream>DC</view:datastream>
    </view:datastreams>
  </view:view>
</view:views>

As can be seen, the view lists the relations to be followed from the object, both in the forward direction and in reverse. When including an object, only the datastreams named in the datastreams element should be used. An object can have several views with different names. The GUI should use the view named GUI.
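As a rough sketch, the VIEW datastream shown above could be read like this. The function name and the shape of the returned dictionary are choices made for this example only; relation names are returned in ElementTree's "{namespace}localname" form.

```python
import xml.etree.ElementTree as ET

VIEW_NS = "{http://developer.statsbiblioteket.dk/DOMS/types/view/0/1/#}"
DOMS_REL = "{http://developer.statsbiblioteket.dk/DOMS/relations/default/0/1/#}"

VIEW_XML = """
<view:views xmlns:view="http://developer.statsbiblioteket.dk/DOMS/types/view/0/1/#">
  <view:view name="GUI" mainobject="true">
    <view:relations>
      <doms:hasFile xmlns:doms="http://developer.statsbiblioteket.dk/DOMS/relations/default/0/1/#"/>
    </view:relations>
    <view:inverse-relations>
      <doms:isPartOfCollection xmlns:doms="http://developer.statsbiblioteket.dk/DOMS/relations/default/0/1/#"/>
    </view:inverse-relations>
    <view:datastreams>
      <view:datastream>DC</view:datastream>
    </view:datastreams>
  </view:view>
</view:views>
"""


def parse_view(xml_text, name="GUI"):
    """Return the named view as a dict, or None if there is no such view."""
    for view in ET.fromstring(xml_text).findall(VIEW_NS + "view"):
        if view.get("name") != name:
            continue

        def child_tags(container):
            parent = view.find(VIEW_NS + container)
            return [] if parent is None else [child.tag for child in parent]

        datastreams = view.find(VIEW_NS + "datastreams")
        return {
            "main": view.get("mainobject") == "true",
            "relations": child_tags("relations"),
            "inverse_relations": child_tags("inverse-relations"),
            "datastreams": [] if datastreams is None else [d.text for d in datastreams],
        }
    return None


gui = parse_view(VIEW_XML)
print(gui["main"], gui["datastreams"])
```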

Collections

DOMS models several collections of digital objects. Each object belongs to one or more collections. This is represented by having one or more "isPartOfCollection" relations to the parent collections. This goes for a collection object as well - it belongs to another collection. One collection has special status though: the "doms:Root_Collection" does not belong to any other collection, and is thus the bottom element for the "isPartOfCollection" relation. Every other collection has an "isPartOfCollection" relation to "doms:Root_Collection".

In addition, DOMS contains another special collection - the "doms:DOMS_Base_Collection". This collection provides objects such as content models and licenses that are utilized by (and mandatory for) the other collections in the DOMS. This collection is meant to be ingested as the first collection in the DOMS.

File Objects

In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects.

A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. File objects must all have a Content Model that extends ContentModel_File.

Preservation files

We have a special class of files in DOMS, the ones we are willing to promise to preserve. These are the eight formats

For each of these, we have defined a Content Model that extends ContentModel_File.

Presentation files generally are dynamically generated upon request.

Technical Metadata

A file object should contain technical metadata. In this context it refers

In addition, it must have a relation "hasOriginal" if it was migrated from another file that exists in DOMS.

Views

A view is a way of combining objects in the DOMS into a domain-relevant group. It is a way of seeing a number of objects as related - as a whole; information that can be useful for the GUI-generator when generating GUI-windows.

Those views that we imagine as being suitable for a screen or window in the GUI, are called main views. Each main view contains an object that the main view is centered around. We call this the main object, and the ID of this view is the ID of the main object. Views of other objects are simply called views. The main object is the object that represents the main view - other objects in that view are related to the main object and would presumably be relevant to edit in the GUI at the same time. For a CD modelled in DOMS, for example, a CD object would be the main object, and objects for tracks, cover, lyrics and so on would constitute the rest of the main view.

We imagine that results appearing in searches in the GUI will all be main views. In fact every view that will be the basis for a screen/window will be a main view.

A view for an object O is represented by a datastream VIEW on the Content Model object for O. This datastream also marks the object as main, if this is the case. Please note that the view is defined at the Content Model level, so the same rules are used to generate the view for all objects using that Content Model.

The datastream will just contain a list of relation names and reverse relation names. Following these relations will give you the view. Views are inherited when Content Models "extend" each other, so you should generate the view for each Content Model in the inheritance tree of this object, and remove duplicates.

Definitions:

In addition, we suggest augmenting the 1-step approach with the idea of "includes". This means that when object O has a view defined by following relations from O once, and an object P is in the view of O, then the view of P will be included in the view of O.
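The "includes" idea amounts to repeating the 1-step expansion until no new objects turn up. A minimal sketch, under several stated assumptions: relations are held as (subject, relation, object) triples in memory, each object maps to a single Content Model whose view rules are already merged across its inheritance tree, and all object identifiers, Content Model names and relation names except hasFile and isPartOfCollection are hypothetical.

```python
def one_step(obj, triples, rules):
    """Objects reached from obj by following the view's relations once."""
    forward = {o for s, r, o in triples if s == obj and r in rules["relations"]}
    reverse = {s for s, r, o in triples if o == obj and r in rules["inverse"]}
    return forward | reverse


def view_with_includes(main, triples, model_of, view_rules):
    """Expand the view of `main`, including the views of every object in it."""
    no_rules = {"relations": [], "inverse": []}
    view, pending = {main}, [main]
    while pending:
        current = pending.pop()
        rules = view_rules.get(model_of[current], no_rules)
        for neighbour in one_step(current, triples, rules):
            if neighbour not in view:  # also guards against cycles
                view.add(neighbour)
                pending.append(neighbour)
    return view


# Hypothetical CD example.
triples = [
    ("cd:1", "hasTrack", "track:1"),
    ("cd:1", "hasTrack", "track:2"),
    ("track:1", "hasFile", "file:1"),
    ("cover:1", "isCoverOf", "cd:1"),
    ("cd:1", "isPartOfCollection", "collection:1"),
]
model_of = {
    "cd:1": "ContentModel_CD",
    "track:1": "ContentModel_Track",
    "track:2": "ContentModel_Track",
    "file:1": "ContentModel_File",
    "cover:1": "ContentModel_Cover",
    "collection:1": "ContentModel_Collection",
}
view_rules = {
    "ContentModel_CD": {"relations": ["hasTrack"], "inverse": ["isCoverOf"]},
    "ContentModel_Track": {"relations": ["hasFile"], "inverse": []},
}

cd_view = view_with_includes("cd:1", triples, model_of, view_rules)
print(sorted(cd_view))
```

The file is reached only through the track's own view, which is what "includes" adds over the 1-step approach. The collection stays outside the view because no rule names isPartOfCollection.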

Licenses

The only purpose of licenses in DOMS is to restrict who can view what material. They are only a concern for people using the material in DOMS, not for users working with the GUI or otherwise administrating the contents.

Licenses are implemented by using the Fedora XACML engine. When a user authenticates with the Fedora server (or with a server passing authentication tokens to the DOMS), he gets a number of attributes. Each of these attributes names one license that the user can access material under.

Each data object in DOMS has a POLICY datastream. This datastream is just a URL, referring to a License object's LICENCE datastream. That datastream is an XACML stream, which evaluates whether the user possesses the attribute specifying that the user can use this License. If yes, the user is granted access to the original object; otherwise he is denied.

Audit Trail

Each user that will use the GUI will need to log in. They will authenticate with some external server, probably the SB LDAP server. The access control is not really a concern for the DOMS system. As such, all GUI users will have equal and full access to the DOMS repository.

Audit trails, however, are a concern. Each change to a datastream in a data object will, per default Fedora behaviour, create a new version of this datastream, marked with the creation time and the username. For this reason the Fedora operations PurgeObject and PurgeDatastream are blocked, as they destroy the audit trail. Real deletion of information is not possible, but both objects and datastreams can be marked as "deleted", again per standard Fedora behaviour. Any tools working with or on DOMS should respect this flag. The GUI should only concern itself with the most recent version of a datastream.
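For a tool that reads datastreams, respecting the flag and picking the most recent version could look roughly like the sketch below. The dictionary layout, the user names and the dates are made up for illustration, and the state code "D" for "deleted" is an assumption about how the flag is exposed.

```python
from datetime import datetime


def current_version(datastream):
    """Return the newest version of a datastream, or None if it is marked deleted."""
    if datastream["state"] == "D":  # assumed code for the "deleted" flag
        return None
    return max(datastream["versions"], key=lambda version: version["created"])


# Made-up version history: each change created a new version with time and user.
versions = [
    {"created": datetime(2009, 1, 5, 10, 0), "user": "alice", "content": "<dc>v1</dc>"},
    {"created": datetime(2009, 2, 1, 9, 30), "user": "bob", "content": "<dc>v2</dc>"},
]
active = {"state": "A", "versions": versions}
deleted = {"state": "D", "versions": versions}

print(current_version(active)["user"])
print(current_version(deleted))
```

Older versions stay in the list untouched, so the audit trail remains available to tools that need it.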