Differences between revisions 1 and 59 (spanning 58 versions)
Revision 1 as of 2008-06-26 12:26:05
Size: 15678
Editor: kfc
Comment: Created by the PackagePages action.
Revision 59 as of 2008-10-01 15:19:19
Size: 8874
Editor: abr
Comment:
Line 1: Line 1:
= DOMS Data model = = Doms Data Model =
Line 3: Line 3:
[[ImageLink(http://merkur/viewvc/trunk/docs/datamodel/fig/DOMSBaseCollection.png?root=doms&view=co,alt=DOMS base collection,width=1000)]] In the following, these shorthands refer to the following namespaces:
 * rdf - http://www.w3.org/1999/02/22-rdf-syntax-ns#
 * rdfs - http://www.w3.org/2000/01/rdf-schema#
 * owl - http://www.w3.org/2002/07/owl#
 * fedora-model - info:fedora/fedora-system:def/model#
 * doms-relations - http://doms.statsbiblioteket.dk/relations/default/0/1/#
 * doms - Standard prefix for PIDs. It is not a namespace
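As a minimal sketch of how these shorthands expand, the Python snippet below maps each prefix to its namespace URI. The names `PREFIXES` and `expand` are illustrative only and are not part of DOMS or Fedora. The `doms` prefix is left out on purpose, since it is a PID prefix and not a namespace.

```python
# Prefix table copied from the list above. "doms" is a PID prefix, not a
# namespace, so it is deliberately absent.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "fedora-model": "info:fedora/fedora-system:def/model#",
    "doms-relations": "http://doms.statsbiblioteket.dk/relations/default/0/1/#",
}


def expand(shorthand):
    """Expand 'prefix:local' to a full URI; return other names unchanged."""
    prefix, sep, local = shorthand.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return shorthand
```

With this table, `expand("fedora-model:hasModel")` yields the full relation URI, while a PID such as `doms:ContentModel_DOMS` is returned as it is.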
Line 5: Line 11:
The DOMS datamodel describe how the Type system underlying DOMS is realised in Fedora 3. The figure above will serve as a guide through the following sections. '''A definition: A datamodel describes the content of a collection. A content model describes the content of a data object. So, a data model is a set of content models, that together describe the collection.'''
Line 9: Line 15:
== DOMS Content Models ==
Line 11: Line 16:
Fedora provides a repository for digital objects. All objects in the repository can, in principle be unique, but Fedora provides of specifying an object as a given type. Unfortunately, the type-definitions in Fedora, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements. [[ImageLink(http://merkur/viewvc/trunk/docs/datamodel/fig/DOMSBaseCollection.png?root=doms&view=co,alt=DOMS base collection,width=256)]]

The DOMS datamodel describes how the Type system underlying DOMS is realised in
Fedora 3. The figure above will serve as a guide through the following sections.




== DOMS Content Models in general ==

Fedora provides a repository for digital objects. All objects in the repository can, in principle, be unique, but Fedora provides a way of specifying that an object has a given type. Unfortunately, the type-definitions in Fedora, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements.
Line 17: Line 31:
The Content Model object, as used in DOMS, describes the compulsary and legal content of an object of this type. It contains the information nessesary to verify if the given object is indeed of this type. The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify if the given object is indeed of this type. For more detail on this, see FedoraOntology and FedoraTypeChecking.
Line 19: Line 33:
A data object can specify the Content Model describing it's contents, via a hasContentModel relation. It can only have one such relation, and in DOMS we require it to be present. A data object can specify the Content Model describing its contents, via a fedora-model:hasModel relation, and in DOMS we require it to be present. A data object will be said to "subscribe" to a Content Model. Content Model inheritance, as specified in FedoraOntology, will be used.
Line 21: Line 35:
=== Inheritance ===
A type system that does not allow for inheritance will have limited use. In spite of this, the Fedora Content Models do not provide this functionality. We have built our own inheritance system for Content Models to compensate for this lack.
The special Content Model object "doms:!ContentModel_DOMS" is the root object. All Content Models must have a "doms-relations:extendsModel" relation to this object, possibly through a number of other Content Models. The complete description of a data object is defined as the set of the descriptions in the Content Model specified with "fedora-model:hasModel" and all Content Models that can be reached from this, by following "doms-relations:extendsModel" relations.
Line 24: Line 37:
The special Content Model object "ContentModel_DOMS" is the root object. All Content Models must have an "extends" relation to this object, possible through a number of other Content Models. When performing validation of a Data Object, the validator will validate the object against the Content Model specified with "hasContentModel" and all Content Models that can be reached from this, by following "extends" relations. A Content Model can "extend" more than one other Content Model. There is no overriding of Content Models: a subscribing object must be valid with regard to all the Content Models in the inheritance tree.
Line 26: Line 39:
A Content Model can "extend" more than one other Content Model. The validator should only validate an object against a given Content Model once per invocation.
Line 28: Line 40:
=== Predefined Content Models === Content Models have two datastreams that are of particular interest: ONTOLOGY and DS-COMPOSITE. The ONTOLOGY datastream defines the allowed relations in subscribing objects, and the DS-COMPOSITE datastream defines the required datastreams and any restrictions they must adhere to.
Line 30: Line 42:
Shorthand:
 * myObject.myDatastream means the Datastream myDatastream in the Object myObject.
== DOMS Base Collection ==
Line 33: Line 44:
==== ContentModel_DOMS ==== DOMS ships with a base collection of objects, representing the base types. The objects in the base collection are detailed in the following.


 * [:DataModel/ContentModel_DOMS: doms:ContentModel_DOMS] In progress
 * [:DataModel/ContentModel_File: doms:ContentModel_File] In progress
 * [:DataModel/ContentModel_AudioPreservationFile: doms:ContentModel_AudioPreservationFile] DONE
 * [:DataModel/ContentModel_ImagePreservationFile: doms:ContentModel_ImagePreservationFile]
 * [:DataModel/ContentModel_VideoPreservationFile: doms:ContentModel_VideoPreservationFile]
 * [:DataModel/ContentModel_TextPreservationFile: doms:ContentModel_TextPreservationFile]
 * [:DataModel/ContentModel_Audio: doms:ContentModel_Audio] In progress
 * [:DataModel/ContentModel_Image: doms:ContentModel_Image]
 * [:DataModel/ContentModel_Video: doms:ContentModel_Video]
 * [:DataModel/ContentModel_Text: doms:ContentModel_Text]
 * [:DataModel/ContentModel_License: doms:ContentModel_License] DONE
 * [:DataModel/ContentModel_Schema: doms:ContentModel_Schema] DONE
 * [:DataModel/ContentModel_Collection: doms:ContentModel_Collection] DONE


== doms:ContentModel_DOMS ==
ContentModel_DOMS is the root of the content model inheritance tree. All content models derive from this model. As all data objects in DOMS must have a content model, all data objects must adhere to the restrictions defined in this content model.

For the precise definition of doms:!ContentModel_DOMS, see ["DataModel/ContentModel_DOMS"]

All subscribing objects must have a "doms-relations:hasLicense" relation to an object of "doms:ContentModel_License". They must also have one or more "doms-relations:isPartOfCollection" relations to objects of "doms:ContentModel_Collection".

All subscribing objects must have the following datastreams: "DC", "POLICY", "STATE" and "RELS-EXT". The "DC" stream must conform to the schema defined in the datastream "DC_SCHEMA" in "doms:ContentModel_DOMS". The "STATE" stream must conform to the schema in datastream "STATE_SCHEMA" in "doms:ContentModel_DOMS". To understand the use of the "STATE" datastream, see FedoraTransactionsReplacement. The use of the "POLICY" stream is defined in FedoraLicensePolicies. The "AUDIT" stream is detailed in DomsAuditTrail.






== doms:ContentModel_File ==

In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects.

A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. File objects must all have a Content Model that extends !ContentModel_File.

Objects of !ContentModel_File can have "doms-relations:hasOriginal" relations to other objects of "doms:!ContentModel_File". If a file A is the result of a migration of file B and both are in DOMS, the File A data object will have a "doms-relations:hasOriginal" relation to the data object for File B.

This specifies that data objects of doms:!ContentModel_File must have the datastreams "CHARACTERISATION", "CONTENTS", "ORIGIN" and "PRONOMID". Each of these deserves some description.

==== CONTENTS ====
"CONTENTS" is where the actual file is. It will always have the "E" controlgroup, meaning that the file is externally referenced. The reference must always be to a file in Bitstorage, and only this datastream is ever allowed to reference files.






=== ContentModel_ImagePreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_ImagePreservationFile
 * Datastreams
  * PRONOMID: must be "fmt/10" (tiff version 6)



=== ContentModel_TextPreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_TextPreservationFile
 * Datastreams
  * PRONOMID: must be "x-fmt/16" (Utf8) or "fmt/95" (pdf-a)


=== ContentModel_VideoPreservationFile ===
Extends ContentModel_File

Requirements for objects described by !ContentModel_VideoPreservationFile
 * Datastreams
  * PRONOMID: must be "x-fmt/385" (mpeg1) or "x-fmt/386" (mpeg2)




=== ContentModel_Image ===
Extends ContentModel_DOMS
Line 36: Line 126:
 * $Title: A human readable title
 * $PID: A generated PID for this object
 * $ContentModel: The content model for this object, must derive from ContentModel_DOMS
 * $Collection: The collection that this object belongs to. An object with a Content Model that derives from ContentModel_Collection
 * $Licence: The Licence object that govern access to this object. An object with a Content Model that derives from ContentModel_Licence
 * $ImageFile: An object with the Content Model !ContentModel_!ImagePreservationFile
Line 42: Line 128:
Requirements for objects described by ContentModel_DOMS
 * Datastreams
  * DC
   * dc:title = $Title
  * DomsDC
   * dcterms:title = $Title
  * RELS-EXT
   * oai:itemID = $PID
   * fedora:hasContentModel -> $ContentModel
   * doms:hasLicence -> $LicenceObject
   * doms:isPartOfCollection -> $Collection
  * AUDIT
   Systemcontrolled audit trail
  * POLICY
   * ContentLocation URL = $LicenceObject.LICENCE
  * STATE: The state of the object, in XML
   * <availibility> = draft | intermediate | published

==== ContentModel_Image ====
The following variables are used:
 * $TiffFile: An object with the Content Model ContentModel_TiffFile;

Requirements for objects described by ContentModel_Image
Requirements for objects described by !ContentModel_Image
Line 67: Line 131:
   * doms:hasFile -> $TiffFile

==== ContentModel_Audio ====
The following variables are used:
 * $WavFile: An object with the Content Model ContentModel_WavFile;
 * $BwfFile: An object with the Content Model ContentModel_BwfFile;

Requirements for objects described by ContentModel_Audio
 * Datastreams
  * RELS-EXT: One of the following
   * doms:hasFile -> $WavFile
   * doms:hasFile -> $BwfFile

==== ContentModel_Video ====
The following variables are used:
 * $Mpeg1File: An object with the Content Model ContentModel_Mpeg1File;
 * $Mpeg2File: An object with the Content Model ContentModel_Mpeg2File;

Requirements for objects described by ContentModel_Video
 * Datastreams
  * RELS-EXT: One of the following
   * doms:hasFile -> $Mpeg1File
   * doms:hasFile -> $Mpeg2File

==== ContentModel_Text ====
The following variables are used:
 * $Utf8File: An object with the Content Model ContentModel_WavFile;
 * $PdfFile: An object with the Content Model ContentModel_BwfFile;
 * $DocFile: An object with the Content Model ContentModel_OfficeOpenXmlFile;
Requirements for objects described by ContentModel_Text
 * Datastreams
  * RELS-EXT: One of the following
   * doms:hasFile -> $Utf8File
   * doms:hasFile -> $PdfFile
   * doms:hasFile -> $DocFile
   * doms:hasPreservationFile -> $ImageFile
Line 115: Line 134:
==== ContentModel_Licence ==== === ContentModel_Video ===
Extends ContentModel_DOMS
Line 117: Line 137:
Requirements for objects described by ContentModel_Licence
 * Datastreams
  * LICENCE: XACML describing the licence
  * LICENCETEXT: The human readable textual representation of the licence

==== ContentModel_Collection ====

Requirements for objects described by ContentModel_Collection: None
The following variables are used:
 * $VideoFile: An object with the Content Model !ContentModel_VideoPreservationFile;
Line 127: Line 141:
==== ContentModel_File ====
The following variables are used:
 * $OrigFile: An object with the Content Model that derives from ContentModel_File;
Requirements for objects described by !ContentModel_Video
 * Datastreams
  * RELS-EXT:
   * doms:hasPreservationFile -> $VideoFile
Line 131: Line 146:
Requirements for objects described by ContentModel_File
 * Datastreams
  * RELS-EXT
   * (optional) doms:hasOriginal -> $OrigFile
  * CHARACTERIZATION: The output of the characterization tools
  * CONTENTS:
   * ContentLocation URL = The file in Bitstorage
  * ORIGIN: Metadata about the creation of the file
  * PRONOM: The pronom ID of the fileformat
Line 141: Line 147:
All the predefined subtypes of File bring no new requirements.

=== Content Model implementation ===

A Content Model in DOMS must have a number of additional datastreams, in regards to the Content Model requirements defined by Fedora.
=== ContentModel_Text ===
Extends ContentModel_DOMS
Line 148: Line 151:
 * $PID: A generated PID for this object
 * $ContentModel: The content model for this object, must derive from ContentModel_DOMS
 * $Collection: The collection that this object belongs to. An object with a Content Model that derives from ContentModel_Collection
 * $Licence: The Licence object that govern access to this object. An object with a Content Model that derives from ContentModel_Licence
 * $TextFile: An object with the Content Model !ContentModel_TextPreservationFile;

Requirements for objects described by !ContentModel_Text
 * Datastreams
  * RELS-EXT:
   * doms:hasPreservationFile -> $TextFile
Line 154: Line 159:
Requirements for Content Model objects (except ContentModel_DOMS, which do not have an "extends" relation)
 * Datastreams
  * RELS-EXT
   * (1+) doms:extends -> $ContentModel
   * oai:itemID = $PID
   * doms:hasLicence -> $LicenceObject
   * doms:isPartOfCollection -> $Collection
  * VALIDATIONBINDINGS: Described below
  * VIEW: Described below
  * AUDIT
   Systemcontrolled audit trail
  * POLICY
   * ContentLocation URL = $LicenceObject.LICENCE
  * STATE: The state of the object, in XML
   * <availibility> = draft | intermediate | published
  
Line 171: Line 160:
== Technical Metadata ==
A file object should contain technical metadata. In this context, that means:
 * A datastream with the output of the characterisation tools used on this file upon ingest, called CHARACTERISATION
 * A datastream with the metadata about the origins of the file, called ORIGIN
Line 172: Line 165:
VALIDATIONBINDINGS contain xml of the form:
{{{
<binding name="binding1">
  <from name="datastream_name_in_data_object"/>
  <to name="datastream_with_validator_schema_in_this_object"/>
</binding>
}}}
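A minimal sketch of how such a datastream could be read, using Python's standard `xml.etree.ElementTree`. The function name `read_bindings` and the sample datastream names in the usage note are assumptions for illustration. The sketch follows the example above and reads the `name` attributes of `from` and `to`; it does not validate the document against any schema.

```python
import xml.etree.ElementTree as ET


def read_bindings(xml_text):
    """Map each data-object datastream name to the schema datastream name."""
    root = ET.fromstring(xml_text)
    pairs = {}
    # iter() also yields the root itself, so a lone <binding> element works
    # as well as a wrapping <bindings> element.
    for binding in root.iter("binding"):
        source = binding.find("from").get("name")
        target = binding.find("to").get("name")
        pairs[source] = target
    return pairs
```

For instance, a binding from a hypothetical datastream `DC` to a schema datastream `DC_SCHEMA` would come back as `{"DC": "DC_SCHEMA"}`.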

First stab at a schema for this datastream
{{{
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="bindings">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="binding" type="bindingType" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="bindingType">
    <xsd:sequence>
      <xsd:element name="from" type="datastreamType"/>
      <xsd:element name="to" type="datastreamType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="datastreamType">
    <xsd:attribute name="ID" type="idType" use="required"/>
  </xsd:complexType>
  <xsd:simpleType name="idType">
    <xsd:restriction base="xsd:ID">
      <xsd:maxLength value="64"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
}}}


== Collections ==

The DOMS system models several collections of digital objects. Each object belongs to one or more collections. This is represented by having one or more "isPartOfCollection" relations to the parent collections. This goes for a collection object as well: it belongs to another collection. One collection has special status though: the "doms:Root_collection" does not belong to any other collection, and is thus the bottom element for the "isPartOfCollection" relation. Every other collection has an "isPartOfCollection" relation to "doms:Root_collection".

In addition, DOMS contains a special collection - the "doms:DOMS_base_collection". This collection provides objects such as content models and licences that are utilized by (and mandatory for) the other collections in the DOMS. Content models and licences are discussed below. This collection is meant to be ingested as the first collection in the DOMS.

== File Objects ==

In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects.

A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. File objects must all have a Content Model that extends ContentModel_File.

=== Preservation files ===
We have a special class of files in DOMS, the ones we are willing to promise to preserve. These are the eight formats
 * Tiff
 * Wav, Bwf
 * Mpeg1, Mpeg2
 * UTF8, Pdf, OfficeOpenXml
For each of these, we have defined a Content Model that extends ContentModel_File.

=== Presentation files ===
DOMS does not, by default, support the concept of presentation files. Files in Bitstorage are, by default, meant to be preserved, and Bitstorage should not be used for transitory formats. Instead, presentation files are dynamically generated upon request.

All the preservation files Content Models have disseminators that convert the given preservation format into a presentation format.

=== Technical Metadata ===
A file object should contain technical metadata. In this context, that means:
 * A datastream with the output of the characterization tools used on this file upon ingest
 * A datastream with the metadata about the origins of the file.

In addition, it must have a relation "hasOriginal" if it was migrated from another file that exists in DOMS.


== Views ==

A view is a way of combining objects in the DOMS into a domain-relevant group. It is a way of seeing a number of objects as related - as a whole; information that can be useful for the GUI-generator when generating GUI-windows.

Those views that we imagine as being suitable for a screen or window in the GUI, are called main views. Each main view contains an object that the main view is centered around. We call this the main object, and the ID of this view is the ID of the main object. Views of other objects are simply called views.
The main object is the object that represents the main view - other objects in that view are related to the main object and would presumably be relevant to edit in the GUI at the same time. For a CD modelled in DOMS, for example, a CD object would be the main object, and objects for tracks, cover, lyrics and so on would constitute the rest of the main view.

We imagine that results appearing in searches in the GUI will all be main views. In fact every view that will be the basis for a screen/window will be a main view.

A view for an object O is represented by a Datastream VIEW on the Content Model object for O. This Datastream also marks the object as Main, if this is the case. Please note that the view is defined on Content Model level, so the same rules are used to generate the view for all objects using that Content Model.

The datastream will just contain a list of relation names. Following these relations will give you the view. Views are inherited when Content Models "extend" each other, so you should generate the view for each Content Model in the inheritance tree of this Object, and remove duplicates.

Note: Exactly how these relations should be followed has not been decided yet. Suggestions include:

 * 1-step relations (relations on a content model c of the form "x rel y", meaning that if an object x with content model c has relation rel to another object y, then y will be part of the view too. Examples of rel for a CD modelled in DOMS could be hasTrack, hasLyrics, ...)

 * x-step relations. These are relations of the form mentioned above, but they will be followed from an object an arbitrary number of times, as long as the relations match.

 * reverse relations (relations of the type "y rel x" on a content model c, where x has model c and y therefore will be included in the view for x)

In addition, it has been suggested to augment the 1-step approach with the idea of "includes". What this means is that when object O has a view defined by following relations from O once, and an object P is in the view of O, then the view of P will be included in the view of O.
Note that this is different from x-step relations, where objects in the view of P would not necessarily be included in the view of O.


== Licenses ==

In DOMS, the only purpose of licences is to restrict who can view what material. They are only a concern for people using the material in DOMS, not for users working with the GUI or otherwise administrating the contents.

Licences are implemented by using the Fedora XACML engine. When a user authenticates with the Fedora server (or with a server passing authentication tokens to the DOMS), he gets a number of attributes. Each of these attributes names one licence that the user can access material under.

Each data object in DOMS has a POLICY datastream. This datastream is just a URL, referring to a Licence object's LICENCE datastream. The LICENCE datastream is an XACML stream that evaluates whether the user possesses the attribute specifying that the user can use this Licence. If yes, the user is granted access to the original object; otherwise he is denied.


== Audit Trail ==

Each user that will use the GUI will need to log in. They will authenticate with some external server, probably the SB LDAP server. The access control is not really a concern for the DOMS system. As such, all GUI users will have equal and full access to the DOMS repository.

Audit trails, however, are a concern. Each change to a datastream in a data object will, per default Fedora behaviour, create a new version of this datastream, marked with the creation time and the username. For this reason the Fedora operations PurgeObject and PurgeDatastream are blocked, as they destroy the audit trail.
Real deletion of information is not possible, but both objects and datastreams can be marked as "deleted", again per standard Fedora behaviour. Any tools working with or on DOMS should respect this flag. The GUI should only concern itself with the most recent version of a datastream.
In addition, it must have a relation "hasOriginal" if it was migrated from another file that exists in DOMS.

Doms Data Model

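In the following, these shorthands refer to the following namespaces:

  • rdf - http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • rdfs - http://www.w3.org/2000/01/rdf-schema#
  • owl - http://www.w3.org/2002/07/owl#
  • fedora-model - info:fedora/fedora-system:def/model#
  • doms-relations - http://doms.statsbiblioteket.dk/relations/default/0/1/#
  • doms - Standard prefix for PIDs. It is not a namespace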

A definition: A datamodel describes the content of a collection. A content model describes the content of a data object. So, a data model is a set of content models, that together describe the collection.

ImageLink(http://merkur/viewvc/trunk/docs/datamodel/fig/DOMSBaseCollection.png?root=doms&view=co,alt=DOMS base collection,width=256)

The DOMS datamodel describes how the Type system underlying DOMS is realised in Fedora 3. The figure above will serve as a guide through the following sections.

DOMS Content Models in general

Fedora provides a repository for digital objects. All objects in the repository can, in principle, be unique, but Fedora provides a way of specifying that an object has a given type. Unfortunately, the type-definitions in Fedora, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements.

For our purposes, there are two kinds of digital objects in Fedora:

  • Data objects
  • Content Model objects

The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify if the given object is indeed of this type. For more detail on this, see FedoraOntology and FedoraTypeChecking.

A data object can specify the Content Model describing its contents, via a fedora-model:hasModel relation, and in DOMS we require it to be present. A data object will be said to "subscribe" to a Content Model. Content Model inheritance, as specified in FedoraOntology, will be used.

The special Content Model object "doms:ContentModel_DOMS" is the root object. All Content Models must have a "doms-relations:extendsModel" relation to this object, possibly through a number of other Content Models. The complete description of a data object is defined as the set of the descriptions in the Content Model specified with "fedora-model:hasModel" and all Content Models that can be reached from this, by following "doms-relations:extendsModel" relations.

A Content Model can "extend" more than one other Content Model. There is no overriding of Content Models: a subscribing object must be valid with regard to all the Content Models in the inheritance tree.
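The set of Content Models that applies to a data object can be sketched as a graph traversal. The Python snippet below is an illustration under stated assumptions, not the DOMS implementation: `effective_models` is a hypothetical helper, and the `extends` argument is an in-memory map from a Content Model PID to the PIDs it has "doms-relations:extendsModel" relations to, standing in for whatever lookup against Fedora would be used in practice.

```python
from collections import deque


def effective_models(start, extends):
    """Return start plus every model reachable via extendsModel, each once."""
    ordered = []
    seen = set()
    queue = deque([start])
    while queue:
        model = queue.popleft()
        if model in seen:
            continue  # already reached through another parent
        seen.add(model)
        ordered.append(model)
        queue.extend(extends.get(model, []))
    return ordered
```

Because a Content Model may extend several others, the same ancestor can be reached along more than one path; the `seen` set makes sure each model is reported only once, so an object is checked against each model a single time.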

Content Models have two datastreams that are of particular interest: ONTOLOGY and DS-COMPOSITE. The ONTOLOGY datastream defines the allowed relations in subscribing objects, and the DS-COMPOSITE datastream defines the required datastreams and any restrictions they must adhere to.

DOMS Base Collection

DOMS ships with a base collection of objects, representing the base types. The objects in the base collection are detailed in the following.

doms:ContentModel_DOMS

ContentModel_DOMS is the root of the content model inheritance tree. All content models derive from this model. As all data objects in DOMS must have a content model, all data objects must adhere to the restrictions defined in this content model.

For the precise definition of doms:ContentModel_DOMS, see ["DataModel/ContentModel_DOMS"]

All subscribing objects must have a "doms-relations:hasLicense" relation to an object of "doms:ContentModel_License". They must also have one or more "doms-relations:isPartOfCollection" relations to objects of "doms:ContentModel_Collection".

All subscribing objects must have the following datastreams: "DC", "POLICY", "STATE" and "RELS-EXT". The "DC" stream must conform to the schema defined in the datastream "DC_SCHEMA" in "doms:ContentModel_DOMS". The "STATE" stream must conform to the schema in datastream "STATE_SCHEMA" in "doms:ContentModel_DOMS". To understand the use of the "STATE" datastream, see FedoraTransactionsReplacement. The use of the "POLICY" stream is defined in FedoraLicensePolicies. The "AUDIT" stream is detailed in DomsAuditTrail.
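The presence requirements above can be illustrated with a small Python check. This is a sketch only: `check_subscriber` is a hypothetical helper, and its inputs (a collection of datastream IDs and a map from relation name to target PIDs) are assumed to have been read from the object beforehand. It does not verify that the relation targets subscribe to the License or Collection models, nor that "DC" and "STATE" conform to their schemas.

```python
REQUIRED_DATASTREAMS = ("DC", "POLICY", "STATE", "RELS-EXT")


def check_subscriber(datastream_ids, relations):
    """Return a list of problems; an empty list means the basic checks pass.

    relations maps a relation name to the list of target PIDs.
    """
    problems = []
    for ds in REQUIRED_DATASTREAMS:
        if ds not in datastream_ids:
            problems.append("missing datastream " + ds)
    if not relations.get("doms-relations:hasLicense"):
        problems.append("missing doms-relations:hasLicense")
    if not relations.get("doms-relations:isPartOfCollection"):
        problems.append("missing doms-relations:isPartOfCollection")
    return problems
```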

doms:ContentModel_File

In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects.

A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. File objects must all have a Content Model that extends ContentModel_File.

Objects of ContentModel_File can have "doms-relations:hasOriginal" relations to other objects of "doms:ContentModel_File". If a file A is the result of a migration of file B and both are in DOMS, the File A data object will have a "doms-relations:hasOriginal" relation to the data object for File B.
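To illustrate the migration example, the Python sketch below follows "doms-relations:hasOriginal" from a file object back through its originals. It is a simplification under assumptions: `migration_chain` is a hypothetical helper, the relation is given as an in-memory map, and each file is assumed to have at most one original.

```python
def migration_chain(pid, has_original):
    """Return [pid, its original, that file's original, ...].

    has_original maps a File object PID to the PID it was migrated from.
    """
    chain = []
    # Stop at a file with no recorded original, or if a cycle is detected.
    while pid is not None and pid not in chain:
        chain.append(pid)
        pid = has_original.get(pid)
    return chain
```

For the example in the text, a map with file A pointing to file B gives the chain A, then B.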

This specifies that data objects of doms:ContentModel_File must have the datastreams "CHARACTERISATION", "CONTENTS", "ORIGIN" and "PRONOMID". Each of these deserves some description.

CONTENTS

"CONTENTS" is where the actual file is. It will always have the "E" controlgroup, meaning that the file is externally referenced. The reference must always be to a file in Bitstorage, and only this datastream is ever allowed to reference files.


ContentModel_ImagePreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_ImagePreservationFile

  • Datastreams
    • PRONOMID: must be "fmt/10" (tiff version 6)

ContentModel_TextPreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_TextPreservationFile

  • Datastreams
    • PRONOMID: must be "x-fmt/16" (Utf8) or "fmt/95" (pdf-a)

ContentModel_VideoPreservationFile

Extends ContentModel_File

Requirements for objects described by ContentModel_VideoPreservationFile

  • Datastreams
    • PRONOMID: must be "x-fmt/385" (mpeg1) or "x-fmt/386" (mpeg2)
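The PRONOMID restrictions of the three preservation file models listed above can be summarised as a lookup table. The Python sketch below is only an illustration: `ALLOWED_PRONOM_IDS` and `pronom_id_allowed` are hypothetical names, and the identifiers are copied from the requirements above without being checked against the PRONOM registry.

```python
# Allowed PRONOMID values per preservation file Content Model, as listed
# in the requirements above.
ALLOWED_PRONOM_IDS = {
    "doms:ContentModel_ImagePreservationFile": {"fmt/10"},
    "doms:ContentModel_TextPreservationFile": {"x-fmt/16", "fmt/95"},
    "doms:ContentModel_VideoPreservationFile": {"x-fmt/385", "x-fmt/386"},
}


def pronom_id_allowed(content_model, pronom_id):
    """Return True if pronom_id is allowed for the given Content Model."""
    if content_model not in ALLOWED_PRONOM_IDS:
        raise ValueError("no PRONOMID rule known for " + content_model)
    return pronom_id in ALLOWED_PRONOM_IDS[content_model]
```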

ContentModel_Image

Extends ContentModel_DOMS

The following variables are used:

  • $ImageFile: An object with the Content Model ContentModel_ImagePreservationFile

Requirements for objects described by ContentModel_Image

  • Datastreams
    • RELS-EXT
      • doms:hasPreservationFile -> $ImageFile

ContentModel_Video

Extends ContentModel_DOMS

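The following variables are used:

  • $VideoFile: An object with the Content Model ContentModel_VideoPreservationFile

Requirements for objects described by ContentModel_Video

  • Datastreams
    • RELS-EXT:
      • doms:hasPreservationFile -> $VideoFile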

ContentModel_Text

Extends ContentModel_DOMS

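The following variables are used:

  • $TextFile: An object with the Content Model ContentModel_TextPreservationFile

Requirements for objects described by ContentModel_Text

  • Datastreams
    • RELS-EXT:
      • doms:hasPreservationFile -> $TextFile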

Technical Metadata

A file object should contain technical metadata. In this context, that means:

  • A datastream with the output of the characterisation tools used on this file upon ingest, called CHARACTERISATION
  • A datastream with the metadata about the origins of the file, called ORIGIN

In addition, it must have a relation "hasOriginal" if it was migrated from another file that exists in DOMS.

DataModel (last edited 2010-03-17 13:13:00 by localhost)