Revision 1 as of 2008-06-26 12:26:09
Size: 14709
Editor: kfc
Comment: Created by the PackagePages action.
Revision 2 as of 2010-03-17 13:09:10
Size: 14818
Editor: localhost
Comment: converted to 1.6 markup

DOMS Data Model

/!\ This is the data model used in the production MiniDOMS, and it is made for Fedora 2.x. This document should be preserved, and referenced when a model is made for the Fedora 3 system used in the real DOMS.

TODO: Consider making a guideline for using controlled vocabularies for dc:subject. See http://dublincore.org/documents/dcmi-terms/#terms-subject

The DOMS is a common system for storage and processing of digital material and metadata. The metadata is stored as digital metadata objects in the Fedora metadata storage. The material is stored in an external bitstorage and referenced by the metadata objects.

Metadata Objects

The metadata objects in the Fedora metadata storage are described using FoxML. All metadata objects in the Fedora storage must have an object type.

PIDs

In theory, the identifier for a digital object can be any unique random string of characters. In practice, identifiers like "doms:CD_Gnags_Ref1234" make it easy for humans to recognize objects.

Ultimately, we need a permanent naming standard, but for now we use the temporary standard described below.

Fedora requires us to use names of the form "<namespace>:<objectname>". For now, we have chosen to use the "doms" namespace, so all names will be of the form "doms:<objectname>". The restrictions on <objectname> are currently synchronised with the folder/file names described in http://merkur/domswiki/TaskA.4.1DesignDocument, so they may only contain the characters [a-zA-Z], [0-9], '.', '-' and '_'. In addition, UTF-8 characters are accepted if they are percent-encoded, i.e. written as '%' followed by two hex digits. Names are case sensitive. As a guideline, use '_' as the word separator in a name.

We use the 'doms' namespace throughout this project. If a name is given without a namespace, assume that the namespace is doms.
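The naming rules above can be checked mechanically. The following is a minimal Python sketch, not part of DOMS: the helper names `qualify` and `is_valid_pid` are hypothetical, and it assumes that "encoded with % hex-digit hex-digit" means ordinary percent-encoding, one `%XX` triple per encoded byte.

```python
import re

# Allowed in <objectname>: ASCII letters, digits, '.', '-', '_',
# or a percent-encoded byte ('%' followed by two hex digits).
OBJECT_NAME = re.compile(r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})+")
DEFAULT_NAMESPACE = "doms"


def qualify(name: str) -> str:
    """Add the default 'doms' namespace to a name given without one."""
    return name if ":" in name else f"{DEFAULT_NAMESPACE}:{name}"


def is_valid_pid(pid: str) -> bool:
    """Check the temporary PID rules (names are case sensitive, so no folding)."""
    namespace, _, object_name = qualify(pid).partition(":")
    return (namespace == DEFAULT_NAMESPACE
            and OBJECT_NAME.fullmatch(object_name) is not None)
```

For example, `is_valid_pid("CD_Gnags_Ref1234")` is true because the missing namespace defaults to doms, while `"doms:has space"` is rejected.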

Changing the PID of an object is allowed but discouraged. The PID does not carry any information about the digital object, so there is no reason for us ever to change the PID of an ingested object; change the title instead. If the need does arise, create a new object, update references to point to it, and mark the old object as deleted (note that external references will be invalidated).
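The replacement procedure can be sketched against a toy repository. This assumes a hypothetical in-memory model, a dict mapping each PID to a record with a `state` string and a `rels` dict of relation name to list of target PIDs; it is not the Fedora API, and the state strings are illustrative.

```python
import copy


def replace_pid(repo, old_pid, new_pid):
    """Hypothetical sketch: replace an object's PID by copying the object,
    redirecting references inside the repository and marking the old one deleted.

    `repo` maps PID -> {"state": str, "rels": {relation: [target PIDs]}}
    (an illustrative in-memory model, not the Fedora API).
    """
    if new_pid in repo:
        raise ValueError(f"{new_pid} already exists")
    # 1. Create the new object as a copy of the old one.
    repo[new_pid] = copy.deepcopy(repo[old_pid])
    repo[new_pid]["state"] = "Active"
    # 2. Update every reference to the old PID held inside the repository.
    for obj in repo.values():
        for relation, targets in list(obj["rels"].items()):
            obj["rels"][relation] = [new_pid if t == old_pid else t for t in targets]
    # 3. Mark the old object as deleted; external references to it now dangle.
    repo[old_pid]["state"] = "Deleted"
```

References held outside the repository are not reachable by step 2, which is why they are invalidated.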

PIDs for relations

Relations adhere to the RDF standard, and as such have names too. These names follow all the same rules and restrictions that apply to object names, but for now we use CamelCase rather than '_' to separate words. Fedora is less restrictive about namespaces for relations than it is for objects. If a relation is introduced by a collection-specific type, the relation uses the collection name as its namespace, not "doms:". The ":" is still mandatory.
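A small sketch of the naming convention for relations. It assumes relation local names are lowerCamelCase ASCII, as with the relations used on this page (`isObjectType`, `extendsType`, `hasFile`), and that a collection's namespace is passed in as a plain string; `relation_name` is a hypothetical helper.

```python
import re

# Assumption: relation local names are lowerCamelCase ASCII (no '_').
RELATION_LOCAL_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def relation_name(local_name, collection=None):
    """Qualify a relation name: 'doms' by default, or the collection name
    for relations introduced by a collection-specific type."""
    if RELATION_LOCAL_NAME.fullmatch(local_name) is None:
        raise ValueError(f"not a lowerCamelCase relation name: {local_name!r}")
    namespace = collection if collection else "doms"
    return f"{namespace}:{local_name}"
```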

Typechecking in DOMS

Fedora provides a repository for digital objects. In principle, every object in the repository can be unique, and Fedora provides no means of checking whether a given object is of a specified type. We want this functionality in DOMS; how we achieve it is described below.

Fedora understands three fundamental types of digital objects.

  • FedoraObject
  • Definitions
  • Mechanisms

We introduce a new kind of digital object: TYPE objects. Due to technical limitations, we cannot represent TYPE objects simply as a fourth kind of object alongside the three built into Fedora. TYPE objects are special: they do not correspond to any content in the digital collections made available by DOMS. Rather, they represent kinds of material in the system.

The TYPE object describes the content model, i.e. the compulsory and legal content of an object of this type. It contains the information necessary to verify whether a given object is indeed of this type.

We create two objects in the repository, "Type_DOMS" and "Type_type", and demand that every object in the repository have an "isObjectType" relation to a TYPE object. TYPE objects are defined as the objects with an "isObjectType" relation to the "Type_type" object; non-TYPE (ordinary) objects instead have this relation to "Type_DOMS" or to another TYPE object.

As the "Type_DOMS" object is a TYPE object, it has an "isObjectType" relation to "Type_type". The "Type_type" object has no "isObjectType" relation to any object, which makes it the bottom element of that relation.
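The TYPE/ordinary distinction reduces to looking at an object's single "isObjectType" target. A minimal sketch, assuming relations are held in a hypothetical dict mapping each PID to a dict of relation name to list of target PIDs (an illustration, not the Fedora API).

```python
TYPE_TYPE = "doms:Type_type"


def object_type(rels, pid):
    """Return the TYPE object that `pid` declares via isObjectType.

    `rels` maps PID -> {relation: [target PIDs]} (illustrative model).
    Type_type is the bottom element and has no isObjectType relation.
    """
    targets = rels.get(pid, {}).get("isObjectType", [])
    if pid == TYPE_TYPE:
        if targets:
            raise ValueError("Type_type must not have an isObjectType relation")
        return None
    if len(targets) != 1:
        raise ValueError(f"{pid} must have exactly one isObjectType relation")
    return targets[0]


def is_type_object(rels, pid):
    """TYPE objects are exactly those whose isObjectType points at Type_type."""
    return object_type(rels, pid) == TYPE_TYPE
```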

A number of base types are predefined, and it is possible to define new type objects when needed.

Circular Relations

Relation cycles are explicitly forbidden in DOMS, for all relations. For example, if object A has a relation rela to object B, then B must not have a rela relation to A, not even through an intermediary (B rela C and C rela A is also forbidden). It is, of course, not a problem for B to have a different relation to A.
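Enforcing the ban means checking each relation separately for a directed cycle. A minimal sketch using depth-first search with three-colour marking, over a hypothetical dict mapping each PID to a dict of relation name to list of target PIDs; it returns one offending cycle, or `None`. The recursion is fine for a sketch but would need to be made iterative for very deep relation chains.

```python
def find_cycle(rels, relation):
    """Return a list of PIDs forming a cycle through `relation`, or None.

    `rels` maps PID -> {relation: [target PIDs]} (illustrative model).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(pid):
        colour[pid] = GREY
        path.append(pid)
        for target in rels.get(pid, {}).get(relation, []):
            state = colour.get(target, WHITE)
            if state == GREY:  # back edge: target is on the current path
                return path[path.index(target):] + [target]
            if state == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        colour[pid] = BLACK
        return None

    for pid in rels:
        if colour.get(pid, WHITE) == WHITE:
            cycle = visit(pid)
            if cycle:
                return cycle
    return None
```

With A rela B, B rela C and C rela A, `find_cycle(rels, "rela")` reports the cycle A, B, C, A, while a different relation from B back to A is not flagged.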

TYPE inheritance

A type system that does not allow for inheritance would be of limited use. We have designed inheritance to work as follows.

The "Type_DOMS" object is the ancestor of all TYPEs. Every TYPE object, i.e. any object that has an "isObjectType" relation to "Type_type", must have an "extendsType" relation to another TYPE object. Only "Type_DOMS" is exempt from this demand, and it is therefore the bottom of the "extendsType" relation tree.

When verifying an object A against the TYPE object Type_B that A claims to belong to, the typechecker also verifies A against the TYPE object Type_C that Type_B has the "extendsType" relation to, then against the type that Type_C extends, and so on until the object "Type_DOMS" is reached.

A TYPE object is allowed to have "extendsType" relations to more than one TYPE object. So we allow multiple inheritance of types, but a single object can still have only one "isObjectType" relation, i.e. it is still of only one type.
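The set of TYPE objects an object must be verified against can be collected by walking "extendsType" from its declared type. A sketch over a hypothetical dict mapping each PID to a dict of relation name to list of target PIDs; with multiple inheritance the walk may reach a type (for example Type_DOMS) along several paths, so each type is returned once. The actual per-type checks are out of scope here, and the example type names in the usage are made up.

```python
TYPE_DOMS = "doms:Type_DOMS"


def types_to_check(rels, type_pid):
    """All TYPE objects an object of `type_pid` must satisfy: the type itself
    plus every type reachable via extendsType, ending at Type_DOMS.

    `rels` maps PID -> {relation: [target PIDs]} (illustrative model).
    """
    order, seen, stack = [], set(), [type_pid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # diamond inheritance: check each type only once
        seen.add(current)
        order.append(current)
        parents = rels.get(current, {}).get("extendsType", [])
        if not parents and current != TYPE_DOMS:
            raise ValueError(f"{current} has no extendsType relation")
        stack.extend(reversed(parents))
    return order
```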

Predefined Type Objects

The predefined types in DOMS are drawn as a hierarchy in figure Types. The specifications of a type are inherited by its subtypes, and an object of a given type must comply with the specifications of this type as well as those of all its supertypes.

http://merkur/viewvc/trunk/docs/datamodel/fig/typeHierarchy.png?root=doms&view=co

Types. DOMS object type hierarchy. Dia source.

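Type_DOMS is the base type of all objects, and all the other types define additions to Type_DOMS. The predefined objects are described on the following subpages:

  • Type_DOMS
      ◦ qualified DOMS Dublin Core (the DomsDC datastream)
      ◦ Relations to other objects (the RELS-EXT datastream)
      ◦ Disseminators
  • Type_collection
      ◦ The DOMS_base_collection object
  • Type_image
  • Type_audio
  • Type_video
  • Type_text
  • Type_file
  • Type_type
  • Type_rights
      ◦ The DOMS_admin_rights object
  • Type_fileformat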

In the type definitions, we sometimes need to specify "one or more relations like this" or "zero or one relation like this". We use the '+' prefix to indicate the former and the '?' prefix to indicate the latter.
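The two prefixes translate directly into a lower and an upper bound on the number of relations. A sketch: `parse_spec` and `satisfies` are hypothetical helpers, and treating a name without a prefix as "exactly one" is an assumption this page does not state.

```python
def parse_spec(spec):
    """Split a relation spec into (name, minimum, maximum or None).

    '+name' -> one or more; '?name' -> zero or one;
    a bare name is assumed to mean exactly one.
    """
    if spec.startswith("+"):
        return spec[1:], 1, None
    if spec.startswith("?"):
        return spec[1:], 0, 1
    return spec, 1, 1


def satisfies(relations, spec):
    """Check one object's relations {name: [targets]} against a spec."""
    name, low, high = parse_spec(spec)
    count = len(relations.get(name, []))
    return count >= low and (high is None or count <= high)
```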

Digital Material and File Format Objects

Based on existing collections at SB, we have identified four kinds of digital material:

  1. Images (TIFF, PNG, JPEG2000, etc.)
  2. Audio (Broadcast WAV, WAV, etc.)
  3. Video (MPEG1, MPEG2, etc.)
  4. Text (PDF, XML, etc.)

For each kind of material, we have chosen a number of recommended formats. DOMS will ensure access to the content of files in these formats.

  1. Images: TIFF
  2. Audio: WAV, BWF
  3. Video: MPEG1, MPEG2
  4. Text: UTF8, PDF, OfficeOpenXML

When ingesting new collections, files in other formats will be converted to one of the recommended formats above, and the converted files will be saved alongside the originals. For text, we will always extract and save a UTF8 version, and if the original text was formatted, we will also convert it to either PDF or OOXML.
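The ingest policy can be expressed as a function from a file's kind and format to the formats that must be produced in addition to the original. A hedged sketch: picking the first recommended format for a kind, and PDF rather than OOXML for formatted text, are assumptions, since the text above only says "one of" them; `conversion_targets` is a hypothetical name.

```python
# Recommended formats per kind of material, as listed above.
RECOMMENDED = {
    "image": ["TIFF"],
    "audio": ["WAV", "BWF"],
    "video": ["MPEG1", "MPEG2"],
    "text": ["UTF8", "PDF", "OfficeOpenXML"],
}
FORMATTED_TEXT = ("PDF", "OfficeOpenXML")


def conversion_targets(kind, source_format, formatted=False):
    """Formats to produce, in addition to keeping the original file."""
    if kind == "text":
        # Always have a UTF8 version of text.
        targets = [] if source_format == "UTF8" else ["UTF8"]
        if formatted and source_format not in FORMATTED_TEXT:
            targets.append("PDF")  # assumption: PDF chosen over OOXML
        return targets
    if source_format in RECOMMENDED[kind]:
        return []  # already in a recommended format
    return [RECOMMENDED[kind][0]]  # assumption: first recommended format
```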

Fedora does not directly handle the files. In DOMS we represent files by digital objects of "Type_file", that reference the content, i.e. a file in bitstorage, and include a technical datastream (see Type_file). The fileformats are stored as digital objects of "Type_fileformat", that contain a description of the format of the content in bitstorage and the required metadata in the technical datastream. Each file object has one relation to the fileformat object corresponding to the file it represents.

Fileformat objects have an "isObjectType" relation to "Type_fileformat", but they are not TYPEs themselves. Nor do they correspond to actual content in DOMS.

We do allow subtyping of fileformats. An object of type Type_fileformat has an optional "extendsFormat" relation to another fileformat object. This is meant to represent cases such as the baseline TIFF format versus the full TIFF format.
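Since "extendsFormat" is optional (at most one per fileformat object), the formats a fileformat object builds on form a simple chain. A sketch over a hypothetical dict mapping each PID to a dict of relation name to list of target PIDs; the example PIDs in the usage are made up and say nothing about which TIFF variant extends which.

```python
def format_chain(rels, format_pid):
    """Follow the optional extendsFormat relation from a fileformat object,
    returning [format_pid, parent, grandparent, ...].

    `rels` maps PID -> {relation: [target PIDs]} (illustrative model).
    """
    chain = []
    current = format_pid
    while current is not None:
        if current in chain:
            raise ValueError(f"extendsFormat cycle at {current}")
        chain.append(current)
        parents = rels.get(current, {}).get("extendsFormat", [])
        if len(parents) > 1:
            raise ValueError(f"{current} has more than one extendsFormat relation")
        current = parents[0] if parents else None
    return chain
```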

Migration of files in DOMS

Each object of TYPE Type_file can have a "hasOriginal" relation to another file object. That way, if a data file is migrated to a new format, we create a new object of TYPE Type_file and point the new object's "hasOriginal" relation to the old file object. We then list all the objects with a "hasFile" relation to the old file object and decide which should have the reference updated to the new object and which should keep pointing at the old one. Often one would just update all the relations, but sometimes the migration can break some functionality of the file, so certain objects will need the old file.
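The migration steps can be sketched as follows, over a hypothetical dict mapping each PID to a dict of relation name to list of target PIDs (not the Fedora API). Which referrers keep the old file is a human decision, so it is passed in as `keep_old`; other relations the new file object needs, such as the one to its fileformat object, are omitted.

```python
def migrate_file(rels, old_file, new_file, keep_old=()):
    """Register `new_file` as a migrated copy of `old_file` and repoint the
    hasFile relations of all referrers except those in `keep_old`.

    `rels` maps PID -> {relation: [target PIDs]} (illustrative model).
    Returns the objects that referenced the old file.
    """
    # New file object: of TYPE Type_file, pointing back at its original.
    rels[new_file] = {"isObjectType": ["doms:Type_file"], "hasOriginal": [old_file]}
    # List every object with a hasFile relation to the old file object.
    referrers = [pid for pid, r in rels.items() if old_file in r.get("hasFile", [])]
    for pid in referrers:
        if pid in keep_old:
            continue  # e.g. the migration broke functionality this object relies on
        rels[pid]["hasFile"] = [new_file if f == old_file else f
                                for f in rels[pid]["hasFile"]]
    return referrers
```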

Collections and Rights in DOMS

The DOMS system shall ultimately index and model several collections. As such, we need to model digital objects as belonging to one or more collections. Likewise, each collection, or even each digital object, may need a representation of its legal and administrative rights.

How to model this

We define a new type, "Type_Collection", and demand that every object have an "isPartOfCollection" relation to an object of "Type_Collection". So there must be a single object of "Type_Collection" for each collection in DOMS. But collection objects themselves must also have an "isPartOfCollection" relation, so we create a single collection object named "doms:Collection" that represents the entire DOMS collection. This object functions as the bottom element of the "isPartOfCollection" relation.

This also allows us to have subcollections within collections. The collection objects were NOT designed to represent a CD, with each track as part of the collection. For this, create new types. The Collection objects should represent a clearly defined collection of digital material controlled by a single governing body.

Rights are modelled in much the same way. We define a type "Type_Rights" and demand that every object have a "hasRights" relation to an object of this type. Again we need a bottom element, so "doms:DOMS_admin_rights" is created.

Rights objects must be part of a collection, like all other objects, but there can be several rights objects in a collection. The collection object itself must have a relation to a rights object, but the objects that are part of the collection are free to have their relation to a different rights object. That way the overall rights of the collection can be made different from the rights of the individual elements, and elements within a collection can have different rights.

Remember the ban on circular relations: a new collection may not have two collection objects A and B that each have an "isPartOfCollection" relation to the other. Instead, A must be part of B, and B must be part of the DOMS collection. The same applies to rights.
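The membership rules can be checked object by object. A sketch over a hypothetical dict mapping each PID to a dict of relation name to list of target PIDs: it assumes "doms:Collection" is exempt only from "isPartOfCollection" and "doms:DOMS_admin_rights" only from "hasRights", and it checks neither for cycles nor that the targets are of the right type. The non-bottom PIDs in the usage are made up.

```python
DOMS_COLLECTION = "doms:Collection"
DOMS_ADMIN_RIGHTS = "doms:DOMS_admin_rights"


def membership_problems(rels):
    """List (pid, relation) pairs for missing isPartOfCollection / hasRights.

    `rels` maps PID -> {relation: [target PIDs]} (illustrative model).
    The two bottom elements are exempt from their own relation only.
    """
    problems = []
    for pid, r in rels.items():
        if pid != DOMS_COLLECTION and not r.get("isPartOfCollection"):
            problems.append((pid, "isPartOfCollection"))
        if pid != DOMS_ADMIN_RIGHTS and not r.get("hasRights"):
            problems.append((pid, "hasRights"))
    return problems
```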

Collection specific TYPEs and fileformats

Most, if not all, of the collections that will ultimately be housed in DOMS will need special TYPEs to represent their structure. For a library, these could be Type_book and Type_author, and perhaps Type_chapter and Type_page.

These new TYPEs are created just like the predefined ones. They must, of course, have an "isObjectType" relation to "doms:Type_type", and they must extend "doms:Type_DOMS". But they are part of the new collection that introduced them, and therefore have an "isPartOfCollection" relation to that collection object.

Should a collection-specific type prove generally useful enough to be integrated into the DOMS_base_collection, this is easily achieved by giving it an additional "isPartOfCollection" relation, this one to "doms:Collection". Remember that an object can have more than one "isPartOfCollection" relation.

The exact same procedure can be used for collection specific fileformats. Create them in the collection, and later on add a relation directly to "doms:Collection" if they prove generally useful.
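Creating and later promoting a collection-specific type might look like this. The dict mapping each PID to a dict of relation name to list of target PIDs is an illustration, not the Fedora API; the PIDs in the usage are hypothetical, and giving the type a "hasRights" relation follows the rule that every object has one.

```python
DOMS_COLLECTION = "doms:Collection"


def new_collection_type(rels, type_pid, collection_pid, rights_pid,
                        extends=("doms:Type_DOMS",)):
    """Create the relations of a collection-specific TYPE object."""
    rels[type_pid] = {
        "isObjectType": ["doms:Type_type"],      # it is a TYPE
        "extendsType": list(extends),            # must reach doms:Type_DOMS
        "isPartOfCollection": [collection_pid],  # owned by the new collection
        "hasRights": [rights_pid],               # every object has rights
    }


def promote_to_base(rels, pid):
    """Make a type (or fileformat) part of the base collection as well."""
    parts = rels[pid].setdefault("isPartOfCollection", [])
    if DOMS_COLLECTION not in parts:
        parts.append(DOMS_COLLECTION)
```

Promotion only adds a relation, so the type stays part of the collection that introduced it.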

Doms Collection

All the predefined types, fileformats and objects have been created in 'DOMS_base_collection'. This collection is meant to be ingested as the first collection in the DOMS metadata storage.

Levels of Metadata

In summary, we have three levels of metadata:

  1. Common Core: The metadata that must be present in all objects. This includes the core properties defined by the FoxML format and a DOMS Dublin Core description, and is formalised as Type_DOMS.
  2. Base Object Types: The metadata defined by the base object types. For example, the image object type specifies that an object of Type_image must have a hasFile relation to an object of Type_file that references the TIFF file format.
  3. Collection Specific Types: The metadata defined by new object types introduced in new collections.

Example Collections

  • The Gentofte collection
  • The 'logo' example objects
  • The 'radioavismanuskript' example objects

MiniDOMSDataModel (last edited 2010-03-17 13:09:10 by localhost)