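DOMS Data Model
Throughout this page, the following shorthands refer to these namespaces:
- rdf - http://www.w3.org/1999/02/22-rdf-syntax-ns#
- rdfs - http://www.w3.org/2000/01/rdf-schema#
- owl - http://www.w3.org/2002/07/owl#
- fedora-model - info:fedora/fedora-system:def/model#
- doms-relations - http://doms.statsbiblioteket.dk/relations/default/0/1/#
- doms - the standard prefix for PIDs; it is not a namespace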
Definition: A data model describes the content of a collection. A content model describes the content of a data object. A data model is thus a set of content models that together describe the collection.
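Extensions for Fedora
The DOMS data model makes use of the following Fedora extensions:
- FedoraOntology: OWL Lite ontologies for the Content Model Architecture
- FedoraTypeChecking: Datastream schemas for the Content Model Architecture
[Figure: DOMS base collection]
The DOMS data model describes how the type system underlying DOMS is realised in Fedora 3. The figure above serves as a guide through the following sections.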
DOMS Content Models in general
Fedora provides a repository for digital objects. In principle, every object in the repository can be unique, but Fedora also provides a way of specifying that an object has a given type. Unfortunately, Fedora's type definitions, called Content Models, are rather simplistic by default. We use them as the basis of our type system, enhanced with the extensions described in FedoraOntology and FedoraTypeChecking.
For our purposes, there are two kinds of digital objects in Fedora:
- Data objects
- Content Model objects
The Content Model object, as used in DOMS, describes the mandatory and permitted content of an object of this type. It contains the information necessary to verify whether a given object is indeed of this type. For more detail, see FedoraOntology and FedoraTypeChecking.
A data object can specify the Content Model describing its contents via a fedora-model:hasModel relation, and in DOMS we require this relation to be present. A data object is said to "subscribe" to a Content Model. Content Model inheritance, as specified in FedoraOntology, is used.
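The hasModel requirement can be made concrete with a short sketch. The following minimal Python example reads a RELS-EXT datastream, returns the Content Models it declares, and fails when none is present. It assumes the standard Fedora 3 RDF/XML layout of RELS-EXT, with object references written as info:fedora/<PID>. The PID doms:example_file is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FEDORA_MODEL_NS = "info:fedora/fedora-system:def/model#"


def content_models(rels_ext_xml: str) -> list:
    """Return the PIDs named by fedora-model:hasModel relations in a RELS-EXT datastream."""
    root = ET.fromstring(rels_ext_xml)
    pids = []
    for elem in root.iter(f"{{{FEDORA_MODEL_NS}}}hasModel"):
        resource = elem.get(f"{{{RDF_NS}}}resource", "")
        # Fedora 3 writes object references as info:fedora/<PID>; strip that prefix.
        if resource.startswith("info:fedora/"):
            resource = resource[len("info:fedora/"):]
        pids.append(resource)
    return pids


def require_content_model(rels_ext_xml: str) -> list:
    """DOMS rule: a data object must subscribe to at least one Content Model."""
    pids = content_models(rels_ext_xml)
    if not pids:
        raise ValueError("RELS-EXT has no fedora-model:hasModel relation")
    return pids


# Hypothetical RELS-EXT for a File data object.
EXAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/doms:example_file">
    <fedora-model:hasModel rdf:resource="info:fedora/doms:ContentModel_File"/>
  </rdf:Description>
</rdf:RDF>"""

print(require_content_model(EXAMPLE))  # ['doms:ContentModel_File']
```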
The special Content Model object "doms:ContentModel_DOMS" is the root object. All Content Models must have a "doms-relations:extendsModel" relation to this object, either directly or through a chain of other Content Models. The complete description of a data object is the union of the descriptions in the Content Model specified with "fedora-model:hasModel" and in all Content Models that can be reached from it by following "doms-relations:extendsModel" relations.
A Content Model can "extend" more than one other Content Model. There is no overriding of Content Models: a subscribing object must be valid with regard to every Content Model in the inheritance tree.
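The complete description is the transitive closure of extendsModel, starting at the hasModel target. Because there is no overriding, an object is valid only if every model in that closure accepts it. The following is a minimal Python sketch under these assumptions:
- The extendsModel edges are already loaded into a dict keyed by PID. In the repository they would live in each Content Model's RELS-EXT.
- The per-model validator functions are hypothetical placeholders for the real ONTOLOGY and DS-COMPOSITE checks.
- The "doms:" prefix on ContentModel_AudioPreservationFile is assumed.

```python
from collections import deque

ROOT_MODEL = "doms:ContentModel_DOMS"


def model_closure(start: str, extends: dict) -> set:
    """Every Content Model reachable from `start` via extendsModel, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        model = queue.popleft()
        for parent in extends.get(model, []):
            if parent not in seen:  # a model may be reached along several paths
                seen.add(parent)
                queue.append(parent)
    return seen


def check_reaches_root(start: str, extends: dict) -> set:
    """Every Content Model must reach doms:ContentModel_DOMS, directly or indirectly."""
    closure = model_closure(start, extends)
    if ROOT_MODEL not in closure:
        raise ValueError(f"{start} does not extend {ROOT_MODEL}")
    return closure


def is_valid(obj, closure: set, validators: dict) -> bool:
    """No overriding: the object must satisfy every Content Model in the closure."""
    return all(validators[model](obj) for model in closure)


# extendsModel edges taken from the base collection described on this page.
EXTENDS = {
    "doms:ContentModel_File": ["doms:ContentModel_DOMS"],
    "doms:ContentModel_AudioPreservationFile": ["doms:ContentModel_File"],
}
print(sorted(check_reaches_root("doms:ContentModel_AudioPreservationFile", EXTENDS)))
# ['doms:ContentModel_AudioPreservationFile', 'doms:ContentModel_DOMS', 'doms:ContentModel_File']
```

A breadth-first walk with a "seen" set handles multiple inheritance: a model reachable along two paths is visited only once.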
Two datastreams in a Content Model are of particular interest: ONTOLOGY and DS-COMPOSITE. The ONTOLOGY datastream defines the relations allowed in subscribing objects, and the DS-COMPOSITE datastream defines the required datastreams and any restrictions they must adhere to.
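The datastream requirements of an object are the union of the DS-COMPOSITE declarations of every Content Model in its inheritance tree. The following minimal Python sketch collects the declared datastream IDs and reports what an object lacks. It makes two assumptions:
- DOMS's DS-COMPOSITE uses the Fedora 3 DS-COMPOSITE-MODEL layout (dsTypeModel elements with an ID attribute, in the namespace shown in the code).
- The composite() helper is hypothetical and exists only to build example input. It ignores the extra schema information that FedoraTypeChecking adds.

```python
import xml.etree.ElementTree as ET

# Namespace of Fedora 3 DS-COMPOSITE-MODEL datastreams (assumed to apply to DS-COMPOSITE here).
DS_COMPOSITE_NS = "info:fedora/fedora-system:def/dsCompositeModel#"


def declared_datastreams(ds_composite_xml: str) -> set:
    """IDs of the datastreams a single DS-COMPOSITE declares."""
    root = ET.fromstring(ds_composite_xml)
    return {e.get("ID") for e in root.iter(f"{{{DS_COMPOSITE_NS}}}dsTypeModel")}


def missing_datastreams(object_datastreams, composites) -> set:
    """Union the requirements of all Content Models in the closure; none can relax another."""
    required = set()
    for xml in composites:
        required |= declared_datastreams(xml)
    return required - set(object_datastreams)


def composite(*ids: str) -> str:
    """Build a bare-bones DS-COMPOSITE for the example (real ones carry more detail)."""
    body = "".join(f'<dsTypeModel ID="{i}"/>' for i in ids)
    return f'<dsCompositeModel xmlns="{DS_COMPOSITE_NS}">{body}</dsCompositeModel>'


doms_model = composite("DC", "POLICY", "STATE", "RELS-EXT")
file_model = composite("CHARACTERISATION", "CONTENTS", "ORIGIN", "PRONOMID")
print(sorted(missing_datastreams(["DC", "RELS-EXT", "STATE", "POLICY", "CONTENTS"],
                                 [doms_model, file_model])))
# ['CHARACTERISATION', 'ORIGIN', 'PRONOMID']
```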
Schema Objects
Many of the schemas used in DOMS need to be referenced many times. To avoid duplication, we have made objects that contain only schemas and subscribe to the Content Model "doms:ContentModel_Schema". The describing datastream in a schema binding might contain the schema directly, or it might contain the URL of the datastream that does. Either way, the difference should be invisible to programs accessing the datastream through the API.
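The requirement that both storage forms look the same to API clients can be illustrated with a single read function that returns identical schema text in either case. This is a minimal Python illustration, not how DOMS implements it. The URL-detection heuristic, the injected fetch callable and the example URL are all hypothetical, and in practice Fedora itself may resolve referenced content.

```python
from typing import Callable


def read_schema(datastream_content: str, fetch: Callable[[str], str]) -> str:
    """Return schema text whether the datastream inlines it or holds only its URL."""
    text = datastream_content.strip()
    if text.startswith(("http://", "https://")) and not any(c.isspace() for c in text):
        # Reference form: the datastream holds only the URL of the datastream with the schema.
        return fetch(text)
    # Inline form: the datastream already is the schema.
    return datastream_content


SCHEMA = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
URL = "http://example.org/objects/doms:schema_x/SCHEMA"  # hypothetical location
store = {URL: SCHEMA}
print(read_schema(SCHEMA, store.__getitem__) == read_schema(URL, store.__getitem__))  # True
```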
DOMS Base Collection
DOMS ships with a base collection of objects representing the base types. The objects in the base collection are described below.
doms:ContentModel_DOMS
ContentModel_DOMS is the root of the content model inheritance tree. All content models derive from this model. As all data objects in DOMS must have a content model, all data objects must adhere to the restrictions defined in this content model.
For the precise definition of doms:ContentModel_DOMS, see the page DataModel/ContentModel_DOMS.
All subscribing objects must have a "doms-relations:hasLicense" relation to an object of "doms:ContentModel_License". They must also have one or more "doms-relations:isPartOfCollection" relations to objects of "doms:ContentModel_Collection".
All subscribing objects must have the following datastreams: "DC", "POLICY", "STATE" and "RELS-EXT". The "DC" stream must conform to the schema defined in the datastream "DC_SCHEMA" in "doms:ContentModel_DOMS". The "STATE" stream must conform to the schema in the datastream "STATE_SCHEMA" in "doms:ContentModel_DOMS". For the use of the "STATE" datastream, see FedoraTransactionsReplacement. The use of the "POLICY" stream is defined in FedoraLicensePolicies. The "AUDIT" stream is detailed in DomsAuditTrail.
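The rules for doms:ContentModel_DOMS can be checked together. The following minimal Python sketch works under these assumptions:
- The object's datastreams and RELS-EXT relations have already been extracted into a small dataclass.
- models_of is a hypothetical lookup returning the full Content Model closure of a target PID.
- The PIDs doms:license_1 and doms:collection_1 are made up.
- "a hasLicense relation" is read as "at least one".
- Schema validation of DC and STATE is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

REQUIRED_DATASTREAMS = {"DC", "POLICY", "STATE", "RELS-EXT"}
HAS_LICENSE = "doms-relations:hasLicense"
IS_PART_OF_COLLECTION = "doms-relations:isPartOfCollection"


@dataclass
class DataObject:
    datastreams: Set[str]
    relations: List[Tuple[str, str]]  # (predicate, target PID) pairs from RELS-EXT


def check_doms_base(obj: DataObject, models_of: Callable[[str], Set[str]]) -> List[str]:
    """Check the doms:ContentModel_DOMS rules; `models_of` returns a target's model closure."""
    errors = []
    missing = REQUIRED_DATASTREAMS - obj.datastreams
    if missing:
        errors.append(f"missing datastreams: {sorted(missing)}")
    licenses = [t for p, t in obj.relations if p == HAS_LICENSE]
    if not any("doms:ContentModel_License" in models_of(t) for t in licenses):
        errors.append("no hasLicense relation to a License object")
    collections = [t for p, t in obj.relations if p == IS_PART_OF_COLLECTION]
    if not collections or not all("doms:ContentModel_Collection" in models_of(t) for t in collections):
        errors.append("isPartOfCollection must point to one or more Collection objects")
    return errors


MODELS = {  # hypothetical PIDs and their Content Model closures
    "doms:license_1": {"doms:ContentModel_License", "doms:ContentModel_DOMS"},
    "doms:collection_1": {"doms:ContentModel_Collection", "doms:ContentModel_DOMS"},
}
obj = DataObject({"DC", "POLICY", "STATE", "RELS-EXT"},
                 [(HAS_LICENSE, "doms:license_1"), (IS_PART_OF_COLLECTION, "doms:collection_1")])
print(check_doms_base(obj, lambda pid: MODELS.get(pid, set())))  # []
```

The AUDIT stream is not checked here, since Fedora maintains it itself.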
doms:ContentModel_File
In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from concrete manifestations such as "jpeg" and "mp3". The metadata about an image is relevant regardless of how the image is manifested, and should therefore not be stored alongside the technical metadata about a particular manifestation. To support this separation, we have introduced the concept of File objects.
A File object is an object that contains a link to the file (in Bitstorage) and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. All File objects must have a Content Model that extends ContentModel_File.
Objects of ContentModel_File can have "doms-relations:hasOriginal" relations to other objects of "doms:ContentModel_File". If file A is the result of a migration of file B and both are in DOMS, the data object for file A will have a "doms-relations:hasOriginal" relation to the data object for file B.
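Following hasOriginal repeatedly recovers a file's migration history. The following minimal Python sketch walks the chain back to the earliest original held in DOMS. It assumes each File object has at most one hasOriginal relation (this page does not say), and that the relations are already loaded into a dict. The PIDs are hypothetical.

```python
def migration_chain(pid: str, has_original: dict) -> list:
    """Follow hasOriginal from a File object back to the earliest original held in DOMS."""
    chain = [pid]
    seen = {pid}
    while chain[-1] in has_original:
        original = has_original[chain[-1]]
        if original in seen:  # malformed data: a file cannot be its own ancestor
            raise ValueError(f"hasOriginal cycle at {original}")
        seen.add(original)
        chain.append(original)
    return chain


# Hypothetical PIDs: file A was migrated from B, which was migrated from C.
HAS_ORIGINAL = {"doms:file_A": "doms:file_B", "doms:file_B": "doms:file_C"}
print(migration_chain("doms:file_A", HAS_ORIGINAL))  # ['doms:file_A', 'doms:file_B', 'doms:file_C']
```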
Data objects of doms:ContentModel_File must have the datastreams "CHARACTERISATION", "CONTENTS", "ORIGIN" and "PRONOMID". Each of these deserves some description.
CONTENTS
"CONTENTS" is where the actual file is. It always has the "E" control group, meaning that the file is externally referenced. The reference must always be to a file in Bitstorage, and only this datastream is ever allowed to reference files.
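The two CONTENTS rules can be checked per datastream:
- CONTENTS must be an "E" datastream pointing into Bitstorage.
- No other datastream may point into Bitstorage.

The following is a minimal Python sketch. The Bitstorage base URL and the Datastream dataclass are hypothetical stand-ins. X, M, E and R are the Fedora 3 control groups.

```python
from dataclasses import dataclass
from typing import List, Optional

BITSTORAGE_PREFIX = "http://bitstorage.example/"  # hypothetical Bitstorage base URL


@dataclass
class Datastream:
    dsid: str
    control_group: str  # Fedora 3 control group: X, M, E or R
    location: Optional[str] = None


def check_file_references(datastreams: List[Datastream]) -> List[str]:
    """CONTENTS must be an E datastream pointing into Bitstorage; nothing else may point there."""
    errors = []
    for ds in datastreams:
        in_bitstorage = ds.location is not None and ds.location.startswith(BITSTORAGE_PREFIX)
        if ds.dsid == "CONTENTS":
            if ds.control_group != "E":
                errors.append("CONTENTS must use control group E")
            if not in_bitstorage:
                errors.append("CONTENTS must reference a file in Bitstorage")
        elif in_bitstorage:
            errors.append(f"{ds.dsid} must not reference files in Bitstorage")
    return errors


ok = [Datastream("CONTENTS", "E", BITSTORAGE_PREFIX + "file_A.tif"), Datastream("ORIGIN", "X")]
print(check_file_references(ok))  # []
```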
ContentModel_ImagePreservationFile
Extends ContentModel_File
Requirements for objects described by ContentModel_ImagePreservationFile
- Datastreams
- PRONOMID: must be "fmt/10" (tiff version 6)
ContentModel_TextPreservationFile
Extends ContentModel_File
Requirements for objects described by ContentModel_TextPreservationFile
- Datastreams
- PRONOMID: must be "x-fmt/16" (Utf8) or "fmt/95" (pdf-a)
ContentModel_VideoPreservationFile
Extends ContentModel_File
Requirements for objects described by ContentModel_VideoPreservationFile
- Datastreams
- PRONOMID: must be "x-fmt/385" (mpeg1) or "x-fmt/386" (mpeg2)
ContentModel_AudioPreservationFile
Extends ContentModel_File
Requirements for objects described by ContentModel_AudioPreservationFile
- Datastreams
- PRONOMID: must be "fmt/2" (Bwf version 1) or "fmt/6" (wav)
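The PRONOM restrictions above fit naturally in a lookup table. The following minimal Python sketch accepts a PRONOM ID only if every restricting model in the object's Content Model closure allows it, in line with the no-overriding rule. The "doms:" PID prefix on these model names is assumed, and the closure is passed in as a plain set.

```python
# Allowed PRONOM IDs per preservation-file Content Model, as listed above.
# The "doms:" PID prefix on these model names is an assumption.
ALLOWED_PRONOM_IDS = {
    "doms:ContentModel_ImagePreservationFile": {"fmt/10"},                  # TIFF 6
    "doms:ContentModel_TextPreservationFile": {"x-fmt/16", "fmt/95"},       # UTF-8, PDF/A
    "doms:ContentModel_VideoPreservationFile": {"x-fmt/385", "x-fmt/386"},  # MPEG-1, MPEG-2
    "doms:ContentModel_AudioPreservationFile": {"fmt/2", "fmt/6"},          # BWF 1, WAV
}


def pronom_allowed(model_closure: set, pronom_id: str) -> bool:
    """Valid only if every restricting model in the object's closure accepts the ID."""
    pronom_id = pronom_id.strip()  # tolerate surrounding whitespace in the PRONOMID datastream
    return all(pronom_id in ALLOWED_PRONOM_IDS[m]
               for m in model_closure if m in ALLOWED_PRONOM_IDS)


audio = {"doms:ContentModel_AudioPreservationFile", "doms:ContentModel_File", "doms:ContentModel_DOMS"}
print(pronom_allowed(audio, "fmt/6"), pronom_allowed(audio, "fmt/10"))  # True False
```

Models without an entry, such as ContentModel_File itself, place no restriction on the ID.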
ContentModel_Image
Extends ContentModel_DOMS
The following variables are used:
- $ImageFile: An object with the Content Model ContentModel_ImagePreservationFile
Requirements for objects described by ContentModel_Image
- Datastreams
- RELS-EXT: doms-relations:hasPreservationFile -> $ImageFile
ContentModel_Audio
Extends ContentModel_DOMS
The following variables are used:
- $AudioFile: An object with the Content Model ContentModel_AudioPreservationFile
Requirements for objects described by ContentModel_Audio
- Datastreams
- RELS-EXT: doms-relations:hasPreservationFile -> $AudioFile
ContentModel_Video
Extends ContentModel_DOMS
The following variables are used:
- $VideoFile: An object with the Content Model ContentModel_VideoPreservationFile
Requirements for objects described by ContentModel_Video
- Datastreams
- RELS-EXT: doms-relations:hasPreservationFile -> $VideoFile
ContentModel_Text
Extends ContentModel_DOMS
The following variables are used:
- $TextFile: An object with the Content Model ContentModel_TextPreservationFile
Requirements for objects described by ContentModel_Text
- Datastreams
- RELS-EXT: doms-relations:hasPreservationFile -> $TextFile
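The Image, Audio, Video and Text models share one pattern: a hasPreservationFile relation to a File object of the matching preservation model. The following minimal Python sketch checks that pattern under these assumptions:
- The relation lives in the doms-relations namespace.
- The "doms:" prefix applies to these model PIDs.
- models_of is a hypothetical lookup of a target's Content Model closure.
- At least one relation is required.
- The PID doms:tiff_1 is made up.

```python
from typing import Callable, List, Set, Tuple

# Assumed to live in the doms-relations namespace, like the other DOMS relations.
HAS_PRESERVATION_FILE = "doms-relations:hasPreservationFile"

REQUIRED_FILE_MODEL = {
    "doms:ContentModel_Image": "doms:ContentModel_ImagePreservationFile",
    "doms:ContentModel_Audio": "doms:ContentModel_AudioPreservationFile",
    "doms:ContentModel_Video": "doms:ContentModel_VideoPreservationFile",
    "doms:ContentModel_Text": "doms:ContentModel_TextPreservationFile",
}


def check_preservation_files(model: str, relations: List[Tuple[str, str]],
                             models_of: Callable[[str], Set[str]]) -> List[str]:
    """An Image/Audio/Video/Text object must point at a File object of the matching kind."""
    required = REQUIRED_FILE_MODEL[model]
    targets = [t for p, t in relations if p == HAS_PRESERVATION_FILE]
    if not targets:
        return [f"{model} object has no hasPreservationFile relation"]
    return [f"{t} is not a {required}" for t in targets if required not in models_of(t)]


MODELS = {"doms:tiff_1": {"doms:ContentModel_ImagePreservationFile", "doms:ContentModel_File"}}
rels = [(HAS_PRESERVATION_FILE, "doms:tiff_1")]
print(check_preservation_files("doms:ContentModel_Image", rels, lambda p: MODELS.get(p, set())))  # []
```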
ContentModel_License
Extends ContentModel_DOMS
Requirements for objects described by ContentModel_License
- Datastreams
- LICENCE: XACML describing the license (schema: http://www.oasis-open.org/committees/download.php/915/cs-xacml-schema-policy-01.xsd)
- DC: The DC datastream (probably its description field) is used for the human-readable version of the license
ContentModel_Schema
Extends ContentModel_DOMS
Requirements for objects described by ContentModel_Schema
- Datastreams
- SCHEMA: The xsd schema inlined.
ContentModel_Collection
Extends ContentModel_DOMS
Requirements for objects described by ContentModel_Collection: None
Technical Metadata
A file object should contain technical metadata. In this context, technical metadata refers to:
- A datastream called CHARACTERISATION, holding the output of the characterisation tools run on this file at ingest
- A datastream called ORIGIN, holding metadata about the origins of the file
In addition, it must have a "doms-relations:hasOriginal" relation if it was migrated from another file that exists in DOMS.