Revision 36 as of 2010-03-17 13:09:36
High Level Design Documentation
Introduction
A DOMS is a Digital Object Management System. This document describes the open source DOMS developed at the State and University Library in Aarhus, Denmark. Program sources for our DOMS are accessible at SourceForge.
The DOMS aims at fulfilling the following objectives.
- Stores and handles digital material and metadata, with a view to long-term preservation
- Supports re-use of components in new collections of material
- Supports a common index with possibility for relations between objects
- Is modular with possibility for additions to the system
- Can be monitored and maintained by non-developers
- Handles metadata for access rights, and integrates with certain authentication systems
What is not part of the objectives:
- Establishing work flow systems and data-specific ingest systems
A brief technical overview of the individual parts of the DOMS can be found at BriefTechnicalDescriptionOfDOMSparts.
Overview of a DOMS system
Our DOMS keeps its data in two different kinds of storage:
- A Fedora repository
- A bitstorage
The Fedora repository keeps metadata, and the bitstorage keeps the actual digital material files.
If, for example, we were to store audio CDs in a DOMS, the digitized tracks, along with scanned cover art, could be stored as files in the bitstorage. The Fedora repository would then keep a metadata object (or a cluster of metadata objects) representing the CD. This object would contain information such as artist(s), production year and record label, as well as information relating the CD to the digitized material in the bitstorage.
System structure
Our system is available as several different packages, each containing part of the DOMS, and each part interacting with the others.
The packages that form the DOMS are the following:
- The DOMS server interface
- DOMS Front end Services:
  - DOMS Ingest System
  - Interface to the Summa search engine
  - OAI-PMH
  - The DOMS GUI
- The DOMS backend:
  - Fedora ECM (Enhanced Content Models)
  - Interface to bitstorage
  - An update tracker
In addition, the following separate projects are needed:
- Fedora repository (available at the Fedora Commons website) 
Finally, you may need:
- A bitstorage that the system will use
- The Summa search engine (available at the Summa website) 
The figure below illustrates how the different modules make use of each other.

[Figure: DOMSOverview20090819.png]
DOMS objects
The metadata stored by the DOMS is represented as objects inside a Fedora repository. Each metadata object in the repository contains the following:
- A PID (persistent identifier of the object)
- Metadata as XML
- Relations to other objects (optional)
In addition to objects containing descriptive metadata (such as the artist in the CD example above), there are objects that we call file objects. All relations to data in a bitstorage come from file objects. Instead of descriptive metadata, these objects contain technical metadata about the files they point to (for example, the sample rate of a sound file).
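As a rough illustration of the object layout described above, the following Python sketch models a metadata object and a file object for the CD example. It is not DOMS or Fedora code: the class names, the `bitstorage_url` field, the `hasFile` predicate, the PIDs, the XML snippets and the URL are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relation:
    predicate: str    # e.g. "hasModel" or "isPartOfCollection"
    target_pid: str   # PID of the object the relation points to


@dataclass
class DomsObject:
    pid: str                               # persistent identifier
    metadata_xml: str                      # descriptive or technical metadata as XML
    relations: List[Relation] = field(default_factory=list)
    bitstorage_url: Optional[str] = None   # assumed field, set only on file objects

    def related(self, predicate: str) -> List[str]:
        """Return the PIDs this object points to via the given predicate."""
        return [r.target_pid for r in self.relations if r.predicate == predicate]


# File object: technical metadata plus a pointer into the bitstorage.
track = DomsObject(
    pid="doms:track-1",
    metadata_xml="<technical><sampleRate>44100</sampleRate></technical>",
    bitstorage_url="http://bitstorage.example.org/cd-1/track-1.wav",
)

# Metadata object for the CD: descriptive metadata plus a relation
# to the file object ("hasFile" is a made-up predicate name).
cd = DomsObject(
    pid="doms:cd-1",
    metadata_xml="<cd><artist>Example Artist</artist></cd>",
    relations=[Relation("hasFile", track.pid)],
)

print(cd.related("hasFile"))
```

Only the file object carries a bitstorage pointer in this sketch, which mirrors the rule that all relations to data in a bitstorage come from file objects.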
In addition to the objects that contain metadata, i.e. which represent the content of the DOMS, there are a number of special objects in a DOMS. These special objects are:
- Content model objects, each acting as a type or class for other objects
- Basic collection objects; every object belongs to a collection
- License objects; every content model relates to a license
- Template objects, which serve as templates for generating new objects in the repository
A DOMS comes preloaded with some special objects, defined in the base collection. These base objects are meant as a base for defining your own data models (see below) for specific collections.
The figure below shows the objects in the base collection, and their relations.

[Figure: DOMSBaseCollection.png]
Content models are related in a hierarchy with ContentModel_DOMS at the top. Similarly, collections are related by the isPartOfCollection relation, and at the top of this hierarchy is Root_Collection. The file objects mentioned earlier have content models (via the hasModel relation) that extend ContentModel_File; that is, each file object has a content model for either image, text, audio, or video. Finally, ContentModel_License is the content model of all licenses, and we include Open_License, which is the license of all content models in the base collection.
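The hierarchy described above can be pictured as a chain of "extends" links that is followed upwards until ContentModel_DOMS is reached. The sketch below is only an illustration: the direct edges in the table are assumptions, and `ContentModel_Audio` is a hypothetical name for the audio content model, which the text does not name.

```python
from typing import Dict, List

# Child content model -> the content model it extends.
# All edges are assumed for illustration; only the model names
# ContentModel_DOMS, ContentModel_File and ContentModel_License
# come from the text.
EXTENDS: Dict[str, str] = {
    "ContentModel_File": "ContentModel_DOMS",
    "ContentModel_License": "ContentModel_DOMS",
    "ContentModel_Audio": "ContentModel_File",  # hypothetical name
}


def ancestors(model: str) -> List[str]:
    """Return the content models above `model`, nearest first."""
    chain: List[str] = []
    seen = {model}
    while model in EXTENDS:
        model = EXTENDS[model]
        if model in seen:
            raise ValueError("cycle in content model hierarchy")
        seen.add(model)
        chain.append(model)
    return chain


def extends(model: str, ancestor: str) -> bool:
    """True if `model` directly or indirectly extends `ancestor`."""
    return ancestor in ancestors(model)


print(ancestors("ContentModel_Audio"))
print(extends("ContentModel_License", "ContentModel_File"))
```

A check such as `extends(model, "ContentModel_File")` is one way software could decide whether an object should be treated as a file object.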
Data Models
For each collection that is to be stored in DOMS, a data model must be defined.
A data model is a description of the formats of data and metadata, and of how this data is organised in Fedora objects. In practice, this is done through formal descriptions of content models, defined using Enhanced Content Models. Having a formally described model allows us to validate our data, and to use data-model-aware software that integrates with our repository.
The DOMS data model, as described above, defines some structure that must be true for each collection. This includes relations to one or more collections, a license for each object, and the requirements for separate file objects containing technical metadata for each file we store in Fedora.
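To make the mandatory structure concrete, here is a minimal sketch of a check for two of the requirements named above: membership of at least one collection, and a license. It is not how Fedora ECM validation works; the function name and the `hasLicense` predicate are hypothetical, while `isPartOfCollection`, `Root_Collection` and `Open_License` are taken from the text.

```python
from typing import Dict, List


def structural_problems(relations: Dict[str, List[str]]) -> List[str]:
    """Return the list of violated base requirements for one object.

    `relations` maps a predicate name to the identifiers it points to.
    """
    problems: List[str] = []
    if not relations.get("isPartOfCollection"):
        problems.append("object is not part of any collection")
    if not relations.get("hasLicense"):  # hypothetical predicate name
        problems.append("object has no license")
    return problems


ok = {"isPartOfCollection": ["Root_Collection"], "hasLicense": ["Open_License"]}
bad = {"isPartOfCollection": []}

print(structural_problems(ok))
print(structural_problems(bad))
```

Returning a list of problems rather than a single boolean lets a caller report every violation of an object at once.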
We also assume Dublin Core as one minimal metadata format for each object.
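For illustration, a minimal Dublin Core record can be built with Python's standard library as sketched below. The two namespace URIs are the standard ones for Dublin Core elements and the `oai_dc` container; which elements DOMS actually requires is not stated here, so the choice of `title`, `identifier` and `creator`, the function name, and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def minimal_dc(title: str, identifier: str, creator: Optional[str] = None) -> str:
    """Build a small oai_dc record and return it as an XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}identifier").text = identifier
    if creator is not None:
        ET.SubElement(root, f"{{{DC}}}creator").text = creator
    return ET.tostring(root, encoding="unicode")


# Sample values are made up for the CD example.
record = minimal_dc("Example Album", "doms:cd-1", creator="Example Artist")
print(record)
```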
However, the model is flexible, and a data model for a specific collection may define additional metadata formats and relations that describe the formats and structure of that particular collection.