Differences between revisions 7 and 31 (spanning 24 versions)
Revision 7 as of 2009-06-24 09:41:46
Size: 3329
Editor: jrg
Comment:
Revision 31 as of 2009-10-09 09:03:06
Size: 10742
Editor: kfc
Comment:

High Level Design Documentation

work in progress


Introduction

A DOMS is a Digital Object Management System. This document describes the open source DOMS developed at the State and University Library in Aarhus, Denmark. Program sources for our DOMS are accessible at [http://sourceforge.net/projects/doms SourceForge].

The DOMS aims to fulfil the following objectives:

  • Stores and handles digital material and metadata, with a view to long-term preservation
  • Supports re-use of components in new collections of material
  • Supports a common index with possibility for relations between objects
  • Is modular with possibility for additions to the system
  • Can be monitored and maintained by non-developers
  • Handles metadata for access rights, and integrates with certain authentication systems

What is not part of the objectives:

  • Establishing work flow systems and data-specific ingest systems

Overview of a DOMS system

Our DOMS keeps its data in two different kinds of storage:

  1. A Fedora repository
  2. A bitstorage

The Fedora repository keeps metadata, and the bitstorage keeps the actual digital material files.

If, for example, we were to store audio CDs in a DOMS, the digitized tracks, along with scanned cover art, could be stored as files in the bitstorage. Fedora would then keep a metadata object (or a cluster of metadata objects) representing the CD. This object would contain information such as artist(s), production year and record label, as well as information relating the CD to the digitized material in the bitstorage.

System structure

Our system is available as several different packages, each containing part of the DOMS, and each part interacting with the others.

The packages that form the DOMS are the following:

  1. The DOMS server interface
  2. The DOMS backend:
    • Fedora ECM (Enhanced Content Models)
    • Interface to bitstorage
    • An update tracker
  3. DOMS Services:
    • DOMS Ingest System
    • Interface to the Summa search engine
    • OAI-PMH
    • The DOMS GUI

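In addition, the following separate projects are needed:

  • Fedora repository (available at [http://fedora-commons.org the Fedora Commons website])

Finally, you may need:

  • The Summa search engine (available at [http://summa.sourceforge.net the Summa website])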

The figure below illustrates how the different modules make use of each other.

attachment:DOMSOverview20090819.png

DOMS objects

The metadata stored by the DOMS is represented as objects inside a Fedora repository. Each metadata object in the repository contains the following:

  • A PID (persistent identifier of the object)
  • Metadata as XML
  • Relations to other objects (optional)

In addition to objects containing descriptive metadata (such as the artist in the CD example above), there are objects that we call file objects. All relations to data in the bitstorage come from file objects. Instead of descriptive metadata, these objects contain technical metadata about the files they point to (for example, the sample rate of a sound file).
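As a rough illustration of the object layout described above, the following Python sketch models a descriptive metadata object and a file object. The class name, the PIDs, the relation names `hasFile` and `storedAt`, and the bitstorage URL are all assumptions made for this example; they are not taken from the DOMS source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MetadataObject:
    """Hypothetical stand-in for a metadata object in the repository."""
    pid: str                       # persistent identifier of the object
    metadata_xml: str              # descriptive or technical metadata as XML
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (relation name, target)


# A descriptive metadata object for a CD (all values are invented).
cd = MetadataObject(
    pid="doms:cd-1",
    metadata_xml="<cd><artist>Example Artist</artist></cd>",
)

# A file object carrying technical metadata about one digitized track.
track_file = MetadataObject(
    pid="doms:file-1",
    metadata_xml="<audio><sampleRate>44100</sampleRate></audio>",
)

# Only the file object points into the bitstorage; the CD points at the file object.
track_file.relations.append(("storedAt", "http://bitstorage.example.org/track1.wav"))
cd.relations.append(("hasFile", track_file.pid))
```

The point of the split is that the CD object never refers to the bitstorage directly: it relates to a file object, and the file object holds both the technical metadata and the pointer to the stored file.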

In addition to the objects that contain metadata, i.e. which represent the content of the DOMS, there are a number of special objects in a DOMS. These special objects are:

  • Content model objects, each acting as a type or class for other objects

  • Basic collection objects; every object belongs to a collection

  • License objects; every content model relates to a license

  • Template objects, which serve as templates for generating new objects in the repository

A DOMS comes preloaded with some special objects, defined in the base collection. These base objects are meant as a base for defining your own data models (see below) for specific collections.

The figure below shows the objects in the base collection, and their relations.

attachment:DOMSBaseCollection.png

Content models are related in a hierarchy with ContentModel_DOMS at the top. Similarly, collections are related by the isPartOfCollection relation, with Root_Collection at the top of that hierarchy. The file objects mentioned earlier have content models (via the hasModel relation) that extend ContentModel_File; that is, each has a content model for image, text, audio, or video. Finally, ContentModel_License is the content model of all licenses, and we include Open_License, which is the license of all content models in the base collection.
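The hierarchy above can be pictured as a simple parent lookup. The sketch below is only an illustration in Python: the table `EXTENDS`, the name `ContentModel_Audio`, and the assumption that ContentModel_File and ContentModel_License extend ContentModel_DOMS directly are hypothetical, and the real relations live in Fedora rather than in a dictionary.

```python
# Hypothetical "extends" table: child content model -> parent content model.
EXTENDS = {
    "ContentModel_File": "ContentModel_DOMS",
    "ContentModel_License": "ContentModel_DOMS",
    "ContentModel_Audio": "ContentModel_File",  # invented name for the audio model
}


def ancestors(model):
    """Return the chain of content models that the given model extends."""
    chain = []
    while model in EXTENDS:
        model = EXTENDS[model]
        chain.append(model)
    return chain


def is_file_model(model):
    """True if the model is ContentModel_File or extends it."""
    return model == "ContentModel_File" or "ContentModel_File" in ancestors(model)
```

With this table, `ancestors("ContentModel_Audio")` walks up through ContentModel_File to ContentModel_DOMS, which is how a service could decide that an object with that model is a file object.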

Data Models

For each collection that is to be stored in DOMS, a data model must be defined.

A data model is a description of the formats of data and metadata, and of how this data is organised in Fedora objects. In practice, this is done through formal descriptions of content models, expressed with Enhanced Content Models. Having a formally described model allows us to validate our data, and to use data-model-aware software that integrates with our repository.

The DOMS data model, as described above, defines some structure that must be true for each collection. This includes relations to one or more collections, a license for each object, and the requirements for separate file objects containing technical metadata for each file we store in Fedora.

We also assume Dublin Core as a minimal metadata format for each object, and we specify a well-defined subset of PREMIS as the format for technical metadata in the file objects.

However, the model is flexible, and a data model for a specific collection may define additional metadata formats and relations that describe the formats and structure of that particular collection.
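To make the base requirements concrete, here is a minimal Python sketch of a check that an object relates to a collection, has a license, and carries some Dublin Core metadata. The function name and the relation name `hasLicense` are assumptions for illustration; `isPartOfCollection` is the relation named in this document, and the namespace is the standard Dublin Core element namespace. Real validation in DOMS is done through Enhanced Content Models, not through code like this.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def check_base_requirements(relations, metadata_xml):
    """Return a list of problems; an empty list means the checks passed.

    relations is a list of (relation name, target) pairs.
    """
    problems = []
    names = [name for name, _target in relations]
    if "isPartOfCollection" not in names:
        problems.append("no collection")
    if "hasLicense" not in names:  # hypothetical relation name
        problems.append("no license")
    root = ET.fromstring(metadata_xml)
    has_dc = any(el.tag.startswith("{" + DC_NS + "}") for el in root.iter())
    if not has_dc:
        problems.append("no Dublin Core metadata")
    return problems
```

A collection-specific data model could add further checks on top of these, which mirrors how a specific data model may define additional formats and relations.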

Brief technical description of the individual parts of DOMS

The DOMS system

DOMS as a piece of server software is exposed as a web service, and any client can communicate with DOMS through it. Furthermore, an object model that communicates with the web service is available for Java.

The DOMS server

The DOMS server interface includes methods to put, get and update data and metadata in the DOMS system. Furthermore, it includes an interface for indexing the DOMS system in a search system. Each request is delegated to the relevant backend services described below: in essence, metadata goes to Fedora and ECM, and files go to the bitstorage.
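The delegation described above can be sketched as a thin facade. The Python class below is purely illustrative: the class and method names are invented, and two in-memory dictionaries stand in for Fedora/ECM and for the bitstorage web service. It does not reflect the actual DOMS web service API.

```python
class DomsServerSketch:
    """Hypothetical facade that routes requests to backend stand-ins."""

    def __init__(self):
        self.fedora = {}      # pid -> metadata XML (stand-in for Fedora and ECM)
        self.bitstorage = {}  # file name -> bytes (stand-in for the bitstorage)

    def put_metadata(self, pid, xml):
        # Metadata requests are delegated to the Fedora stand-in.
        self.fedora[pid] = xml

    def get_metadata(self, pid):
        return self.fedora[pid]

    def put_file(self, name, data):
        # File requests are delegated to the bitstorage stand-in.
        self.bitstorage[name] = bytes(data)

    def get_file(self, name):
        return self.bitstorage[name]


server = DomsServerSketch()
server.put_metadata("doms:cd-1", "<cd/>")
server.put_file("track1.wav", b"RIFF")
```

Callers only see the facade; which backend ends up holding a given piece of data is an internal decision of the server.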

The included test bed

Included with DOMS is a test bed that quickly sets up a locally running system for testing DOMS. It sets up a version of all backend services (see below) and, optionally, some test objects.

It can also start some frontend services (see below), so that you can interact with DOMS through them.

The DOMS client

The DOMS client is an object-oriented Java interface to DOMS. Java programs can use it by including a jar file with their code.

Front end services

DOMS can be used directly through the DOMS server or DOMS client, but a number of services communicating with DOMS are also available. These include a GUI, a mass ingest system, OAI-PMH integration, and integration with the Summa search system.

Ingest

The DOMS mass ingest system is a system for ingesting large quantities of files and metadata into DOMS without user intervention. Use cases include a "hot dir", where digitized files are automatically uploaded and ingested into DOMS, and mass ingest of a collection from a legacy system.
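A "hot dir" can be pictured as a directory that is scanned for new files, each of which is handed to an ingest step and then moved aside. The Python sketch below shows one pass of such a scan. The function name, the `ingested` subdirectory, and the `ingest` callback are assumptions for illustration; the real DOMS ingest system may work quite differently.

```python
import shutil
from pathlib import Path


def ingest_hot_dir(hot_dir, ingest):
    """Run one scan of hot_dir, passing each file to ingest(name, data).

    Ingested files are moved to an 'ingested' subdirectory so that a later
    scan does not pick them up again. Returns the names that were ingested.
    """
    hot_dir = Path(hot_dir)
    done_dir = hot_dir / "ingested"
    done_dir.mkdir(exist_ok=True)
    ingested = []
    for path in sorted(hot_dir.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories, including 'ingested' itself
        ingest(path.name, path.read_bytes())
        shutil.move(str(path), str(done_dir / path.name))
        ingested.append(path.name)
    return ingested
```

Running this function from a scheduler or in a loop with a delay would give the "without user intervention" behaviour; error handling and partial uploads are deliberately left out of the sketch.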

GUI

The GUI is a user-oriented interface for adding or editing metadata in DOMS, and uploading and downloading files. The GUI is web based, and automatically adapts to the content models of the digital collection in DOMS.

OAI-PMH

OAI-PMH is a protocol for downloading metadata. DOMS can expose its digital collections using this protocol.
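For readers unfamiliar with the protocol, an OAI-PMH harvest is an HTTP request with a `verb` parameter. The Python sketch below builds a `ListRecords` request URL using the standard `verb`, `metadataPrefix` and `from` parameters. The function name and the base URL in the usage note are placeholders, not the address of a real DOMS endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    from_date, if given, restricts the harvest to records changed on or
    after that date (for example "2009-10-01").
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)
```

For example, `list_records_url("http://example.org/oai", from_date="2009-10-01")` yields a URL that asks for all Dublin Core records changed since 1 October 2009.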

Summa Integration

DOMS can integrate with the Summa search engine to provide fast and flexible search and, optionally, integration with other material.

Back end services

The DOMS system is based on a number of back end services that must run on a server.

The back end services consist of a running Fedora, Enhanced Content Model services, a bitstorage with interfaces, and an update tracker.

Usage of Fedora

A running DOMS requires a Fedora installation, and that Fedora system must be configured in certain ways.

Fedora must handle authorization and authentication with an external mechanism, which must be set up correctly. Fedora must be configured to use a resource index that handles semantic queries on the relations between metadata objects. Fedora must also be set up to validate objects on ingest, using the Enhanced Content Model framework.

A DOMS system contains scripts to set up Fedora correctly.

Enhanced Content Models

Enhanced Content Models is a framework for describing content models in a machine readable way. This enables services to validate objects in Fedora, and to have services understand data models and act on them. For instance, the GUI provided with DOMS automatically generates a user interface for editing metadata for any data model defined in DOMS.

Furthermore, Enhanced Content Models provide a way to define views on objects. A view consists of a bundle of objects viewed as a whole. For instance, if different data objects describe CDs and the tracks on CDs, a view may define a CD with all its tracks.
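One way to think about a view is as the set of objects reached from a starting object by following only the relations that the view names. The Python sketch below illustrates that idea. The function name, the relation name `hasTrack`, and the PIDs are invented for the example, and ECM's actual view definitions may be expressed differently.

```python
def resolve_view(start_pid, relations, view_relations):
    """Collect the objects that form a view, starting from start_pid.

    relations maps a PID to a list of (relation name, target PID) pairs.
    Only relations whose name is in view_relations are followed.
    """
    seen = []
    queue = [start_pid]
    while queue:
        pid = queue.pop(0)
        if pid in seen:
            continue  # guard against cycles and repeated targets
        seen.append(pid)
        for name, target in relations.get(pid, []):
            if name in view_relations:
                queue.append(target)
    return seen


relations = {
    "cd:1": [
        ("hasTrack", "track:1"),
        ("hasTrack", "track:2"),
        ("isPartOfCollection", "Root_Collection"),
    ],
}
cd_view = resolve_view("cd:1", relations, {"hasTrack"})
```

Here the view of the CD contains the CD and its two tracks, while the collection is left out because `isPartOfCollection` is not one of the relations the view follows.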

ECM also provides functionality for generating new objects by cloning templates.

Validation

Enhanced Content Models come with a framework for validating data objects for conformance with their content models. Validation can be enabled on ingest, and can also be run periodically.

Bitstorage Interface

DOMS is designed to work with a bitstorage that preserves the bits for the long term. Files are delivered to a web service that communicates with that bitstorage. Once a file is approved, it cannot be deleted, and the bits are duplicated and monitored to ensure that they never change.

The bitstorage itself is implemented outside DOMS, but you can plug in your own implementation behind the DOMS bitstorage web service.
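The approve-then-freeze behaviour described in this section can be sketched with an in-memory stand-in. The Python class below is hypothetical: the class and method names are invented, and the use of SHA-256 for monitoring is an assumption, since this document does not say which checksum the bitstorage uses. Duplication of the bits is not modelled.

```python
import hashlib


class BitstorageSketch:
    """Hypothetical in-memory stand-in for a bitstorage implementation."""

    def __init__(self):
        self._files = {}     # name -> bytes
        self._approved = {}  # name -> checksum recorded at approval time

    def upload(self, name, data):
        if name in self._approved:
            raise PermissionError("approved files cannot be replaced")
        self._files[name] = bytes(data)

    def approve(self, name):
        # Record a checksum so later monitoring can detect changed bits.
        self._approved[name] = hashlib.sha256(self._files[name]).hexdigest()

    def delete(self, name):
        if name in self._approved:
            raise PermissionError("approved files cannot be deleted")
        del self._files[name]

    def is_intact(self, name):
        """Monitoring check: do the stored bits still match the approval checksum?"""
        return hashlib.sha256(self._files[name]).hexdigest() == self._approved[name]
```

A plugged-in implementation would replace the dictionaries with real storage, but the contract stays the same: before approval a file can still be removed, and after approval it can only be read and checked.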

Update tracker

The concept of views defined in the ECM data models makes it possible to look at several Fedora objects as a whole. The update tracker maintains an up-to-date database of when any object in any given view last changed. This is important because systems that index DOMS need a list of all views that have changed since their last update. Services that rely on this include the Summa integration and OAI-PMH.
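The bookkeeping performed by the update tracker can be sketched as a table from view to last-change time. The Python class below is an illustration only: the names are invented, timestamps are plain numbers, and the real tracker keeps its data in a database and works out for itself which views an object belongs to.

```python
class UpdateTrackerSketch:
    """Hypothetical tracker of when each view last changed."""

    def __init__(self):
        self._last_changed = {}  # view id -> timestamp of the latest change

    def object_changed(self, view_ids, timestamp):
        """Record that an object belonging to the given views changed."""
        for view_id in view_ids:
            previous = self._last_changed.get(view_id, timestamp)
            self._last_changed[view_id] = max(previous, timestamp)

    def views_changed_since(self, timestamp):
        """Return the ids of all views changed after the given timestamp."""
        return sorted(v for v, t in self._last_changed.items() if t > timestamp)


tracker = UpdateTrackerSketch()
tracker.object_changed(["view:cd-1"], 100)
tracker.object_changed(["view:cd-2"], 200)
tracker.object_changed(["view:cd-1"], 300)
```

An indexer that last ran at time 250 would call `tracker.views_changed_since(250)` and re-index only the views returned, instead of walking the whole repository.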

HighLevelDesignDocumentation (last edited 2010-03-17 13:09:36 by localhost)