High Level Design Documentation

work in progress


Introduction

A DOMS is a Digital Object Management System. This document describes the open source DOMS developed at the State and University Library in Aarhus, Denmark. Program sources for our DOMS are accessible at [http://sourceforge.net/projects/doms SourceForge].

The DOMS aims at fulfilling the following objectives.

What is not part of the objectives:

Overview of a DOMS system

Our DOMS keeps its data in two different kinds of storage:

  1. A Fedora repository
  2. A bitstorage

The Fedora repository keeps metadata, and the bitstorage keeps the actual digital material files.

If, for example, we were to store audio CDs in a DOMS, the digitized tracks, along with scanned cover art, could be stored as files in the bitstorage. Fedora would then keep an object (or a cluster of objects) representing the CD. That object would contain information such as artist(s), production year and record label, as well as information relating the CD to the digitized material in the bitstorage.

System structure

Our system is available as several different packages, each containing part of the DOMS, and each part interacting with the others.

The packages that form the DOMS are the following:

The DOMS backend:

DOMS Services:

In addition, the following separate projects are needed:

Finally, you may need:

The figure below illustrates how the different modules make use of each other. <TODO: UPDATE FIGURE>

attachment:DOMSOverview20090819.png

DOMS objects

The metadata stored by the DOMS is represented as objects inside a Fedora repository. Each metadata object in the repository contains the following:

In addition to objects containing descriptive metadata (such as the artist in the CD example above), there are objects that we call file objects. All relations to data in a bitstorage come from file objects. Instead of descriptive metadata, these objects contain technical metadata about the files they point to (for example, the sample rate of a sound file).
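The split between descriptive objects and file objects can be illustrated with a minimal sketch. The class names, field names, PIDs and URL below are all hypothetical and chosen for illustration only; they are not taken from the DOMS sources or from the Fedora API.

```python
from dataclasses import dataclass, field


@dataclass
class FileObject:
    """Hypothetical file object: technical metadata plus a pointer into the bitstorage."""
    pid: str
    bitstorage_url: str
    technical_metadata: dict


@dataclass
class MetadataObject:
    """Hypothetical descriptive object, such as the CD in the example above."""
    pid: str
    descriptive_metadata: dict
    file_pids: list = field(default_factory=list)  # relations to file objects


def bitstorage_urls(obj, file_objects):
    """Resolve a descriptive object to bitstorage URLs, going through its file objects."""
    return [file_objects[pid].bitstorage_url for pid in obj.file_pids]


# Example data (made up): one CD with one digitized track.
track = FileObject(
    pid="example:file-1",
    bitstorage_url="http://bitstorage.example.org/cd-1/track01.wav",
    technical_metadata={"sample_rate_hz": 44100},
)
files = {track.pid: track}
cd = MetadataObject(
    pid="example:cd-1",
    descriptive_metadata={"artist": "Example Artist", "year": 1999},
    file_pids=[track.pid],
)
```

Note that the descriptive object never holds a bitstorage URL itself; it only refers to file objects, which is the property described above.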

In addition to the objects that contain metadata, i.e. those that represent the content of the DOMS, there are a number of special objects in a DOMS. These special objects are:

A DOMS comes preloaded with some special objects, defined in the base collection. These base objects are meant as a base for defining your own data models (see below) for specific collections.

The figure below shows the objects in the base collection, and their relations.

attachment:DOMSBaseCollection.png

Content models are related in a hierarchy with ContentModel_DOMS at the top. Similarly, collections are related by the isPartOfCollection relation, and at the top of this hierarchy is Root_Collection. The file objects mentioned earlier have content models (via the hasModel relation) that extend ContentModel_File. That is, each file object has a content model for either image, text, audio, or video. Finally, ContentModel_License is the content model of all licenses, and we include Open_License, which is the license of all content models in the base collection.
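As a rough illustration of the content model hierarchy, the sketch below walks from a content model up to the top of the hierarchy. The relation name "extends", the placement of ContentModel_License directly under ContentModel_DOMS, and the name ContentModel_Audio are assumptions made for the example; only ContentModel_DOMS, ContentModel_File and ContentModel_License are named in this document.

```python
# Assumed "extends" relation between content models: child -> parent.
EXTENDS = {
    "ContentModel_File": "ContentModel_DOMS",
    "ContentModel_License": "ContentModel_DOMS",  # assumed placement
    "ContentModel_Audio": "ContentModel_File",    # hypothetical name
}


def ancestors(model, extends=EXTENDS):
    """Return the chain of content models that `model` extends, nearest first."""
    chain = []
    seen = {model}
    while model in extends:
        model = extends[model]
        if model in seen:
            raise ValueError("cycle in content model hierarchy at " + model)
        seen.add(model)
        chain.append(model)
    return chain


def is_file_model(model, extends=EXTENDS):
    """True if `model` is ContentModel_File or extends it, directly or indirectly."""
    return model == "ContentModel_File" or "ContentModel_File" in ancestors(model, extends)
```

With this data, the hypothetical audio model counts as a file model, while ContentModel_License does not.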

Data Models

For each collection that is to be stored in DOMS, a data model must be defined.

A data model is a description of the formats of data and metadata, and of how this data is organised in Fedora objects. In practice, a data model is expressed as formal descriptions of content models, defined using Enhanced Content Models. Having a formally described model allows us to validate our data, and to use data model aware software that integrates with our repository.

The DOMS data model, as described above, defines some structure that every collection must follow. This includes relations to one or more collections, a license for each object, and the requirement for separate file objects containing technical metadata for each file we store in Fedora.

We also assume Dublin Core as a minimal metadata format for each object, and we define a well-defined subset of PREMIS as the format for technical metadata in the file objects.

However, the model is flexible, and a data model for a specific collection may define additional metadata formats and relations that describe the formats and structure for that particular collection.
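The base requirements listed in this section (a relation to at least one collection, a license, and Dublin Core metadata) can be sketched as a simple check. This is only an illustration: the dictionary keys and the datastream name "DC" are assumptions, and the real checks are expressed through Enhanced Content Models rather than hand-written code.

```python
def base_model_problems(obj):
    """Return a list of base requirements that the given object does not meet.

    `obj` is a plain dict with hypothetical keys:
      "isPartOfCollection": list of collection identifiers
      "license": identifier of a license object
      "datastreams": dict mapping datastream names to content
    """
    problems = []
    if not obj.get("isPartOfCollection"):
        problems.append("no relation to a collection")
    if not obj.get("license"):
        problems.append("no license")
    if "DC" not in obj.get("datastreams", {}):
        problems.append("no Dublin Core metadata")
    return problems


good = {
    "isPartOfCollection": ["Root_Collection"],
    "license": "Open_License",
    "datastreams": {"DC": "<dc/>"},
}
bad = {"isPartOfCollection": [], "datastreams": {}}
```

A collection-specific data model would add further checks on top of these, for example for additional metadata formats.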

Back end services

The DOMS system consists of back end services that must run on a server; the DOMS web service interface, which exposes the DOMS to clients; and client side services that ingest data into, and disseminate data from, the DOMS.

The back end services consist of a running Fedora, Enhanced Content Model services, a bitstorage with interfaces, and an update tracker.

Usage of Fedora

A running DOMS requires a Fedora installation, and that installation must be configured in a specific way.

Fedora must handle authentication and authorization through an external mechanism, which must be set up correctly. Fedora must be configured to use a resource index that handles semantic queries on the relations between metadata objects. Finally, Fedora must be set up to validate objects on ingest, using the Enhanced Content Model framework.

<TODO: CHECK IF THIS IS EXHAUSTIVE>

A DOMS system contains scripts to set up Fedora correctly.

Enhanced Content Models

Enhanced Content Models (ECM) is a framework for describing content models in a machine readable way. This enables services to validate objects in Fedora, and to understand data models and act on them. For instance, the GUI provided with DOMS automatically generates a user interface for editing metadata for any data model defined in DOMS.

Furthermore, Enhanced Content Models provide a way to define views on objects. A view consists of a bundle of objects viewed as a whole. For instance, if different data objects describe CDs and tracks on CDs, a view may define a CD with all its tracks.
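One way to picture a view is as a traversal that starts at one object and follows only the relations that the view names. The sketch below assumes relations are available as simple (name, target) pairs; the relation name hasTrack and the identifiers are hypothetical, and this is not how ECM itself stores or evaluates view definitions.

```python
def view(start, relations, follow):
    """Collect `start` plus every object reachable through relations named in `follow`.

    `relations` maps an object identifier to a list of (relation_name, target) pairs.
    The result lists objects in breadth-first order, without duplicates.
    """
    bundle = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)
        for name, target in relations.get(current, []):
            if name in follow and target not in bundle:
                bundle.append(target)
                queue.append(target)
    return bundle


# Hypothetical data: a CD with two tracks, and a collection relation the view ignores.
relations = {
    "example:cd-1": [
        ("hasTrack", "example:track-1"),
        ("hasTrack", "example:track-2"),
        ("isPartOfCollection", "Root_Collection"),
    ],
}
```

Calling `view("example:cd-1", relations, {"hasTrack"})` returns the CD followed by its two tracks, and leaves out Root_Collection because the view does not follow isPartOfCollection.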

ECM also provides functionality for generating new objects by cloning templates.
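The idea of cloning a template can be sketched as a deep copy followed by assigning a new identifier. The dictionary layout, the key names and the identifiers are assumptions for illustration; the actual ECM template mechanism operates on Fedora objects and may do more than this.

```python
import copy


def clone_template(template, new_pid):
    """Return an independent copy of `template` with its identifier replaced."""
    obj = copy.deepcopy(template)  # deep copy so nested metadata is not shared
    obj["pid"] = new_pid
    return obj


# Hypothetical template object with empty descriptive metadata.
template = {
    "pid": "example:template-cd",
    "hasModel": ["ContentModel_DOMS"],
    "metadata": {"artist": "", "year": None},
}
new_cd = clone_template(template, "example:cd-2")
new_cd["metadata"]["artist"] = "Example Artist"
```

Because the copy is deep, filling in the new object's metadata leaves the template unchanged.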

Validation

Enhanced Content Models come with a framework for validating that data objects conform to their content models. Validation can be enabled on ingest, and it can also be run periodically on objects already in the repository.

Bitstorage Interface

<TODO: ADD MORE INFO>

Update tracker

<TODO: ADD MORE INFO>

The DOMS system

<TODO: ADD MORE INFO>

The DOMS server

<TODO: ADD MORE INFO>

The included test bed

<TODO: ADD MORE INFO>

The DOMS client

<TODO: ADD MORE INFO>

Front end services

<TODO: ADD MORE INFO>

Ingest

<TODO: ADD MORE INFO>

GUI

<TODO: ADD MORE INFO>

OAI-PMH

<TODO: ADD MORE INFO>

Summa Integration

<TODO: ADD MORE INFO>