Alternative DOMS Data Model

The DOMS is a common system for storage and processing of digital material and metadata. The metadata is stored as digital metadata objects in the Fedora metadata storage. The material is stored in an external bitstorage and referenced by the metadata objects.

Digital Material and File Formats

Based on existing collections at SB, we have identified four kinds of digital material:

  1. Images (TIFF, PNG, JPEG2000, etc.)
  2. Audio (Broadcast WAV, WAV, etc.)
  3. Video (MPEG1, MPEG2, etc.)
  4. Text (PDF, XML, etc.)

For each kind of material, we have chosen a number of recommended formats. The DOMS will ensure access to the content of files in these formats.

  1. Images: TIFF
  2. Audio: WAV, BWF
  3. Video: MPEG1, MPEG2
  4. Text: UTF8, PDF, OfficeOpenXML

When ingesting new collections, files in other formats will be converted to one of the recommended formats above, and the converted files will be saved along with the originals. For text, we will always extract and save a UTF8 version, and if the original text was formatted, we will also convert it to either PDF or OOXML. The 'compulsory' file formats for each kind of material will be specified by the type objects (see below), and the required metadata for each format will be specified by the file format objects (see below).
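The conversion policy above can be sketched as a function that, given the kind of material and the format of an original file, returns the formats to store. This page does not fix which recommended format a non-recommended original is converted to, so the default targets below (TIFF, WAV, MPEG2, and PDF for formatted text) are assumptions, not DOMS behaviour.

```python
from typing import List

# Recommended formats per kind of material, as listed above.
RECOMMENDED = {
    "image": {"TIFF"},
    "audio": {"WAV", "BWF"},
    "video": {"MPEG1", "MPEG2"},
    "text": {"UTF8", "PDF", "OfficeOpenXML"},
}

# Assumption: default conversion target when an original is not recommended.
DEFAULT_TARGET = {"image": "TIFF", "audio": "WAV", "video": "MPEG2"}


def formats_to_store(kind: str, original: str, formatted: bool = False) -> List[str]:
    """Return the formats to store for one ingested file, original first."""
    if kind not in RECOMMENDED:
        raise ValueError(f"unknown kind of material: {kind}")
    stored = [original]  # the original is always kept
    if kind == "text":
        # Text: always keep a UTF8 version.
        if original != "UTF8":
            stored.append("UTF8")
        # Formatted text: also keep PDF or OOXML (PDF chosen here, an assumption).
        if formatted and original not in {"PDF", "OfficeOpenXML"}:
            stored.append("PDF")
    elif original not in RECOMMENDED[kind]:
        stored.append(DEFAULT_TARGET[kind])
    return stored
```

For example, `formats_to_store("image", "PNG")` gives `["PNG", "TIFF"]`, while a BWF audio file is stored as-is.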

Metadata Objects

The metadata objects in the Fedora metadata storage are described using FoxML. All metadata objects in the Fedora storage must have an object type.
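This page does not say how the relation from an object to its type is encoded. As an illustration only, assume it is stored in the object's RELS-EXT datastream using Fedora 3's `hasModel` predicate; the sketch below reads the claimed types and enforces the rule that every object has one. The PIDs `doms:example_1` and `doms:Type_Image` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the object-to-type relation uses Fedora 3's hasModel predicate.
MODEL_NS = "info:fedora/fedora-system:def/model#"
FEDORA_PREFIX = "info:fedora/"

# Hypothetical RELS-EXT content for one metadata object.
EXAMPLE_RELS_EXT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/doms:example_1">
    <fedora-model:hasModel rdf:resource="info:fedora/doms:Type_Image"/>
  </rdf:Description>
</rdf:RDF>"""


def object_types(rels_ext_xml: str) -> List[str]:
    """Return the PIDs of all types the object claims to have."""
    root = ET.fromstring(rels_ext_xml)
    types = []
    for rel in root.iter(f"{{{MODEL_NS}}}hasModel"):
        resource = rel.get(f"{{{RDF_NS}}}resource", "")
        if resource.startswith(FEDORA_PREFIX):
            types.append(resource[len(FEDORA_PREFIX):])
    return types


def require_type(rels_ext_xml: str) -> List[str]:
    """Enforce the rule that every metadata object has an object type."""
    types = object_types(rels_ext_xml)
    if not types:
        raise ValueError("metadata object has no object type")
    return types
```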

Object Types

All objects in the DOMS metadata storage must have a type. The type is itself an object. The type describes the content model, i.e. the compulsory and legal content of an object of this type. Each object has a relation to the type it claims to have, and it should be possible to validate that the object actually conforms to that type. A number of base types (base type objects) are predefined, and new type objects can be defined when needed. The base object type is SB, and all other types extend SB (directly or indirectly).
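A minimal sketch of the "every type extends SB" rule, assuming each type object records a single direct supertype. The type PIDs below, including the collection-specific `GentofteImage`, are hypothetical; the actual predefined types are those in the type hierarchy figure.

```python
from typing import Dict, List, Optional

# Hypothetical type PIDs; each type object names its direct supertype.
EXTENDS: Dict[str, Optional[str]] = {
    "SB": None,                 # root of the hierarchy
    "File": "SB",
    "Image": "SB",
    "GentofteImage": "Image",   # hypothetical collection-specific type
}


def supertype_chain(type_pid: str, extends: Dict[str, Optional[str]] = EXTENDS) -> List[str]:
    """Return [type_pid, parent, ..., 'SB'], rejecting cycles, unknown types and other roots."""
    chain: List[str] = []
    current: Optional[str] = type_pid
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in type hierarchy at {current}")
        if current not in extends:
            raise ValueError(f"unknown type {current}")
        chain.append(current)
        current = extends[current]
    if chain[-1] != "SB":
        raise ValueError(f"{type_pid} does not extend SB")
    return chain
```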

Predefined Object Type Objects

The predefined types in DOMS are drawn as a hierarchy in fig. 1. A type's specifications are inherited by its subtypes, and an object of a given type must comply with the specifications of that type as well as those of all its supertypes.

http://merkur/viewvc/trunk/docs/datamodel/fig/alternativeTypeHierarchy.png?root=doms&view=co

Figure 1. DOMS object type hierarchy. Dia source.
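The inheritance rule above can be sketched as taking the union of the requirements of a type and all its supertypes. Here a type's specification is reduced to a set of required datastream IDs, which is a simplification; assigning `DC` and `RELS-EXT` to these types is hypothetical, and the sketch assumes the hierarchy is acyclic.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical hierarchy and per-type requirements (required datastream IDs).
EXTENDS: Dict[str, Optional[str]] = {"SB": None, "Image": "SB"}
REQUIRED_DATASTREAMS: Dict[str, Set[str]] = {
    "SB": {"DC"},          # e.g. the SB Dublin Core description
    "Image": {"RELS-EXT"},  # e.g. where the hasFile relation would live
}


def inherited_requirements(type_pid: str) -> Set[str]:
    """Union of the requirements of a type and all of its supertypes."""
    required: Set[str] = set()
    current: Optional[str] = type_pid
    while current is not None:
        required |= REQUIRED_DATASTREAMS.get(current, set())
        current = EXTENDS[current]
    return required


def missing_datastreams(obj_datastreams: Iterable[str], type_pid: str) -> Set[str]:
    """Datastreams the object lacks for its type; empty means it complies."""
    return inherited_requirements(type_pid) - set(obj_datastreams)
```

An object of type `Image` with only a `RELS-EXT` datastream fails, because it lacks the `DC` datastream inherited from SB.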

The types are described on the following sub pages. The DOMS type is the base type of all objects, and all the other types define additions to the DOMS type. Some technical content descriptions can be found where the content is introduced.

File Format Objects

File format objects are referenced by objects of type file. File objects also reference content, i.e. a file in bitstorage, and they include a technical datastream (see Type File). The file format objects describe the file format of the content in bitstorage and the metadata required in the technical datastream. There will be predefined file format objects for the recommended formats: TIFF, WAV, BWF, MPEG1, MPEG2, UTF8, PDF, OfficeOpenXML, and possibly also for other common formats.
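A sketch of how a file format object could drive validation of a file object's technical datastream. The class names, PIDs, the bitstorage reference, and the required field names for TIFF and WAV are all hypothetical; the real required metadata is whatever the predefined file format objects specify.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class FileFormat:
    pid: str
    name: str
    required_technical: FrozenSet[str]  # fields the technical datastream must contain


@dataclass
class FileObject:
    pid: str
    content_ref: str   # hypothetical reference to the file in bitstorage
    format_pid: str    # reference to a file format object
    technical: Dict[str, object] = field(default_factory=dict)


# Hypothetical file format objects with hypothetical required fields.
FORMATS: Dict[str, FileFormat] = {
    "fmt:TIFF": FileFormat("fmt:TIFF", "TIFF", frozenset({"width", "height", "bitsPerSample"})),
    "fmt:WAV": FileFormat("fmt:WAV", "WAV", frozenset({"sampleRate", "channels", "bitsPerSample"})),
}


def missing_technical(f: FileObject, formats: Dict[str, FileFormat] = FORMATS) -> FrozenSet[str]:
    """Fields required by the file's format that its technical datastream lacks."""
    fmt = formats[f.format_pid]
    return fmt.required_technical - set(f.technical)
```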

Levels of Metadata

In summary, we have three levels of metadata:

  1. Common Core: The metadata that must be present in all objects. This includes the core properties defined by the FoxML format, an SB Dublin Core description, and an index representation disseminator. It is formalised as the SB type below.
  2. Base Object Type Specified: The metadata defined by the base object types described above. For example, the image object type specifies that an object of type image must have a hasFile relation to an object of type file with a reference to the TIFF file format.
  3. Collection Object Type Specified: New object types can be introduced along with a new collection.
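The level-2 example above (an image must have a hasFile relation to a file object that references the TIFF file format) could be checked over relation triples as sketched below. Only `hasFile` comes from this page; the `hasType` and `hasFormat` predicates and all PIDs are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical relations for one image and its file object.
TRIPLES: Set[Triple] = {
    ("img:1", "hasType", "Image"),
    ("img:1", "hasFile", "file:1"),
    ("file:1", "hasType", "File"),
    ("file:1", "hasFormat", "fmt:TIFF"),
}


def objects_of(subject: str, predicate: str, triples: Set[Triple]) -> Set[str]:
    """All objects o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def satisfies_image_rule(image_pid: str, triples: Set[Triple] = TRIPLES) -> bool:
    """True if the image has a hasFile relation to a file object in TIFF format."""
    for f in objects_of(image_pid, "hasFile", triples):
        is_file = "File" in objects_of(f, "hasType", triples)
        is_tiff = "fmt:TIFF" in objects_of(f, "hasFormat", triples)
        if is_file and is_tiff:
            return True
    return False
```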

SB Collection

All the predefined types and file formats have been created as objects in a 'base SB collection'. The base collection also includes at least one rights object, a collection object and some file format objects. This collection is meant to be ingested as the first collection in the DOMS metadata storage.
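One way to see why the base SB collection must come first: later collections refer to the types defined in it. The sketch below checks an ingest order under the simplifying assumption that each collection can be summarised by the types it defines and the types it references; the collection and type names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# (collection name, types it defines, types it references)
Collection = Tuple[str, Set[str], Set[str]]


def check_ingest_order(collections: Iterable[Collection]) -> Tuple[bool, Optional[str], Set[str]]:
    """Return (ok, first failing collection, its unresolved type references)."""
    known: Set[str] = set()
    for name, defines, references in collections:
        unresolved = set(references) - known - set(defines)
        if unresolved:
            return False, name, unresolved
        known |= set(defines)
    return True, None, set()


BASE: Collection = ("base SB", {"SB", "File", "Image"}, set())
GENTOFTE: Collection = ("Gentofte", {"GentofteImage"}, {"Image", "File"})
```

With these values, ingesting `BASE` before `GENTOFTE` succeeds, while the reverse order fails on the Gentofte collection's references to `Image` and `File`.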

Example Collections

http://merkur/viewvc/trunk/docs/datamodel/fig/GentofteCollection.png?root=doms&view=co

Figure 2. Example collection (Gentofte). Dia source.

DataModel/AlternativeDataModel (last edited 2010-03-17 13:12:45 by localhost)