Differences between revisions 6 and 7
Revision 6 as of 2008-10-02 14:05:57
Size: 5682
Editor: abr
Comment:
Revision 7 as of 2008-10-02 14:15:34
Size: 5721
Editor: abr
Comment:
Deletions are marked like this. Additions are marked like this.
Line 4: Line 4:
In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects. In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects. For more details see DomsFileHandling.

doms:ContentModel_File

Extends [:DataModel/ContentModel_DOMS: doms:ContentModel_DOMS]

In DOMS, we have found it beneficial to separate the abstract concept of "Image" or "Audio" from the concrete implementations such as "jpeg" and "mp3". The metadata about the image will be relevant no matter the manifestation of the image, and as such should not reside along with the technical metadata about the manifestation. To support this separation, we have introduced the concept of File objects. For more details see DomsFileHandling.

A File object is an object, that contain a link to the file (in Bitstorage), and the technical metadata about this file. Only File objects are allowed to reference a file in Bitstorage. File objects must all have a Content Model that extends ContentModel_File.

Objects of ContentModel_File can have "doms-relations:hasOriginal" relations to other objects of "doms:ContentModel_File". If a file A is the result of a migration of file B and both are in Doms, the File A data object will have a "doms-relations:hasOriginal" relation to the data object for File B.

Objects of doms:ContentModel_File must have the datastream "CHARACTERISATION", "CONTENTS", "ORIGIN" and "PRONOMID". Each of these deserve some description.

CONTENTS

"CONTENTS" is where the actual file is. It will always have the the "E" controlgroup, meaning that the file is externally referenced. The reference must allways be to a File in Bitstorage, and only this datastream is ever allowed to reference files. The datastream itself is just an URL, but if you access it through the API-A, you get the contents of the file.

PRONOMID

In order to perform proper digital preservation, we need to store the exact format of each file somehow. The National Archives, Great Britain have developed the PRONOM scheme, http://www.nationalarchives.gov.uk/pronom For each file format, or version thereof, in the registry, they have a signature file, that enables their tool (DROID) to identify files of this type.

We selected PRONOM because they are currently able to identify all the relevant preservation file formats used in DOMS, and was the closest we could find to an uniformly accepted standard.

The schema for this datastream is:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance#"
            targetNamespace="http://doms.statsbiblioteket.dk/properties/pronomID/0/1/#"
            xmlns="http://doms.statsbiblioteket.dk/properties/pronomID/0/1/#"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

    <xsd:element name="pronomID" type="extpropertiesType"/>

    <xsd:complexType name="extpropertiesType">
        <xsd:attribute name="value" type="xsd:string"/>
    </xsd:complexType>
</xsd:schema>

ORIGIN

This section depends on input from Elsebeth, so it has intentionally left blank.

CHARACTERISATION

When a file is uploaded to Bitstorage (see Bitstorage_API), it is automatically subjected to a series of characterisation tools. Aside from extraction the PRONOM ID, which should be stored in the "PRONOMID" datastream, they also extract a lot of other metadata. This metadata is highly dependant on the type of file in question, so parsing it in a general context is not feasable. Instead, a datastream have been made to hold it, so that it can be parsed at a later date, if the particular file becomes interesting.

There is no schema for the actual metadata. We have defined a schema for the containing structure. The metadata must be valid xml, but it does not need to be schema validated.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            targetNamespace="http://doms.statsbiblioteket.dk/types/characterisation/0/1/#"
            xmlns="http://doms.statsbiblioteket.dk/types/characterisation/0/1/#"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

    <xsd:element name="characterisation" type="characterisationType"/>

    <xsd:complexType name="characterisationType">
        <xsd:sequence>
            <xsd:element name="characterisationRun"
                         minOccurs="1"
                         maxOccurs="unbounded"
                         type="characterisationRunType"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="characterisationRunType">
        <xsd:sequence>
            <xsd:element name="tool" type="xsd:string" maxOccurs="1" minOccurs="1"/>
            <xsd:element name="output" type="outputType" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="outputType">
        <xsd:sequence>
            <xsd:any namespace="##any" processContents="skip" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

The characterisation datastream could look like this.

<?xml version="1.0" encoding="UTF-8"?>
<char:characterisation xsi:schemaLocation="http://doms.statsbiblioteket.dk/types/characterisation/0/1/# http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/characterisation/characterisation-0-1.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:char="http://developer.statsbiblioteket.dk/DOMS/types/characterisation/0/1/#"
  xmlns:jhove="">
  <char:characterisationRun>
    <char:tool>JHove</char:tool>
    <char:output>
      <jhove:...>
      </jhove:...>
    </char:output>
  </char:characterisationRun>
</char:characterisation>

DataModel/ContentModel File (last edited 2010-03-17 13:12:54 by localhost)