Datafiles in DOMS

DOMS handles files, and metadata about these files. The way it does this is somewhat unusual and deserves further explanation.

The DOMS repository is really two different systems: the Fedora metadata storage and the Bitstorage data storage. Fedora handles two kinds of metadata. The first kind is information about the files themselves, such as file format, bitrate and length; this metadata is necessary for preservation efforts. The second kind is metadata about the contents of the files, such as the author and the category of the work. So we have three levels of data in DOMS:

  1. Data about the artistic work
  2. Data about the digital manifestation of the work
  3. The digital manifestation of the work

Most other systems make no distinction between the first and second levels, which is unfortunate for preservation purposes. One advantage of this design shows up when there are multiple manifestations of the same artistic work: rather than copying the data about the artistic work into each manifestation, the metadata object for the work simply references each of the manifestations.
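To make the three levels and the sharing of work metadata concrete, here is a minimal Python sketch. It is an illustrative model only: the class names (WorkObject, FileObject, DataFile), their fields, the PIDs, URLs and the placeholder format strings are all hypothetical and are not DOMS or Fedora APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """Level 3: the digital manifestation itself, stored in Bitstorage."""
    url: str


@dataclass
class FileObject:
    """Level 2: metadata about the digital manifestation (a Fedora File object)."""
    pid: str
    format_uri: str      # placeholder value below, not a real PRONOM ID
    contents: DataFile   # the only kind of object that points outside Fedora


@dataclass
class WorkObject:
    """Level 1: metadata about the artistic work (an ordinary Fedora object)."""
    pid: str
    title: str
    author: str
    manifestations: List[FileObject] = field(default_factory=list)


# One work with two manifestations: the work metadata is stored once and
# references each File object instead of being copied into them.
mp3 = FileObject("doms:file-1", "pronom:placeholder-mp3",
                 DataFile("http://bitstorage.example/work1.mp3"))
wav = FileObject("doms:file-2", "pronom:placeholder-wav",
                 DataFile("http://bitstorage.example/work1.wav"))
work = WorkObject("doms:work-1", "Example Symphony", "Example Composer", [mp3, wav])
```

Adding a third encoding of the same work would mean creating one more FileObject and appending it to work.manifestations; the title and author are never duplicated.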

As a result, there are two types of objects in Fedora: normal digital objects, and File objects, which reference a file in Bitstorage. All File objects must have the content model doms:ContentModel_File, and only such objects may ever reference a file outside Fedora.
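The rule above can be expressed as a simple consistency check. The sketch below assumes a hypothetical, simplified view of a Fedora object (a PID, a list of content models and an optional external file URL); only the content model name doms:ContentModel_File comes from this page.

```python
from dataclasses import dataclass
from typing import List, Optional

FILE_CONTENT_MODEL = "doms:ContentModel_File"


@dataclass
class FedoraObject:
    # Hypothetical, simplified view of a Fedora object.
    pid: str
    content_models: List[str]
    external_file_url: Optional[str] = None  # set if it references a Bitstorage file


def violates_file_rule(obj: FedoraObject) -> bool:
    """Return True if the object references a file outside Fedora
    without having the File content model."""
    references_file = obj.external_file_url is not None
    is_file_object = FILE_CONTENT_MODEL in obj.content_models
    return references_file and not is_file_object
```

A periodic check over all objects could flag any object for which violates_file_rule returns True.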

Working with data files in DOMS

We have devised procedures for working with datafiles in DOMS that leave the system in a consistent state. First, a few ground rules need to be defined.

There is really just one use case for datafiles: uploading a file to DOMS. Here is how it is done:

  1. Create the File object in Fedora that will reference the file.
  2. Select the file to upload.
  3. Make the file publicly reachable via a URL.
  4. Compute the MD5 checksum of the file.
  5. Call the uploadFile method of the Bitstorage webservice with the URL from step 3, the checksum and the file name. The file will be uploaded and characterized, and an object containing the PRONOM ID, the public URL and the characterization will be returned.
  6. Change the "CONTENTS" datastream to point to the returned URL, and set the FORMAT_URI of this datastream to the PRONOM ID. Fedora will automatically compute the MD5 checksum of the file; check that it matches the checksum from step 4.
  7. Change the "CHARACTERISATION" datastream so that it follows the schema defined in ContentModel_File, including the formaturi, the validity and the returned XML blob.
  8. Set the file name in the "dc:title" field of the "DC" datastream.
  9. The File object can now be further documented by the user.
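The upload procedure above can be sketched end to end with in-memory stand-ins for Fedora and Bitstorage. This is a minimal sketch under stated assumptions: FakeBitstorage, upload_file, UploadResult, FileObject and upload are hypothetical names, the datastream field names are guesses, and the caller passes the file bytes directly instead of publishing them at a URL (steps 2 and 3). Only the uploadFile method name, the datastream names and the order of the steps come from this page.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UploadResult:
    # Hypothetical shape of the object returned by uploadFile.
    pronom_id: str
    public_url: str
    valid: bool
    characterisation_xml: str


class FakeBitstorage:
    """In-memory stand-in for the Bitstorage webservice (hypothetical API)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def upload_file(self, name: str, data: bytes, checksum: str) -> UploadResult:
        # The real service fetches the file from a public URL; passing the
        # bytes directly keeps this sketch self-contained.
        if hashlib.md5(data).hexdigest() != checksum:
            raise ValueError("checksum mismatch on upload")
        url = f"http://bitstorage.example/{name}"
        self.files[url] = data
        # A real service would run a characterisation tool here.
        return UploadResult("pronom:placeholder", url, True, "<characterisation/>")


@dataclass
class FileObject:
    """Simplified Fedora File object: datastream name -> fields."""
    pid: str
    datastreams: Dict[str, Dict[str, object]] = field(default_factory=dict)


def upload(pid: str, name: str, data: bytes, bitstorage: FakeBitstorage) -> FileObject:
    obj = FileObject(pid)                                   # step 1
    checksum = hashlib.md5(data).hexdigest()                # step 4
    result = bitstorage.upload_file(name, data, checksum)   # step 5

    # Step 6: CONTENTS points at the public URL; FORMAT_URI is the PRONOM ID.
    obj.datastreams["CONTENTS"] = {"url": result.public_url,
                                   "FORMAT_URI": result.pronom_id}
    # Stand-in for Fedora's own checksum of the referenced file.
    stored_checksum = hashlib.md5(bitstorage.files[result.public_url]).hexdigest()
    if stored_checksum != checksum:
        raise RuntimeError("stored file does not match the local checksum")

    # Step 7: CHARACTERISATION datastream (field names are assumptions).
    obj.datastreams["CHARACTERISATION"] = {"formaturi": result.pronom_id,
                                           "validity": result.valid,
                                           "xml": result.characterisation_xml}
    # Step 8: the file name goes into dc:title.
    obj.datastreams["DC"] = {"dc:title": name}
    return obj
```

For example, upload("doms:file-1", "a.txt", b"hello", FakeBitstorage()) returns a File object whose "DC" datastream has dc:title "a.txt". The checksum comparison after the upload is the important part: it catches a file that was corrupted in transit before the File object is documented further (step 9).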

The first time a Fedora File object is set to the active state, the referenced file should be approved in Bitstorage using the "approveFile" method. The file is then publicly accessible and can no longer be deleted with "disapproveFile".

If a Fedora File object is deleted, or if any operation on Bitstorage fails, the "disapproveFile" method can remove the file from Bitstorage, provided it has not yet been approved.
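The approval rules above amount to a small state machine for each file in Bitstorage. The following sketch models it in memory; BitstorageLifecycle, FileState and the method names are hypothetical, and treating a repeated approval as a no-op is an assumption rather than documented behaviour.

```python
from enum import Enum
from typing import Dict


class FileState(Enum):
    UPLOADED = "uploaded"   # stored in Bitstorage, not yet approved
    APPROVED = "approved"   # publicly accessible and permanent
    REMOVED = "removed"     # disapproved and removed from Bitstorage


class BitstorageLifecycle:
    """Hypothetical tracker for the approveFile / disapproveFile rules."""

    def __init__(self) -> None:
        self.state: Dict[str, FileState] = {}

    def uploaded(self, url: str) -> None:
        self.state[url] = FileState.UPLOADED

    def approve_file(self, url: str) -> None:
        # Called the first time the referencing File object is set active.
        current = self.state.get(url)
        if current is FileState.APPROVED:
            return  # already approved; treated as a no-op (assumption)
        if current is not FileState.UPLOADED:
            raise KeyError(f"{url} is not an uploaded file")
        self.state[url] = FileState.APPROVED

    def disapprove_file(self, url: str) -> None:
        # Used when the File object is deleted or a Bitstorage step fails.
        if self.state.get(url) is not FileState.UPLOADED:
            raise PermissionError(f"{url} is approved or unknown and cannot be removed")
        self.state[url] = FileState.REMOVED
```

The key property is that APPROVED is a terminal state: once a file has been approved, disapprove_file refuses to remove it.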

DomsFileHandling (last edited 2010-03-17 13:08:52 by localhost)