Datafiles in DOMS

DOMS handles files and metadata about these files. The way it does this is somewhat unusual and deserves further explanation.

The DOMS repository is really two different systems: the Fedora metadata storage and the Bitstorage data storage. Fedora handles two kinds of metadata. The first kind is information about the actual files, such as file format, bitrate, length and so on, which is necessary for preservation efforts. The second kind is metadata about the contents of the files, such as the author and the category of the work. So we have three levels of data in DOMS:

  1. Data about the artistic work
  2. Data about the digital manifestation of the work
  3. The digital manifestation of the work

Most other systems make no distinction between the first and second level, which is unfortunate for preservation purposes. One advantage of keeping them apart shows when there are multiple manifestations of the same artistic work: rather than copying the data about the artistic work, the metadata object just references each of the manifestations.

One result of this is that there are two types of objects in Fedora: the normal digital objects, and the File objects, each of which references a file in Bitstorage. All File objects must have the content model doms:ContentModel_File, and only such objects may ever reference a file outside Fedora.
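The three levels can be pictured with a small Python sketch. This is only an illustration: the class names, field names, PIDs and URLs below are made up for the example and are not part of the DOMS data model. Only the content model name doms:ContentModel_File is taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List

FILE_CONTENT_MODEL = "doms:ContentModel_File"  # content model named in the text


@dataclass
class FileObject:
    """Level 2: metadata about one digital manifestation (hypothetical shape)."""
    pid: str
    bitstorage_url: str  # level 3, the file itself, lives in Bitstorage
    file_format: str
    content_model: str = FILE_CONTENT_MODEL


@dataclass
class WorkObject:
    """Level 1: metadata about the artistic work (hypothetical shape)."""
    pid: str
    author: str
    category: str
    # PIDs of the File objects, one per manifestation of the work
    manifestations: List[str] = field(default_factory=list)


# One work with two manifestations: the work metadata is stored once
# and merely references both File objects.
wav = FileObject("doms:file-1", "http://bitstorage.example/concert.wav", "audio/wav")
mp3 = FileObject("doms:file-2", "http://bitstorage.example/concert.mp3", "audio/mpeg")
work = WorkObject("doms:work-1", "Some Author", "concert", [wav.pid, mp3.pid])
```

Adding a third manifestation would mean one more File object and one more PID in the list, with no change to the author or category data.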

Working with data files in DOMS

We have devised procedures for working with data files in DOMS that leave the system consistent. First, a few ground rules need to be defined.

  • There should always exist a one-to-one correspondence between File objects in Fedora and files in Bitstorage. A file in Bitstorage that does not have a File object can never be found again and will be lost.
  • The only way to upload files to Bitstorage is through the Bitstorage_API.
  • Bitstorage is a write-once system. Files are uploaded and then approved. Once approved, they can no longer be deleted.
  • Fedora is a write-once system. Objects are created. They can be updated, but this just creates new versions.
  • There are no "loose" File objects. A File object is created to correspond to a file uploaded to Bitstorage. If the upload fails, the object is deleted.
  • A File object can never be made to refer to another file. Rather, a new File object must be made.
  • File objects can only ever refer to files in Bitstorage.
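As an illustration of the first rule, here is a minimal sketch of a consistency check, assuming we already have a mapping from File object PIDs to the Bitstorage URLs they reference and a listing of the URLs actually stored. How those two inputs would be obtained from Fedora and Bitstorage is not covered here, and the function and field names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Inconsistencies(NamedTuple):
    orphan_files: List[str]      # stored files that no File object references
    dangling_objects: List[str]  # File objects whose file is not in Bitstorage
    shared_files: List[str]      # files referenced by more than one File object


def check_one_to_one(file_objects: Dict[str, str],
                     stored_files: Iterable[str]) -> Inconsistencies:
    """Report every violation of the one-to-one rule.

    file_objects maps a File object PID to the URL it references;
    stored_files lists the URLs present in Bitstorage.
    """
    stored = set(stored_files)
    referenced = Counter(file_objects.values())
    orphan_files = sorted(stored - set(referenced))
    dangling = sorted(pid for pid, url in file_objects.items() if url not in stored)
    shared = sorted(url for url, count in referenced.items() if count > 1)
    return Inconsistencies(orphan_files, dangling, shared)
```

A consistent repository yields three empty lists. An orphan file is the case the first rule warns about: it can never be found again.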

There is really just one use case for data files: uploading a file to DOMS. Here's how it's done.

  1. Create the File object in Fedora that should reference the file.
  2. Select the file to upload.
  3. Make the file publicly reachable via a URL.
  4. Compute the MD5 checksum of the file (for example with md5sum).
  5. Call the uploadFile method in the Bitstorage webservice with the local URL, checksum and name. The file will be uploaded and characterized. An object containing the PRONOM ID, the public URL and the characterization will be returned.
  6. Change the "CONTENTS" datastream to the URL returned. Set the FORMAT_URI of this datastream to the PRONOM ID. Fedora will automatically compute the MD5 checksum of the file; check that it matches the one computed in step 4.
  7. Change the "CHARACTERISATION" datastream to follow the schema defined in ContentModel_File, including formaturi, validity and the XML blob returned.
  8. Set the file name in the "dc:title" field of the "DC" datastream.
  9. The File object can now be further documented by the user.
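The steps above can be sketched in Python as follows. This is a rough outline under stated assumptions, not the actual DOMS client code: the Bitstorage and Fedora interfaces below are hypothetical stand-ins whose method names, parameters and return values are invented for the example. Only hashlib is a real library call. Building a CHARACTERISATION datastream that conforms to the schema is left to the hypothetical set_characterisation method.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


@dataclass
class UploadResult:
    """Assumed shape of the object returned by uploadFile."""
    pronom_id: str
    public_url: str
    characterisation_xml: str


class Bitstorage(Protocol):  # hypothetical client interface
    def upload_file(self, local_url: str, md5: str, name: str) -> UploadResult: ...
    def disapprove_file(self, public_url: str) -> None: ...


class Fedora(Protocol):  # hypothetical client interface
    def create_file_object(self) -> str: ...
    def set_contents(self, pid: str, url: str, format_uri: str) -> str: ...
    def set_characterisation(self, pid: str, format_uri: str, xml: str) -> None: ...
    def set_title(self, pid: str, title: str) -> None: ...
    def delete_object(self, pid: str) -> None: ...


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Step 4: hex MD5 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_to_doms(fedora: Fedora, bitstorage: Bitstorage,
                   path: str, local_url: str, name: str) -> str:
    """Run steps 1 and 4 to 8; return the PID of the new File object."""
    pid = fedora.create_file_object()                                # step 1
    result = None
    try:
        checksum = md5_of_file(path)                                 # step 4
        result = bitstorage.upload_file(local_url, checksum, name)   # step 5
        # Step 6: set_contents is assumed to return the checksum Fedora computed.
        fedora_md5 = fedora.set_contents(pid, result.public_url, result.pronom_id)
        if fedora_md5 != checksum:
            raise ValueError("Fedora checksum differs from the local checksum")
        fedora.set_characterisation(pid, result.pronom_id,
                                    result.characterisation_xml)     # step 7
        fedora.set_title(pid, name)                                  # step 8
    except Exception:
        # Ground rule: no "loose" File objects. Undo the unapproved upload,
        # then delete the File object.
        if result is not None:
            bitstorage.disapprove_file(result.public_url)
        fedora.delete_object(pid)
        raise
    return pid
```

Steps 2, 3 and 9 are manual or depend on the local setup, so the sketch takes the file path and its public URL as given.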

The first time a Fedora File object is set to the active state, the referred file should be approved in Bitstorage using the "approveFile" method. The file is then publicly accessible and can no longer be deleted with "disapproveFile".

If a Fedora File object is deleted, or if any operation on Bitstorage fails, the "disapproveFile" method may be used to remove the file from Bitstorage, provided it has not yet been approved.
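The approval rules can be illustrated with a tiny in-memory model. This is a toy sketch, not the Bitstorage implementation: the class, its method signatures and the boolean return value of disapprove_file are assumptions made for the example. Only the rule itself (an uploaded file can be removed until it is approved, and never afterwards) comes from the text.

```python
from enum import Enum
from typing import Dict, Optional


class FileState(Enum):
    UPLOADED = "uploaded"  # stored, but not yet approved
    APPROVED = "approved"  # publicly accessible, permanent


class InMemoryBitstorage:
    """Toy model of the write-once behaviour (hypothetical API)."""

    def __init__(self) -> None:
        self._files: Dict[str, FileState] = {}

    def upload_file(self, url: str) -> None:
        if url in self._files:
            raise ValueError("file already exists: " + url)
        self._files[url] = FileState.UPLOADED

    def approve_file(self, url: str) -> None:
        if url not in self._files:
            raise KeyError(url)
        self._files[url] = FileState.APPROVED

    def disapprove_file(self, url: str) -> bool:
        """Remove the file if it is still unapproved; report whether it was removed."""
        if self._files.get(url) is FileState.UPLOADED:
            del self._files[url]
            return True
        return False

    def state_of(self, url: str) -> Optional[FileState]:
        return self._files.get(url)
```

In this model, calling disapprove_file on an approved file leaves it in place and returns False, which mirrors the statement that approved files can no longer be deleted.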

DomsFileHandling (last edited 2010-03-17 13:08:52 by localhost)