Differences between revisions 2 and 3
Revision 2 as of 2008-10-09 09:45:53
Size: 2215
Editor: abr
Comment:
Revision 3 as of 2008-10-09 10:02:32
Size: 3837
Editor: abr
Comment:

Datafiles in DOMS

DOMS handles files, and metadata about these files. The way it does this is somewhat unusual and deserves further explanation.

The DOMS repository is really two different systems: the Fedora metadata storage and the Bitstorage data storage. Fedora handles two kinds of metadata. The first kind is information about the actual files, such as file format, bitrate, length and so on, i.e. technical metadata, which is necessary for preservation efforts. The second kind is metadata about the contents of the files, such as the author and the category of the work. So we have three levels of data in DOMS:

  1. Data about the artistic work
  2. Data about the digital manifestation of the work
  3. The digital manifestation of the work.

Most other systems make no distinction between the first and second levels, which is unfortunate for preservation purposes. One advantage of this separation shows when there are multiple manifestations of the same artistic work: rather than copying the data about the artistic work, the metadata object just references each of the manifestations.

One result of this is that there are two types of objects in Fedora: normal digital objects, and File objects, each of which references a file in Bitstorage. All File objects must have the content model [:DataModel/ContentModel_File: doms:ContentModel_File], and only such objects may ever reference a file outside Fedora.
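
The three levels can be pictured with a small data-model sketch. This is only an illustration: the class names, field names, PIDs, URLs and PRONOM ids below are made up for the example and are not taken from the DOMS code base; only the content model name comes from the text above.

```python
from dataclasses import dataclass, field

# Content model name taken from the text; everything else is hypothetical.
FILE_CONTENT_MODEL = "doms:ContentModel_File"


@dataclass(frozen=True)
class FileObject:
    """Level 2: data about one digital manifestation (a Fedora File object)."""
    pid: str
    contents_url: str  # points at level 3, the file itself in Bitstorage
    pronom_id: str     # illustrative value, not checked against the registry
    content_model: str = FILE_CONTENT_MODEL


@dataclass
class WorkObject:
    """Level 1: data about the artistic work (a normal Fedora object)."""
    pid: str
    author: str
    category: str
    manifestations: list[FileObject] = field(default_factory=list)


# One work with two manifestations: the work metadata is stored once and
# simply references each File object instead of being copied per file.
work = WorkObject("doms:work-1", "Some Author", "radio")
work.manifestations.append(
    FileObject("doms:file-1", "http://bitstorage.example.org/a.wav", "fmt/141"))
work.manifestations.append(
    FileObject("doms:file-2", "http://bitstorage.example.org/a.mp3", "fmt/134"))
```

Level 3, the file itself, does not appear as a class: it lives in Bitstorage and is reachable only through the URL held by the File object.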

Working with data files in DOMS

We have devised procedures for working with datafiles in DOMS that leave the system consistent. First, a few ground rules need to be defined.

  • There should always exist a one-to-one correspondence between File objects in Fedora and files in Bitstorage. A file in Bitstorage that does not have a File object can never be found again and will be lost.
  • The only way to upload files to Bitstorage is through the Bitstorage_API.
  • Bitstorage is a write-once system. Files are uploaded and then approved. When approved, they cannot be deleted later.
  • Fedora is a write-once system. Objects are created. They can be updated, but this just creates new versions.
  • There are no "loose" File objects. A File object is created to correspond to a file uploaded to Bitstorage. If the upload fails, the object is deleted.
  • A File object can never be made to refer to another file. Rather, a new File object must be made.
  • File objects can only ever refer to files in Bitstorage.

There is really just one use case for datafiles: uploading a file to DOMS. Here is how it is done.

  1. Select the file to upload.
  2. Make the file publicly reachable via a URL.
  3. Compute the MD5 checksum (md5sum) of the file.
  4. Call the uploadFile method in the Bitstorage webservice with the local URL, the checksum and the name. The file will be uploaded and characterized. An object containing the PRONOM id, the public URL and the characterization will be returned.
  5. Create the Fedora File object that should correspond to the Bitstorage file.
  6. Change the "CONTENTS" datastream to the URL returned.
  7. Change the "PRONOMID" datastream to the PRONOM id returned.
  8. Change the "CHARACTERISATION" datastream to the XML blob returned.
  9. If everything went OK this far:
    1. Call the Bitstorage webservice method approveFile with the public URL and the md5sum of the file. The file will now be undeletable.
    2. Update the relation on the master object that should have this file so that it refers to the new Fedora File object.
  10. Else:
    1. Call the Bitstorage webservice method disapproveFile with the public URL and the md5sum of the file. This will delete the file from Bitstorage.
    2. Fetch the new File object and change the object state to D(eleted) via the Fedora API-M method modifyObject.
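
The procedure above can be sketched roughly as follows. This is not the DOMS implementation: `bitstorage` and `fedora` are hypothetical client wrappers passed in by the caller. Only the method names uploadFile, approveFile, disapproveFile and modifyObject come from the text; their parameter lists, the shape of the returned object, and the helpers `create_file_object`, `set_datastream` and `add_file_relation` are assumptions made for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Step 3: MD5 checksum of the file contents as a hex string."""
    return hashlib.md5(data).hexdigest()


def upload_to_doms(bitstorage, fedora, local_url, data, name, master_pid):
    """Upload one file, then either approve it or roll everything back."""
    checksum = md5_hex(data)
    # Step 4: assumed to return a dict with the public URL, the PRONOM id
    # and the characterisation XML.
    result = bitstorage.uploadFile(local_url, checksum, name)
    pid = None
    try:
        pid = fedora.create_file_object()                       # step 5
        fedora.set_datastream(pid, "CONTENTS", result["url"])   # step 6
        fedora.set_datastream(pid, "PRONOMID", result["pronom_id"])
        fedora.set_datastream(pid, "CHARACTERISATION",
                              result["characterisation"])
    except Exception:
        # Step 10: delete the file from Bitstorage and mark the File
        # object (if it was created) as D(eleted), then re-raise.
        bitstorage.disapproveFile(result["url"], checksum)
        if pid is not None:
            fedora.modifyObject(pid, state="D")
        raise
    # Step 9: only now make the file permanent and link it from the master.
    bitstorage.approveFile(result["url"], checksum)
    fedora.add_file_relation(master_pid, pid)
    return pid
```

Approval is deliberately the last Bitstorage call: since an approved file cannot be deleted, everything that can still fail on the Fedora side of steps 5 to 8 is done first, so a failure there never leaves an undeletable file without a File object.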

DomsFileHandling (last edited 2010-03-17 13:08:52 by localhost)