Fedora REST workshop, London 2008-02-07 & 2008-02-08

The workshops were hosted by JISC and CRIG, in collaboration with the UK/Ireland Fedora User Group.

They were held in the Birkbeck Bar (yes, it was a bar!) at Birkbeck College, London.

Contact person: David D. Flanders <d.flanders@bbk.ac.uk>, 0790 8262 059 (tel.) / david.flanders (skype)

Matt Zumwalt from MediaShelf (www.yourmediashelf.com) was the invited speaker.

First day

All presentations from the first day are available here:

http://www.yourmediashelf.com/reference/fedora/fedora3.0b1/

The first day was an introduction to the new features in Fedora 3.0b1. Two main new features were covered:

The Content Model Architecture

The Content Model Architecture (CMA) is the new way of expressing Content Models in a Fedora repository.

The Content Model serves two purposes:

  1. A formal description of the content in your object
  2. A new way to bind disseminators to an object (the old way is deprecated, and no longer works)

An object's content model is declared with a relation in the object's RELS-EXT datastream.
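To make the RELS-EXT binding concrete, here is a minimal Python sketch (my own illustration, not something shown at the workshop) that pulls the content-model relation out of a RELS-EXT datastream. It assumes the `info:fedora/fedora-system:def/model#hasModel` predicate from the Fedora 3.0 final release; the predicate in 3.0b1 may differ, and the content model PID `demo:ImageCModel` is made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the Fedora 3.0 final model namespace; 3.0b1 may use other names.
MODEL_NS = "info:fedora/fedora-system:def/model#"

# Hypothetical RELS-EXT binding object test:02 to a made-up content model.
RELS_EXT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/test:02">
    <fedora-model:hasModel rdf:resource="info:fedora/demo:ImageCModel"/>
  </rdf:Description>
</rdf:RDF>"""


def content_models(rels_ext: str) -> list[str]:
    """Return the target URI of every hasModel relation in a RELS-EXT document."""
    root = ET.fromstring(rels_ext)
    return [el.get(f"{{{RDF_NS}}}resource") for el in root.iter(f"{{{MODEL_NS}}}hasModel")]


models = content_models(RELS_EXT)
# We found that only one content model per object actually worked in 3.0b1.
assert len(models) <= 1, "more than one content model declared"
print(models)  # ['info:fedora/demo:ImageCModel']
```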

Experimentation showed that an object cannot have more than one content model.

We were not the only ones who wanted better support for multiple content models, either through inheritance or by allowing an object to declare several content models.

The documentation is unfinished but in progress.

The following documentation has been updated for Fedora 3 with regard to the new CMA:

  - The object model documentation
  - Tutorial 2
  - A new CMA document

Formal description of object content

For now, the language used for the formal model is not terribly expressive.

Here is an example of the description language:

<dsCompositeModel xmlns="info:fedora/fedora-system:def/dsCompositeModel#">
  <dsTypeModel ID="DC">
    <form FORMAT_URIS="..." MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="DOC">
    <form MIME="application/pdf"/>
  </dsTypeModel>
</dsCompositeModel>

According to the XML schema, it is also possible (in fact required) to specify format URIs.

There is no special handling of relations, and no way to describe the content of each datastream in greater detail.
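As an illustration of how little the composite model can express, here is a Python sketch (my own, not a Fedora API) that parses a dsCompositeModel like the example above and checks an object's datastream IDs and MIME types against it. The function names are hypothetical, and I have left out the format URIs.

```python
import xml.etree.ElementTree as ET

CM_NS = "info:fedora/fedora-system:def/dsCompositeModel#"

# A composite model shaped like the example above (format URIs omitted).
COMPOSITE_MODEL = """<dsCompositeModel xmlns="info:fedora/fedora-system:def/dsCompositeModel#">
  <dsTypeModel ID="DC">
    <form MIME="text/xml"/>
  </dsTypeModel>
  <dsTypeModel ID="DOC">
    <form MIME="application/pdf"/>
  </dsTypeModel>
</dsCompositeModel>"""


def allowed_mime_types(composite_model: str) -> dict[str, set[str]]:
    """Map each datastream ID in the model to the MIME types its <form>s allow."""
    root = ET.fromstring(composite_model)
    return {
        ds.get("ID"): {form.get("MIME") for form in ds.iter(f"{{{CM_NS}}}form")}
        for ds in root.iter(f"{{{CM_NS}}}dsTypeModel")
    }


def check_object(composite_model: str, datastreams: dict[str, str]) -> list[str]:
    """Compare an object's datastreams (ID -> MIME type) with the model.

    Only IDs and MIME types are checked; format URIs, relations and the
    datastream contents themselves are ignored.
    """
    problems = []
    for ds_id, mimes in allowed_mime_types(composite_model).items():
        if ds_id not in datastreams:
            problems.append(f"missing datastream {ds_id}")
        elif datastreams[ds_id] not in mimes:
            problems.append(f"{ds_id}: got {datastreams[ds_id]}, expected one of {sorted(mimes)}")
    return problems


print(check_object(COMPOSITE_MODEL, {"DC": "text/xml", "DOC": "application/pdf"}))  # []
print(check_object(COMPOSITE_MODEL, {"DC": "text/xml"}))  # ['missing datastream DOC']
```

Anything beyond "this datastream exists and has this MIME type" would have to be checked elsewhere.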

Binding disseminators to content models

It works like this: an object is bound to a content model using relations.

The content model in turn points to its BDef, again using relations.

The BMech declares, also via relations, which BDef it implements and which content models it provides mechanisms for.

The CMA document contains a diagram showing these relations.

Beware: it contains some subtle errors, or at least unclear points. It seems to indicate that there can be only one BMech per BDef, but this is not the case. However, among the BMechs that contract for a given content model, only one may implement a given BDef.
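The binding chain is easier to follow as code. This Python sketch is my own reading of the resolution logic, not Fedora's implementation: relations are plain (subject, predicate, object) tuples, the predicate labels are placeholders rather than the real RDF predicate URIs, and all PIDs are made up. It shows two BMechs implementing the same BDef, each contracting for a different content model.

```python
# Placeholder predicate labels, NOT the real RDF predicate URIs used by Fedora.
HAS_MODEL = "hasModel"            # object -> content model
HAS_BDEF = "hasBDef"              # content model -> BDef
IMPLEMENTS = "implements"         # BMech -> BDef
CONTRACTOR_OF = "isContractorOf"  # BMech -> content model

Triple = tuple[str, str, str]


def objects_of(triples: list[Triple], subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is a relation."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects_of(triples: list[Triple], predicate: str, obj: str) -> set[str]:
    """All subjects s such that (s, predicate, obj) is a relation."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def resolve_bmech(triples: list[Triple], pid: str, bdef: str) -> str:
    """Follow object -> content model -> BDef, then find the contracting BMech."""
    for cmodel in objects_of(triples, pid, HAS_MODEL):
        if bdef not in objects_of(triples, cmodel, HAS_BDEF):
            continue
        # Several BMechs may implement the BDef, but only one may also
        # contract for this particular content model.
        candidates = (subjects_of(triples, IMPLEMENTS, bdef)
                      & subjects_of(triples, CONTRACTOR_OF, cmodel))
        if len(candidates) != 1:
            raise LookupError(f"{len(candidates)} BMechs for {cmodel} / {bdef}")
        return candidates.pop()
    raise LookupError(f"no content model of {pid} declares {bdef}")


# Hypothetical PIDs: two BMechs implement the same BDef for different models.
TRIPLES = [
    ("test:02", HAS_MODEL, "demo:ImageCModel"),
    ("test:03", HAS_MODEL, "demo:DocCModel"),
    ("demo:ImageCModel", HAS_BDEF, "demo:ViewBDef"),
    ("demo:DocCModel", HAS_BDEF, "demo:ViewBDef"),
    ("demo:ImageBMech", IMPLEMENTS, "demo:ViewBDef"),
    ("demo:ImageBMech", CONTRACTOR_OF, "demo:ImageCModel"),
    ("demo:DocBMech", IMPLEMENTS, "demo:ViewBDef"),
    ("demo:DocBMech", CONTRACTOR_OF, "demo:DocCModel"),
]

print(resolve_bmech(TRIPLES, "test:02", "demo:ViewBDef"))  # demo:ImageBMech
print(resolve_bmech(TRIPLES, "test:03", "demo:ViewBDef"))  # demo:DocBMech
```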

The new REST API

The new REST API was developed and contributed to Fedora by Matt Zumwalt from MediaShelf.

The REST API provides a way to perform all Fedora operations using only plain HTTP requests (GET, POST, and so on) against well-defined URLs.

This gives some nice features: nicer URLs for the objects, and the ability to perform all management operations over HTTP.
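The URL scheme in the curl examples in this section is regular enough to capture in a couple of helper functions. A minimal Python sketch, assuming a local installation at `http://localhost:8080/fedora` as in those examples; the helper names are hypothetical, and only the object and datastream paths seen at the workshop are covered.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a local Fedora installation, as in the workshop's curl examples.
BASE_URL = "http://localhost:8080/fedora"


def object_url(pid: str) -> str:
    """URL of a Fedora object; the ':' in PIDs such as test:02 is left as-is."""
    return f"{BASE_URL}/objects/{quote(pid, safe=':')}"


def datastream_url(pid: str, ds_id: str) -> str:
    """URL of a single datastream of an object."""
    return f"{object_url(pid)}/datastreams/{quote(ds_id, safe='')}"


def get_datastream(pid: str, ds_id: str) -> bytes:
    """Plain HTTP GET, like `curl .../objects/test:02/datastreams/DC`.

    Returns whatever body the server sends back; needs a running server.
    """
    with urlopen(datastream_url(pid, ds_id)) as response:
        return response.read()


print(datastream_url("test:02", "DC"))
# http://localhost:8080/fedora/objects/test:02/datastreams/DC
```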

The presentation was largely interactive. We were given a set of example URIs demonstrating how the REST API works, which could be tested using curl(1).

I give a few examples below; the rest can be found here: http://www.yourmediashelf.com/reference/fedora/fedora3.0b1/

Get a datastream

   curl -i http://localhost:8080/fedora/objects/test:02/datastreams/DC

Add a new datastream

   curl -i -H "Content-type: text/xml" -XPOST --data-binary @build.xml -u fedoraAdmin:fedoraAdmin "http://localhost:8080/fedora/objects/test:02/datastreams/DS1?dsLabel=A%20Test%20Datastream&altIDs=3333"
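The same request can be built with any HTTP library. As a hedged illustration, this Python sketch reproduces the curl call above with the standard library: a POST with the datastream body, the `dsLabel` and `altIDs` query parameters, and HTTP Basic authentication (which is what curl's `-u` sends). The function name is hypothetical, and the request is only built, not sent.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a local Fedora installation, as in the curl examples.
BASE_URL = "http://localhost:8080/fedora"


def add_datastream_request(pid: str, ds_id: str, body: bytes, label: str,
                           alt_ids: str, user: str, password: str,
                           content_type: str = "text/xml") -> Request:
    """Build (but do not send) the POST from the 'Add a new datastream' example."""
    # quote_via=quote encodes spaces as %20, matching the hand-written curl URL.
    query = urlencode({"dsLabel": label, "altIDs": alt_ids}, quote_via=quote)
    url = f"{BASE_URL}/objects/{pid}/datastreams/{ds_id}?{query}"
    # curl -u user:password sends an HTTP Basic Authorization header like this one.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
    )


req = add_datastream_request("test:02", "DS1", b"<project/>", "A Test Datastream",
                             "3333", "fedoraAdmin", "fedoraAdmin")
print(req.get_method(), req.full_url)
# POST http://localhost:8080/fedora/objects/test:02/datastreams/DS1?dsLabel=A%20Test%20Datastream&altIDs=3333
```

To actually send it to a running server, pass the request to `urllib.request.urlopen`; using `Path("build.xml").read_bytes()` as the body gives the same payload as curl's `--data-binary @build.xml`.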

Update a datastream

   curl -i -H "Content-type: text/xml" -XPOST "http://localhost:8080/fedora/objects/test:02/datastreams/DS1?dsLabel=hello&altIDs=3333" --data-binary @build.xml -u fedoraAdmin:fedoraAdmin

In the evening

Later in the day we split into groups and discussed various issues. My group discussed content models in general. Most people advocated the atomistic model, although there was no general agreement on exactly how atomistic an atomistic model should be. In particular, our model of having only one binary datastream per object, so that technical metadata can be represented for every datastream, seemed a little extreme to some members of the group.

We also discussed the fact that you need to define what you want to do with your data before you can model it.

Some people in the group were modelling novices; others, like Richard Green, were grand old men in the area.

Interestingly, one of the members (I unfortunately forget who) actually had a real-life use case for storing scientific data, although only for three years! The data came from electron microscopes: any data of interest would be identified as such within three years, and it was too expensive to simply keep everything. The data consisted of a large number of high-resolution images from an electron microscope, each rotated at a slightly different angle. There was some discussion about whether these should be modelled as one huge datastream or as many interrelated ones.

David visited each group during the discussions to sound out the idea of a European Fedora User Group. I heard nothing but positive comments about it.

In later discussions, we suggested that such a group could meet something like bi-annually and otherwise exist on a mailing list. ECDL would also be a good venue, since people meet there anyway. So would OR (Open Repositories), but it is not always held in Europe.

Second Day

The second day focused solely on REST; while it did present Fedora's REST interface, the scope was broader. Matt Zumwalt gave a brief introduction to the concepts of REST, and we then split into groups to discuss how REST could be used in some given scenarios.

My group ended up discussing Service Oriented Architecture concepts in general rather than the particulars of REST, and only touched on how REST could help in that area in broad terms.

There were various wishes for services one would want to run against any repository, and a feeling that restricting them to Fedora alone would limit rather than improve matters.

Services that would be relevant across multiple repositories include:

  - Preservation services (characterisation, planning, migration)
  - Metadata extraction services (automatic metadata from documents, indexing, OCR'ing, ...)

One interesting pointer was iVia, a system from DataFountains that should provide working metadata extraction. It is available for download under the LGPL.

2008-02-07 London Fedora Workshop Travel log (last edited 2010-03-17 13:08:52 by localhost)