Action Decide on Fedora Modules

Assigned
ABR: 2 JRG: 2

Prev assigned

Tasks addressed
["Tasks/3/3"]

Time estimated
4md

Time used
0.5md

Priority
?

Status
In progress

Iteration
22

Notes

Problem

Decide on the layout of the final production system. Decide which supporting fedora modules should be used, and in what way.

Progress

Identified external modules

Identified internal modules

Journaling

Journaling gives us the option of always having an up-to-date mirror of the running repository. This mirror cannot be modified while mirroring, but it can be used for access. Possible scenarios:

  1. Primary server fails; we switch to the mirror and are up and running again with minimal delay.
  2. Primary server handles the GUI, and mirror handles public access requests.

Whether to use this functionality depends heavily on the uptime requirements for the final system.

We have received the requirements for uptime:

It is believed that the 4 hours requirement on public access can be achieved by IT maintenance without journaling. This leaves journaling as a means of load balancing. However, if we run the Fedora system inside a virtual system, maintenance can duplicate it to a number of systems, and these can all serve public requests. The copies will be static and will not receive updates from the master Fedora installation, but this is less of a concern.

Akubra

The Akubra blob storage project lives at http://www.fedora-commons.org/confluence/display/AKUBRA/Akubra+Project. It is a storage layer abstraction. At the moment it is at version 0.1, but in time it is expected to become the storage layer of Fedora.

The basic idea is a BlobStore that stores Blobs. Existing Blobs can be retrieved, and new ones can be created. At the moment, the DOMS system has two very different storage mechanisms: the one in Bitstorage, and the foxml storage. Akubra would integrate naturally on top of the foxml storage. The specific system in Bitstorage is more of a problem, though. Using Akubra for Bitstorage would require a fundamental redesign of how files are handled in DOMS.
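The BlobStore idea can be illustrated with a minimal in-memory sketch. This is not the Akubra API (Akubra is a Java library); the class name, method names and identifiers below are hypothetical and only show the create/retrieve contract described above.

```python
class InMemoryBlobStore:
    """Hypothetical illustration of a blob store; NOT the Akubra API."""

    def __init__(self):
        # Maps a blob identifier to its raw bytes.
        self._blobs = {}

    def create(self, blob_id, data):
        """Store a new blob; refuse to overwrite an existing one."""
        if blob_id in self._blobs:
            raise KeyError(f"blob already exists: {blob_id}")
        self._blobs[blob_id] = bytes(data)

    def get(self, blob_id):
        """Return the bytes of a stored blob (KeyError if missing)."""
        return self._blobs[blob_id]

    def exists(self, blob_id):
        """Tell whether a blob with this identifier is stored."""
        return blob_id in self._blobs
```

A foxml document would fit this contract as a single blob keyed by its object identifier, which is why the foxml storage is the natural integration point.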

It is the recommendation of this group that Akubra not be used in DOMS. Further investigations into managed datastreams in Fedora would be worthwhile and might change the decision.

OAI-Provider

http://fedora-commons.org/confluence/display/FCSVCS/OAI+Provider+Service+1.2

The OAI-Provider service adds proper OAI-PMH support to a Fedora repository. Unfortunately, like all non-DOMS systems, it cannot handle view blobs. A view blob is a collection of objects that together comprise a single record. A change to any of these objects counts as a change to the entire record, but the record can only be built starting from one head object. Since it is these records that should be disseminated, not the individual objects, a separate system is needed to keep track of which records have been updated.
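The change-tracking rule for view blobs can be sketched as follows. This is an illustration only, not DOMS code: the function names, the dictionary layout and the sample identifiers and dates are made up, and the sketch assumes each head object is listed among its own view members.

```python
from datetime import datetime


def record_last_modified(head_pid, view_members, object_modified):
    """A record is as new as the most recently changed object in its view."""
    return max(object_modified[pid] for pid in view_members[head_pid])


def records_changed_since(since, view_members, object_modified):
    """Head objects whose record changed after `since`, in sorted order."""
    return sorted(
        head
        for head in view_members
        if record_last_modified(head, view_members, object_modified) > since
    )


# Made-up sample data: rec1 consists of a head object plus one file object.
view_members = {
    "doms:rec1": ["doms:rec1", "doms:file1"],
    "doms:rec2": ["doms:rec2"],
}
object_modified = {
    "doms:rec1": datetime(2009, 1, 1),
    "doms:file1": datetime(2009, 3, 1),
    "doms:rec2": datetime(2009, 1, 15),
}
changed = records_changed_since(datetime(2009, 2, 1), view_members, object_modified)
```

Here only the file object of rec1 changed after the cut-off, yet the whole record must be reported as updated. An object-by-object harvester would miss this or report the file object as a record of its own.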

It is the conclusion of this group that building an understanding of the view system into the OAI provider is more bother than it is worth. Instead, a separate OAI provider service that understands views should be developed on top of the DOMS interface. This could very possibly be based on ProAI (http://proai.sourceforge.net/), which incidentally underlies the Fedora OAI provider.

GSearch

http://fedora-commons.org/confluence/display/FCSVCS/Generic+Search+Service+2.2

GSearch is a service that can produce a Lucene index of the contents of a Fedora repository. Like the OAI provider above, it does not understand the DOMS view blobs, and treats each object as a separate record. As with the OAI provider, this group finds that adapting GSearch is not worth the bother. Instead, a module for Summa integration should be developed.

Triple Store

Fedora comes with a triple store, which is not enabled by default. Being able to perform queries about relations between objects is a crucial feature of many DOMS operations. Fedora offers a choice between implementations of the triple store, namely Mulgara, Kowari and MPTStore. Following the performance recommendations of the Fedora developers, this group advises that the Mulgara triple store be used.

Access to the triple store should not be available directly, but through designated api calls in the doms webservice.
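A minimal sketch of what "designated API calls" could look like: callers get a small set of named relation lookups instead of a free-form query interface. The class, the method name and the sample predicate are hypothetical, and an in-memory set of triples stands in for the real triple store.

```python
class RelationQueries:
    """Hypothetical facade; callers never see the underlying triple store."""

    def __init__(self, triples):
        # Each triple is (subject, predicate, object); a set stands in for
        # the real triple store in this sketch.
        self._triples = set(triples)

    def related_objects(self, subject, predicate):
        """Objects that `subject` points to via `predicate`, sorted."""
        return sorted(
            o for (s, p, o) in self._triples if s == subject and p == predicate
        )


# Made-up relation name and identifiers, for illustration only.
queries = RelationQueries(
    [
        ("doms:rec1", "hasFile", "doms:file1"),
        ("doms:rec1", "hasFile", "doms:file2"),
        ("doms:rec2", "hasFile", "doms:file1"),
    ]
)
files_of_rec1 = queries.related_objects("doms:rec1", "hasFile")
```

Keeping the query language behind such calls means the triple store implementation can be swapped without touching clients of the DOMS webservice.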

REST api

The REST api was previously an optional part of Fedora. With the newest version of Fedora, it became equivalent to the other API methods. As such, we cannot disable the REST api selectively. Rather, we must accept that there are numerous ways of invoking Fedora, but that they all boil down to the same API functions. It is these api functions that should be considered, not the different interfaces to them.

The advice of this group is to do nothing further with the REST api. The eventual doms webservice might use this api to speak to fedora.

Database

Fedora already has a database, which it uses as a cache for faster lookups of crucial information. The search system in DOMS depends on fast lookup of which objects are in which views, and thus needs a similar cache. We could establish a secondary database for this, or reuse the Fedora database. There is a middle ground: Fedora ships with its own database, but it can easily be configured to use an external database server instead. To ease maintenance, having just one database server, rather than two, is preferable.

As such, the recommendation of this group is to establish such a separate database server and configure Fedora to use it. Which database system to use should be decided by the maintenance group, as they will be running it. All database-dependent applications in DOMS should use this database server.
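The view-membership cache mentioned above could be as simple as one indexed table. The sketch below uses SQLite purely as a stand-in, since the actual database system is left to the maintenance group; the table, column and function names are hypothetical.

```python
import sqlite3


def build_cache(conn, memberships):
    """Create the cache table if needed and load (head, member) pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS view_membership ("
        " head_pid TEXT NOT NULL,"
        " member_pid TEXT NOT NULL,"
        " PRIMARY KEY (head_pid, member_pid))"
    )
    # Index on member_pid so "which views contain this object" is fast.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_member"
        " ON view_membership (member_pid)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO view_membership VALUES (?, ?)", memberships
    )
    conn.commit()


def views_containing(conn, member_pid):
    """Head objects of every view that includes the given object."""
    rows = conn.execute(
        "SELECT head_pid FROM view_membership"
        " WHERE member_pid = ? ORDER BY head_pid",
        (member_pid,),
    )
    return [row[0] for row in rows]
```

When an object changes, `views_containing` gives the records that must be re-indexed or re-disseminated, without walking the repository.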

Conclusion
