Action, Harvest DOMS Data in Summa

This action is part of iteration 7.

The goal of this action is to be able to harvest DOMS data into a Summa metadata storage and to ingest the harvested data into a Summa index, for search and display on a website.

Fedora OAI-provider

Access to Fedora should happen through an OAI-provider. Unfortunately, the default OAI-provider only returns the DC-datastream, which is too limited for our use.

The custom OAI-provider for Fedora should be enabled and made to provide the output of a disseminator. A small howto on this is available from the DOMS pilot on SettingUpOAI. An updated version is now available from OAIProviderSetup.
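For orientation, a harvester would address such a provider with a standard OAI-PMH ListRecords request and select the disseminator output through the metadataPrefix argument. The sketch below only builds the request URL. The base URL and the prefix name "domsindex" are placeholders chosen for illustration, not values taken from the DOMS setup.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and prefix; substitute the values from the real setup.
    print(list_records_url("http://localhost:8080/oaiprovider/", "domsindex"))
```

The default provider would answer the same request with metadataPrefix=oai_dc, which is the DC-only output described above.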

This task is classified as average: we have experience with custom OAI-providers on top of Fedora, but that experience is that they are hard to set up and hard to debug.

Status: Stuck! I suggest KFC takes a look at the set-up. -- bam 2008-01-16 12:59:03

Thank you to KFC. The new version of the Fedora OAI Provider can run schema validation during the update process. This can be turned off, and our index representation disseminator can then be made available through the OAI provider. The OAIProviderSetup page has been updated. Validating the object bundle that the index representation disseminator currently returns is not possible. We should think about whether we want to be able to validate at some point. -- bam 2008-01-22 12:58:54

And there now is an OAI Provider Setup Script and OAIProviderSetup has been updated. We need review of both script and wiki page. -- bam 2008-02-01 12:54:58

JRG has reviewed both script and wiki page, and I have followed up. -- bam 2008-02-18 07:58:13

And the oai: prefix has been removed in the doms base collection and the pre-ingest module. -- bam 2008-02-20 08:23:14


The OAI-target for the DOMS Fedora installation should be harvested as part of a standard Summa workflow. Hans Lauridsen is responsible for OAI harvests and is to be instructed so that DOMS data is handled properly.

The data should ultimately be stored on atria, along with the other Statsbiblioteket Search targets.

Hans estimates that this will be very easy to implement.
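As a rough illustration of what one harvest step involves, the sketch below reads a ListRecords response and returns the record identifiers together with the resumption token that a harvester would send in its next request. It is not the Summa harvester; the sample response and the identifier in it are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.iter(OAI_NS + "identifier")
        if el.text
    ]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    # An absent or empty resumptionToken means the list is complete.
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    token = token_el.text.strip() if has_token else None
    return identifiers, token


# Made-up sample response; a real one carries the disseminator output in <metadata>.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>doms:example-1</identifier>
        <datestamp>2008-02-20</datestamp>
      </header>
      <metadata/>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(parse_list_records(SAMPLE))
```

A full harvest would repeat the request with each returned token until no token comes back, writing the records to the folder that the ingester later reads.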

TODO: While the harvester works, it currently resides on Toke's computer. This should be moved to Stable Summa or someplace similar, preferably rolled with Summa on a Stick for easy access.

Summa ingest

The setup for Summa should be changed to ingest data harvested from DOMS.

The file summa-ingest/config/ contains the setup for the ingester. Something like this should be inserted:

    <entry key="doms">path_to_folder_containing_harvested_data</entry>
    <entry key="doms_run">true</entry>
    <entry key="doms_DigesterClass">dk.statsbiblioteket.summa.ingest.SimpleXML.SimpleDirectoryDigester</entry>
    <entry key="doms_full_ingest">false</entry>
    <entry key="doms_encoding">UTF-8</entry>
    <entry key="doms_prefix">doms_</entry>
    <entry key="doms_check_output">true</entry>
    <entry key="doms_base_name">doms</entry>
    <entry key="doms_id_element">oai:itemID</entry>
    <entry key="doms_record_element">digitalObjectBundle</entry>

This task is near-trivial.

Caveat: If the PID needs to be transformed, for example truncated, the task changes to average.
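To make the role of the record and ID settings concrete, here is a minimal sketch of splitting a harvested file into records and optionally transforming each PID. It assumes that doms_record_element names the element wrapping one record and that doms_id_element names the element holding its ID. That is a reading of the key names, not documented behaviour of SimpleDirectoryDigester. The sample document, the truncate_pid helper and its length are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_records(xml_text, record_element, id_element, transform=lambda pid: pid):
    """Yield (id, record_xml) for each record element in a harvested document."""
    # Match on local names only, so 'oai:itemID' is compared as 'itemID'.
    id_local = id_element.split(":")[-1]
    root = ET.fromstring(xml_text)
    for rec in root.iter():
        if local_name(rec.tag) != record_element:
            continue
        pid = next(
            (el.text.strip() for el in rec.iter()
             if local_name(el.tag) == id_local and el.text),
            None,
        )
        if pid is not None:
            yield transform(pid), ET.tostring(rec, encoding="unicode")


def truncate_pid(pid, max_length=4):
    """Hypothetical PID transformation: keep only the first max_length characters."""
    return pid[:max_length]


# Made-up sample of harvested data.
SAMPLE = """<harvest xmlns:oai="http://www.openarchives.org/OAI/2.0/">
  <digitalObjectBundle>
    <oai:itemID>doms:example-1</oai:itemID>
  </digitalObjectBundle>
</harvest>"""

if __name__ == "__main__":
    for pid, _xml in extract_records(SAMPLE, "digitalObjectBundle", "oai:itemID"):
        print(pid)
```

Passing transform=truncate_pid shows where a PID transformation would hook in. If such a step is needed, it is the part that moves the task from near-trivial to average.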

Summa index

Summa needs access to DOMS-specific XSLTs in order to transform DOMS metadata to Summa index format. These XSLTs can be stored in the DOMS Subversion repository and referenced from the Summa index setup. The XSLTs need to be written and should, in this iteration, be simple: potentially just a display of the Dublin Core part of the Fedora objects.

This task is hard and is handled in ActionSummaIndexAndPresentationXSLTs.

Website display

The simple website for Summa On A Stick currently requires the display-XSLTs to reside on the local file system.

This task is hard and is handled in ActionSummaIndexAndPresentationXSLTs.

See also ActionSummaOnAStick

ActionHarvestDomsToSumma (last edited 2010-03-17 13:08:52 by localhost)