## Action
= Action, Harvest DOMS Data in Summa =
This action is part of [[Iteration7| iteration 7]].

The goal of this action is to be able to harvest DOMS data into a Summa metadata storage and to ingest the harvested data into a Summa index, for search and display on a website.

## Detailed description of wanted output from the work to be carried out.
## E.g. Implement a utility class for writing data to a disk. The data must be base64 encoded before being written.

== Fedora OAI-provider ==
Access to Fedora should happen through an OAI-provider. Unfortunately the default OAI-provider only returns the DC-datastream, which is too limited for our use. The custom OAI-provider for Fedora should be enabled and made to provide the output of a disseminator. A small howto on this is available from the DOMS pilot on [[SettingUpOAI]]. An updated version is now available from [[OAIProviderSetup]].

''This is classified as average - we have experience with custom OAI-providers on top of Fedora, but the experience is that it is hard to set up and hard to debug''.

 * (3 ½ md) (BAM) '''DONE'''

Status: Stuck! I suggest KFC takes a look at the set-up. -- [[bam]]

Thank you to KFC. The new version of the Fedora OAI Provider can run schema validation during the update process. This can be turned off, and our index representation disseminator can be made available through the oai provider. The [[OAIProviderSetup]] page has been updated. Validation of the current object bundle result of the index representation disseminator is not possible. We should think about whether we want to be able to validate at some point. -- [[bam]]

And there now is an OAI Provider Setup Script and [[OAIProviderSetup]] has been updated. We need review of both script and wiki page. -- [[bam]]

JRG has reviewed both script and wiki page, and I have followed up. -- [[bam]]

And the {{{oai:}}} prefix has been removed in the doms base collection and the pre-ingest module.
-- [[bam]]

== Harvesting ==
The OAI-target for the DOMS Fedora installation should be harvested as part of a standard Summa workflow. Hans Lauridsen is responsible for OAI harvests and is to be instructed so that DOMS data is handled properly. The data should ultimately be stored on atria, along with the other Statsbiblioteket Search targets.

''Hans estimates that this will be very easy to implement''.

 * (1 md) (TE + HL) '''Partly done''' - The man days Hans has to use are not part of this estimate.

'''TODO''': While the harvester works, it currently resides on Toke's computer. It should be moved to Stable Summa or someplace similar, preferably rolled with Summa on a Stick for easy access.

== Summa ingest ==
The setup for Summa should be changed to ingest data harvested from DOMS. The file {{{summa-ingest/config/target.properties.xml}}} contains the setup for the ingester. Something like this should be inserted:

{{{
path_to_folder_containing_harvested_data true dk.statsbiblioteket.summa.ingest.SimpleXML.SimpleDirectoryDigester false UTF-8 doms_ true doms oai:itemID digitalObjectBundle
}}}

''This task is near-trivial''. Caveat: If the PID needs to be transformed, such as being truncated, the task changes to average.

 * (¼ md) (TE) '''DONE''': See TestSetup.

== Summa index ==
Summa needs access to DOMS-specific XSLTs in order to transform DOMS metadata to Summa index format. These XSLTs can be stored in the DOMS Subversion repository and referred to in the Summa index setup. The XSLTs need to be written and should, in this iteration, be simple. Potentially they could just display the Dublin Core part of the Fedora objects.

''This task is hard'' and is handled in ActionSummaIndexAndPresentationXSLTs.

== Website display ==
The simple website for Summa On A Stick currently requires the display-XSLTs to reside on the local file system.

''This task is hard'' and is handled in ActionSummaIndexAndPresentationXSLTs.
## Targeted WBS Tasks and Assigned Developers can be found on the iteration page for this action.

See also [[ActionSummaOnAStick]]
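
As a rough illustration of the harvesting step described under Harvesting, the sketch below shows how an OAI-PMH {{{ListRecords}}} harvest with resumption tokens could look in Python. It is not the harvester that currently runs on Toke's computer. The function names {{{parse_list_records}}} and {{{harvest}}}, the base URL and the metadata prefix are all hypothetical and only serve the example; only the OAI-PMH verb, arguments and namespace come from the protocol itself.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response.

    records is a list of (identifier, record_xml) tuples. The token is
    None when the response is the last page of the harvest.
    """
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        records.append((identifier, ET.tostring(record, encoding="unicode")))
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    token = token_el.text.strip() if has_token else None
    return records, token


def harvest(base_url, metadata_prefix):
    """Yield (identifier, record_xml) for every record the provider returns.

    base_url and metadata_prefix are assumptions of this sketch; the real
    values depend on how the Fedora OAI provider has been set up.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            xml_text = response.read().decode("utf-8")
        records, token = parse_list_records(xml_text)
        yield from records
        if token is None:
            break
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix on every follow-up request.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Parsing is kept separate from the network loop so that the record handling can be checked against a saved response without a running Fedora installation. Writing each record to a file in the folder that the Summa ingester reads would be a natural next step, but the file naming would have to follow whatever the ingest setup expects.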