= Design of Task A.4.1 Ingest and Validation =

The ingest module is inspired by the file-based ingester at KB, Holland. Coordination and logistics are handled by moving files between folders.

The SB DOMS ingester should ultimately support validation and transaction-based ingest. Both features are skipped in the first iteration, although hooks for them should be provided. Ingest of the data-objects themselves (images, movies, sound and text) is also not part of the first iteration. This design covers the full ingester, but the postponed features are described in less detail.

== Prerequisites and Design Decisions ==

=== Workflow ===

 1. The user prepares an ingest package, stores it in the input-folder and calls the ingester.
 1. The ingester validates the ingest package. If the validation fails, the ingest fails.
 1. The ingester transfers all data-files to the bit-storage. If the transfer fails, the ingest fails.
 1. The ingester ingests all meta-data objects in the ingest package. If this fails, the ingest fails.
 1. The ingest package is moved to the success-folder. If this fails, an error is logged.

If an ingest fails, an error description is stored within the ingest package, the package is moved to the failure-folder, and any meta-data and data-objects already ingested are rolled back.

==== Ingest package ====

An ingest package is a folder containing meta-data files in the foxml1.0 format and data-files in context-specific formats.
The structure is as follows (+ indicates folders):

{{{
+ cd_ingest_number_1234
  + meta
    cd_sort_sol_flow_my_firetear_main.xml
    cd_sort_sol_unspoiled_monsters_main.xml
    cd_sort_sol_flow_my_firetear_cover.xml
    cd_sort_sol_flow_my_firetear_lyrics_index.xml
    ...
    + lyric_siggimund_blue.xml
    lyric_firetear.xml
    lyric_erlkonig.xml
    ...
  + data
    + sbsound/albums/Sort%20Sol/Flow%20My%20Firetear
      Siggimund_Blue.wav
      Firetear-lyrics.pdf
      ...
    + sbsound/albums/Sort%20Sol/Unspoiled%20Monsters
      Erlk%c3%b6nig.wav
      ...
      + cover
        front.png
        back.png
        ...
}}}

Folder and file names should be restricted to [a-z], [A-Z], [0-9], '.', '-' and '_'. In addition, other characters are accepted if each byte of their UTF-8 encoding is written as {{{% hex-digit hex-digit}}}.

 * All FOXML files must be placed in the folder {{{meta}}}. FOXML files may also be placed in sub-folders inside {{{meta}}}: the ingester performs a recursive descent through the sub-folders and ingests all FOXML files it finds.
 * All data-files to be stored in the bit-storage must be placed in the folder {{{data}}}. The sub-path of a data-file maps to its intended location in the bit-storage. The first part of the path signifies the collection, while the subsequent parts signify the logical organization within the collection. Example: {{{sbsound/albums/Sort%20Sol/Flow%20My%20Firetear/Siggimund_Blue.wav}}} signifies the collection {{{sbsound}}} and the location {{{albums/Sort%20Sol/Flow%20My%20Firetear/}}} within that collection for the data-object {{{Siggimund_Blue.wav}}}.
 * Folders other than {{{meta}}} and {{{data}}} may be created in the root of the package. They are ignored by the ingest process and should only be used for administrative data, logs and similar.

==== Usage ====

{{{Ingest.properties.xml}}} contains all general properties for the ingester (all package-specific information is contained in the FOXML files). The ingester is invoked with {{{ingest.sh}}}.
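The naming rule and the path mapping described under Ingest package can be checked mechanically before an ingest is attempted. The following is a minimal Python sketch of such a check, not the ingester's actual implementation. The names {{{NAME_RE}}}, {{{is_valid_name}}} and {{{split_data_path}}} are hypothetical, and two details are assumptions that go beyond the text above: the escaped bytes must decode as valid UTF-8, and the names {{{.}}} and {{{..}}} are rejected.

{{{#!python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote

# One or more of: ASCII letter, digit, '.', '-', '_', or a %XX escape.
NAME_RE = re.compile(r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})+")


def is_valid_name(name):
    """Return True if a folder or file name follows the naming rule."""
    if NAME_RE.fullmatch(name) is None:
        return False
    if name in (".", ".."):
        # Assumption: relative path markers are never legal names.
        return False
    try:
        # Assumption: the escaped bytes must form valid UTF-8.
        unquote(name, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def split_data_path(relative_path):
    """Split a path below data/ into (collection, location, file name)."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 2:
        raise ValueError("expected at least <collection>/<file>: %r" % relative_path)
    bad = [part for part in parts if not is_valid_name(part)]
    if bad:
        raise ValueError("illegal name(s) in path: %r" % bad)
    # First part is the collection, last part the data-object,
    # everything in between is the location inside the collection.
    return parts[0], "/".join(parts[1:-1]), parts[-1]
}}}

For the example path above, {{{split_data_path}}} separates the collection {{{sbsound}}}, the location {{{albums/Sort%20Sol/Flow%20My%20Firetear}}} (without a trailing slash) and the data-object {{{Siggimund_Blue.wav}}}. A name containing a literal space or a truncated escape sequence is rejected.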

=== Required Software and Modules ===

Validation is expected to be provided either by RIFF, software developed at SB, or by software provided by the Fedora group. RIFF currently offers only simple link-checking, which is useful but far from sufficient for SB.

Ingestion of data-objects requires a working [[BitstorageAgreement|bit-storage]], provided by IT-Maintenance at Statsbiblioteket. The bit-storage is expected to provide transactional commits:

 * The ingester sends a data-object, which consists of an id, a location and a bitstream.
 * The bit-storage sends back a checksum and awaits confirmation before the data-object is stored permanently.
 * If the ingester finds the checksum satisfactory and the ingest of the meta-data is successful, it sends a confirmation to the bit-storage.

== Resources ==