Design of Task A.4.1 Ingest and Validation

The ingest module is inspired by the file-based ingester at KB, Holland: coordination and logistics are handled by moving files between folders. The SB DOMS ingester should ultimately support validation and transaction-based ingest. The first iteration skips these features, although hooks for them should be provided. Ingest of the data-objects themselves (images, movies, sound and text) is also not part of the first iteration. This design covers the full ingester, although the postponed features are described in less detail.

Prerequisites and Design Decisions

Workflow

  1. The user prepares an ingest package, stores it in the input-folder and calls the ingester.
  2. The ingester performs a validation of the ingest package. If the validation fails, the ingest fails.
  3. The ingester transfers all data-files to the bit-storage. If the transfer fails, the ingest fails.
  4. The ingester ingests all meta-data objects in the ingest package. If this fails, the ingest fails.
  5. The ingest-package is moved to the success-folder. If this fails, an error is logged.

If an ingest fails, an error description is stored within the ingest package, the package is moved to the failure-folder, and any meta-data and data-objects already ingested are rolled back.
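
The workflow above could be sketched roughly as follows. This is a minimal illustration, not the SB DOMS implementation: the hook functions, the IngestError class, the error.txt file name and the run_ingest signature are all assumptions made for the example, and the hooks are left empty in the spirit of the first iteration.

```python
import logging
import shutil
from pathlib import Path


class IngestError(Exception):
    """Hypothetical error type raised by a failing workflow step."""


def validate(package: Path) -> None:
    """Hook: validate the ingest package (empty in this sketch)."""


def transfer_data(package: Path) -> None:
    """Hook: transfer data-files to the bit-storage (empty in this sketch)."""


def ingest_metadata(package: Path) -> None:
    """Hook: ingest the meta-data objects (empty in this sketch)."""


def no_rollback(package: Path) -> None:
    """Hook: roll back ingested objects (empty in this sketch)."""


def run_ingest(package, success_dir, failure_dir,
               steps=(validate, transfer_data, ingest_metadata),
               rollback=no_rollback):
    """Run the steps in order and move the package to the matching folder.

    Returns the new location of the package.
    """
    try:
        for step in steps:
            step(package)
    except Exception as exc:
        # Any failing step fails the whole ingest: roll back, store an
        # error description inside the package, move it to the failure-folder.
        rollback(package)
        (package / "error.txt").write_text(
            f"{type(exc).__name__}: {exc}\n", encoding="utf-8")
        failure_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.move(str(package), str(failure_dir / package.name)))

    success_dir.mkdir(parents=True, exist_ok=True)
    try:
        return Path(shutil.move(str(package), str(success_dir / package.name)))
    except OSError as exc:
        # Step 5: a failed move is only logged, the ingest itself succeeded.
        logging.error("could not move %s to %s: %s", package, success_dir, exc)
        return package
```

Passing the steps and the rollback function as parameters keeps the hooks replaceable, so validation and transactional ingest can be plugged in later without changing the folder logistics.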

Ingest package

An ingest package is a folder containing meta-data files in FOXML 1.0 format and data-files in context-specific formats. The structure is as follows (+ indicates folders):

+ cd_ingest_number_1234
    + meta
        cd_sort_sol_flow_my_firetear_main.xml
        cd_sort_sol_unspoiled_monsters_main.xml
        cd_sort_sol_flow_my_firetear_cover.xml
        cd_sort_sol_flow_my_firetear_lyrics_index.xml
        ...
        + <some_folder>
            lyric_siggimund_blue.xml
            lyric_firetear.xml
            lyric_erlkonig.xml
            ...
    + data
        + sbsound/albums/Sort%20Sol/Flow%20My%20Firetear
            Siggimund_Blue.wav
            Firetear-lyrics.pdf
            ...
        + sbsound/albums/Sort%20Sol/Unspoiled%20Monsters
            Erlk%c3%b6nig.wav
            ...
            + cover
                front.png
                back.png
                ...
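
As an illustration of how an ingester might read this layout, the sketch below collects the files under the meta and data folders of a package. The function name list_package and its return shape are assumptions made for this example; only the folder names meta and data come from the structure above.

```python
from pathlib import Path


def list_package(package: Path):
    """Return (meta_files, data_files) as sorted paths relative to
    the package's meta and data folders, using '/' as separator."""

    def files_under(sub: str):
        base = package / sub
        if not base.is_dir():
            return []
        return sorted(
            p.relative_to(base).as_posix()
            for p in base.rglob("*")
            if p.is_file()
        )

    return files_under("meta"), files_under("data")
```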

Folder- and file-names should be restricted to [a-z], [A-Z], [0-9], '.', '-' and '_'. In addition, other characters are accepted if each byte of their UTF-8 encoding is written as % followed by two hex digits (for example, %20 for a space).
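
The naming rule can be checked with a short function such as the one below. It is a sketch under two assumptions that the rule itself does not state: the function name is_valid_name is made up for the example, and the percent-encoded bytes are required to decode as valid UTF-8. The check applies to a single folder- or file-name, not to a whole path.

```python
import re
from urllib.parse import unquote_to_bytes

# One or more allowed characters or %XX escapes, nothing else.
_NAME = re.compile(r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})+")


def is_valid_name(name: str) -> bool:
    """Check one folder- or file-name against the naming rule."""
    if not _NAME.fullmatch(name):
        return False
    try:
        # Assumption: the escaped bytes must form valid UTF-8.
        unquote_to_bytes(name).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that this sketch does not reject the special names "." and "..", which a real ingester would probably want to exclude as well.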

Usage

Ingest.properties.xml contains all general properties for the ingester (all package-specific information is contained in the FOXML-files). The ingester is invoked with ingest.sh.

Required Software and Modules

Validation is expected to be provided either by RIFF, software developed at SB, or by software provided by the Fedora group. RIFF can currently perform simple link-checking, which is useful but far from sufficient for SB.

Ingestion of data-objects requires a working bit-storage, provided by IT-Maintenance at Statsbiblioteket. The bit-storage is expected to provide transactional commits.

Resources

TaskA.4.1DesignDocument (last edited 2010-03-17 13:09:17 by localhost)