Differences between revisions 1 and 2
Revision 1 as of 2008-06-26 12:26:14
Size: 5505
Editor: kfc
Comment: Created by the PackagePages action.
Revision 2 as of 2010-03-17 13:09:17
Size: 5506
Editor: localhost
Comment: converted to 1.6 markup

Design of Task A.4.1 Ingest and Validation

The ingest module is inspired by the file-based ingester at KB, Holland. Coordination and logistics are handled by moving files between folders. The SB DOMS ingester should ultimately support validation and transaction-based ingest. In the first iteration these features are skipped, although hooks for them should be provided. Ingest of the data-objects themselves (images, movies, sound and text) is also not part of the first iteration. This design covers the full ingester, although the postponed features are described in less detail.

Prerequisites and Design Decisions

Workflow

  1. The user prepares an ingest package, stores it in the input-folder and calls the ingester.
  2. The ingester performs a validation of the ingest package. If the validation fails, the ingest fails.
  3. The ingester transfers all data-files to the bit-storage. If the transfer fails, the ingest fails.
  4. The ingester ingests all meta-data objects in the ingest package. If this fails, the ingest fails.
  5. The ingest package is moved to the success-folder. If this fails, an error is logged.

If an ingest fails, an error-description will be stored within the ingest package and the package will be moved to the failure-folder. Ingested meta-data and data-objects will be rolled back.
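The workflow above can be sketched roughly as follows. This is only an illustration under assumptions: the function name run_ingest, the error file name ingest_error.txt, and the idea of passing validation, data transfer and meta-data ingest as callables are all hypothetical and not taken from the actual ingester.

```python
import logging
import shutil
from pathlib import Path
from typing import Callable, Sequence


def run_ingest(package: Path, success_dir: Path, failure_dir: Path,
               steps: Sequence[Callable[[Path], None]],
               rollback: Callable[[Path], None]) -> bool:
    """Run the ingest steps in order; any exception fails the whole ingest."""
    try:
        for step in steps:  # e.g. validate, transfer data-files, ingest meta-data
            step(package)
    except Exception as exc:
        rollback(package)  # undo meta-data and data-objects ingested so far
        # Store the error description inside the package (file name is an assumption).
        (package / "ingest_error.txt").write_text(
            f"{type(exc).__name__}: {exc}\n", encoding="utf-8")
        shutil.move(str(package), str(failure_dir / package.name))
        return False
    try:
        shutil.move(str(package), str(success_dir / package.name))
    except OSError as exc:
        # Step 5: a failed move is only logged; the ingest itself succeeded.
        logging.error("could not move %s to success-folder: %s", package, exc)
    return True
```

Passing the steps as callables keeps the hooks for the postponed features (validation, bit-storage transfer) open: in a first iteration the list could contain only the meta-data ingest step.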

Ingest package

An ingest package is a folder containing meta-data files in the foxml1.0 format and data-files in context-specific format. The structure is as follows (+ indicates folders):

+ cd_ingest_number_1234
    + meta
        cd_sort_sol_flow_my_firetear_main.xml
        cd_sort_sol_unspoiled_monsters_main.xml
        cd_sort_sol_flow_my_firetear_cover.xml
        cd_sort_sol_flow_my_firetear_lyrics_index.xml
        ...
        + <some_folder>
            lyric_siggimund_blue.xml
            lyric_firetear.xml
            lyric_erlkonig.xml
            ...
    + data
        + sbsound/albums/Sort%20Sol/Flow%20My%20Firetear
            Siggimund_Blue.wav
            Firetear-lyrics.pdf
            ...
        + sbsound/albums/Sort%20Sol/Unspoiled%20Monsters
            Erlk%c3%b6nig.wav
            ...
            + cover
                front.png
                back.png
                ...

Folder and file names should be restricted to the characters [a-z], [A-Z], [0-9], '.', '-' and '_'. Any other character is accepted only if it is percent-encoded, that is, each byte of its UTF-8 encoding is written as % followed by two hex digits (for example, %20 for a space).
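A minimal sketch of how the naming rule could be checked and applied. The function names is_valid_name and encode_name are hypothetical, and the additional check that the percent-encoded bytes decode as valid UTF-8 is an interpretation of the rule, not something stated in this design.

```python
import re
import string
from urllib.parse import unquote

# One or more allowed characters or %XX escapes, and nothing else.
_NAME_RE = re.compile(r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})+")
_ALLOWED = set(string.ascii_letters + string.digits + "._-")


def is_valid_name(name: str) -> bool:
    """Check a single folder or file name against the naming rule."""
    if _NAME_RE.fullmatch(name) is None:
        return False
    try:
        # Assumption: the escaped bytes must form valid UTF-8.
        unquote(name, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def encode_name(name: str) -> str:
    """Percent-encode every character outside the allowed set, byte by byte."""
    return "".join(
        ch if ch in _ALLOWED
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in name
    )
```

For example, encode_name("Sort Sol") gives "Sort%20Sol", which is_valid_name accepts, while the unencoded "Sort Sol" is rejected.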

  • All foxml-files must be placed in the folder meta. It is allowed to place foxml-files in sub-folders inside meta. The ingester will perform a recursive descent through the sub-folders and ingest all FOXML files in them.

  • All data-files to be stored in bit-storage must be placed in the folder data. The sub-path of a data-file maps to its intended location in the bit-storage. The first part of the path signifies the collection, while subsequent parts of the path signify the logical organization within the collection. Example: sbsound/albums/Sort_Sol/Flow_My_Firetear/Siggimund_Blue.wav signifies the collection sbsound and the location albums/Sort_Sol/Flow_My_Firetear/ within that collection for the data-object Siggimund_Blue.wav.

  • It is legal to create folders other than meta and data in the root. They are ignored by the ingesting process and should only be used for administrative data, logs and similar.
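The folder rules above could be implemented along these lines. This is a sketch only: the function names are hypothetical, and it assumes that FOXML files are recognised by the .xml extension, as in the example package.

```python
from pathlib import Path
from typing import List, Tuple


def find_foxml_files(package: Path) -> List[Path]:
    """Recursively collect all .xml files below <package>/meta."""
    meta = package / "meta"
    return sorted(p for p in meta.rglob("*.xml") if p.is_file())


def split_data_path(package: Path, data_file: Path) -> Tuple[str, str, str]:
    """Map a file below <package>/data to (collection, location, object name)."""
    parts = data_file.relative_to(package / "data").parts
    if len(parts) < 2:
        # A file directly in data has no collection folder above it.
        raise ValueError(f"{data_file} is not inside a collection folder")
    collection, *folders, name = parts
    location = "/".join(folders) + "/" if folders else ""
    return collection, location, name
```

With the example from the list, data/sbsound/albums/Sort_Sol/Flow_My_Firetear/Siggimund_Blue.wav maps to the collection sbsound, the location albums/Sort_Sol/Flow_My_Firetear/ and the object name Siggimund_Blue.wav. Folders other than meta and data are never visited, so they are ignored as required.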

Usage

Ingest.properties.xml contains all general properties for the ingester (all package-specific information is contained in the FOXML-files). Invocations are done with ingest.sh.

Required Software and Modules

Validation is expected to be provided either by the SB-developed software RIFF or by software provided by the Fedora group. RIFF can currently provide simple link-checking, which is useful but not nearly complete enough for SB.

Ingestion of data-objects requires a working bit-storage, provided by IT-Maintenance at Statsbiblioteket. The bit-storage is expected to provide transactional commits:

  • The ingester sends a data-object, which consists of an id, a location and a bitstream.
  • The bit-storage sends back a checksum and awaits confirmation before the data-object is stored permanently.
  • If the ingester finds the checksum satisfactory and the ingest of the meta-data is successful, a confirmation is sent back to bit-storage.
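The exchange described above amounts to a two-step commit, which might look like the following sketch. Everything here is an assumption made for illustration: the class InMemoryBitStorage stands in for the real bit-storage, the method names upload, confirm and abort are hypothetical, SHA-256 is chosen arbitrarily as the checksum algorithm, and the abort step is not described in this design.

```python
import hashlib
from typing import Callable, Dict, Tuple


class InMemoryBitStorage:
    """Toy stand-in for the bit-storage; keeps objects pending until confirmed."""

    def __init__(self) -> None:
        self.pending: Dict[str, Tuple[str, bytes]] = {}
        self.stored: Dict[str, Tuple[str, bytes]] = {}

    def upload(self, object_id: str, location: str, bitstream: bytes) -> str:
        # Hold the object and send back a checksum of what was received.
        self.pending[object_id] = (location, bitstream)
        return hashlib.sha256(bitstream).hexdigest()

    def confirm(self, object_id: str) -> None:
        # Only now is the object stored permanently.
        self.stored[object_id] = self.pending.pop(object_id)

    def abort(self, object_id: str) -> None:
        # Assumed counterpart to confirm: discard the pending object.
        self.pending.pop(object_id, None)


def ingest_data_object(storage: InMemoryBitStorage, object_id: str,
                       location: str, bitstream: bytes,
                       ingest_metadata: Callable[[], bool]) -> bool:
    """Confirm the upload only if the checksum matches and meta-data ingest succeeds."""
    checksum = storage.upload(object_id, location, bitstream)
    expected = hashlib.sha256(bitstream).hexdigest()
    if checksum == expected and ingest_metadata():
        storage.confirm(object_id)
        return True
    storage.abort(object_id)
    return False
```

Because the confirmation is withheld until the meta-data ingest has succeeded, a failure in either part leaves nothing permanently stored, which matches the rollback behaviour required by the workflow.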

Resources

TaskA.4.1DesignDocument (last edited 2010-03-17 13:09:17 by localhost)