The RadioTV Datamodel in DOMS

/!\ This document has been implemented in, and overlaps somewhat with, http://wiki.statsbiblioteket.dk/domswiki/GuidelinesForNewDatamodel/Collections/Television_Programs, as well as the relevant patterns.

Deprecated

As noted in many places, the Radio/TV collection is to be ingested into DOMS and made available through it. This document describes the datamodel used for these objects in DOMS.

== Initial intro ==

We chose a program-centric datamodel: the primary object is the Program object, corresponding to one TV show, radio program, or movie. The datamodel encodes no higher-level structure describing relations between programs. So, while the programs (of course) contain information about when they were aired, there are no links to the previous or next program. That structure must be created dynamically from an index.

Besides the Program, the datamodel consists of two other types of objects, Shard and File. A File object represents one of the recording files, which may span multiple programs and channels. A Shard object represents the exact recording of one specific program. The trick here is that the Shard objects do not refer to real files; the files they refer to are created dynamically when needed. Only the File objects refer to real files.
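
The three object types and how they point at each other can be sketched in Python as follows. This is only an illustration of the structure described above, not DOMS code: the class names, field names, PIDs and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileObject:
    """A real recording file in bitstorage; may span several programs and channels."""
    pid: str
    contents_url: str


@dataclass
class Shard:
    """Virtual file: the exact recording of one program, cut from one or more files."""
    pid: str
    files: List[FileObject] = field(default_factory=list)


@dataclass
class Program:
    """Primary object: one TV show, radio program or movie, with exactly one Shard."""
    pid: str
    title: str
    shard: Shard


# Hypothetical PIDs, title and URL, for illustration only.
recording = FileObject("uuid:file-1", "http://example.org/recording.ts")
shard = Shard("uuid:shard-1", files=[recording])
program = Program("uuid:program-1", "Evening news", shard)
```

Note that nothing in the sketch links one Program to another, matching the decision to leave that ordering to an index.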

== Collection ==

All the objects are part of the "doms:RadioTV_Collection". There is currently no subdivision into digital TV recordings, analog recordings and radio recordings. All the objects have a policy specifying that they are viewable only by administrators and in-house users.

== Details ==

As stated above, there are three kinds of objects, Program, Shard and File.

=== Program ===
The Program object contains all the bibliographic information we have about the specific aired program. The primary data is stored in the PBCORE datastream, not surprisingly in the PBCore format. We use PBCore version 1.1.

The original bibliographic metadata is in the Ritzau and Gallup/TVMeter formats. At present we do not have access to the Gallup/TVMeter data. The original data is stored in the RITZAU_ORIGINAL and GALLUP_ORIGINAL datastreams in the Program object. They have no useful schema, as the data is not really XML.

The Program object contains exactly one relation to a Shard object, with the predicate "http://doms.statsbiblioteket.dk/relations/default/0/1/#hasShard". There must be a 1-to-1 relation between Program and Shard objects.
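
A check of this rule could look roughly like the Python sketch below. It assumes the relations are available as plain (subject, predicate, object) triples, which is an assumption made for illustration; the function name and the PIDs in the usage note are hypothetical. It only checks the Program side of the rule (exactly one hasShard relation per Program), not that a Shard is used by a single Program.

```python
from typing import Iterable, List, Tuple

HAS_SHARD = "http://doms.statsbiblioteket.dk/relations/default/0/1/#hasShard"

# Assumed representation: (subject pid, predicate, object pid).
Triple = Tuple[str, str, str]


def shard_for_program(program_pid: str, triples: Iterable[Triple]) -> str:
    """Return the single Shard pid of a Program, or raise if the rule is broken."""
    shards: List[str] = [
        obj for subj, pred, obj in triples
        if subj == program_pid and pred == HAS_SHARD
    ]
    if len(shards) != 1:
        raise ValueError(
            f"{program_pid} has {len(shards)} hasShard relations, expected exactly 1"
        )
    return shards[0]
```

For example, `shard_for_program("uuid:p1", [("uuid:p1", HAS_SHARD, "uuid:s1")])` returns `"uuid:s1"`, while a Program with zero or two such relations raises `ValueError`.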

=== Shard ===

There will be one Shard object for each Program object. A Shard object is really a very special kind of File object. Because it is a File object, it has a CONTENTS datastream with the URL to the data. As it is not a "real" File object, the URL does not refer to a real file. In fact, it does not refer to anything at the moment. It will always be of the form "http://www.statsbiblioteket.dk/doms/shard/{shard-pid}", where {shard-pid} is the pid of the Shard object.
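
The URL form can be expressed as a pair of small Python helpers. This is a sketch only: the function names are hypothetical, and it assumes the pid is inserted into the URL verbatim, without any escaping, which the text above does not specify.

```python
SHARD_URL_PREFIX = "http://www.statsbiblioteket.dk/doms/shard/"


def shard_url(shard_pid: str) -> str:
    """Build the CONTENTS URL of a Shard from its pid (pid used verbatim)."""
    return SHARD_URL_PREFIX + shard_pid


def shard_pid_from_url(url: str) -> str:
    """Recover the Shard pid from a URL of the form described above."""
    if not url.startswith(SHARD_URL_PREFIX) or url == SHARD_URL_PREFIX:
        raise ValueError(f"not a shard URL: {url}")
    return url[len(SHARD_URL_PREFIX):]
```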

Because it is a File object, the shard object must also have a "CHARACTERISATION" datastream. The datastream is filled in with placeholder values, so the object validates, as the virtual file has no useful characterisation information.

The Shard object has one more datastream, "SHARD_METADATA". This contains the information about which datafile(s) contain the relevant recording, and which offsets and cutoff values are used.
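
Reading this datastream could be sketched in Python as follows, assuming the element names of the example in this section. The names `ShardFile` and `parse_shard_metadata` and the example URLs are hypothetical, and treating the channel id, offset and clip length as integers is an assumption; this page does not state their types or units.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class ShardFile:
    """One <file> entry of SHARD_METADATA (field types are assumptions)."""
    file_url: str
    channel_id: int
    program_start_offset: int
    program_clip_length: int
    file_name: str
    format_uri: str


def _required_text(element: ET.Element, tag: str) -> str:
    """Return the stripped text of a child element, or raise if it is missing."""
    text = element.findtext(tag)
    if text is None or not text.strip():
        raise ValueError(f"missing <{tag}> in <file>")
    return text.strip()


def parse_shard_metadata(xml_text: str) -> List[ShardFile]:
    """Parse SHARD_METADATA into a list with one ShardFile per <file> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "shard_metadata":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [
        ShardFile(
            file_url=_required_text(el, "file_url"),
            channel_id=int(_required_text(el, "channel_id")),
            program_start_offset=int(_required_text(el, "program_start_offset")),
            program_clip_length=int(_required_text(el, "program_clip_length")),
            file_name=_required_text(el, "file_name"),
            format_uri=_required_text(el, "format_uri"),
        )
        for el in root.findall("file")
    ]


# Hypothetical file URL and name; the numeric values mirror the example in this section.
SAMPLE = """<shard_metadata>
  <file>
    <file_url>http://example.org/recording.ts</file_url>
    <channel_id>102</channel_id>
    <program_start_offset>2100</program_start_offset>
    <program_clip_length>1500</program_clip_length>
    <file_name>recording.ts</file_name>
    <format_uri>info:pronom/x-fmt/386</format_uri>
  </file>
</shard_metadata>"""

files = parse_shard_metadata(SAMPLE)
```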

An example of the Shard Metadata can be seen below. The file element is repeatable; the other elements are not.
{{{
<shard_metadata>
  <file>
    <file_url>http://bitfinder.statsbiblioteket.dk/bart/mux1.1256943600-2009-10-31-00.00.00_1256947200-2009-10-31-01.00.00_dvb1-1.ts</file_url>
    <channel_id>102</channel_id>
    <program_start_offset>2100</program_start_offset>
    <program_clip_length>1500</program_clip_length>
    <file_name>mux1.1256943600-2009-10-31-00.00.00_1256947200-2009-10-31-01.00.00_dvb1-1.ts</file_name>
    <format_uri>info:pronom/x-fmt/386</format_uri>
  </file>
</shard_metadata>
}}}

This is all the information the ShardCutter/BroadcastExtraction service needs to extract and transcode the relevant part of the recording.

The Shard object has one or more relations to File objects, with the predicate "http://doms.statsbiblioteket.dk/relations/default/0/1/#consistsOf". Each file mentioned in SHARD_METADATA must also be referenced via this relation, pointing to the corresponding File object.

=== File ===

The File object corresponds to a datafile in bitstorage. Each file is a chunk of recording, spanning a number of hours and possibly several channels. It has two relevant datastreams.

"CONTENTS" contains the URL to the file in bitstorage. At present, the URL does not work, as there is no webserver configured at the location, but this is of no consequence for the current setup. When the webserver is established, the URLs will work, and no changes will be necessary to the File objects in DOMS.

"CHARACTERISATION" contains the technical metadata about the datafile. At present, there is no technical metadata, aside from a qualified guess at the format. The original Radio/TV system did not contain any technical metadata, and the files are much too big to be characterised within the current deadlines. As such, this datastream contains placeholder values to make the validator happy.

== Bundling ==

When used by Summa, the datamodel described above should be presented as one record. Every Program object denotes a Summa record, and every Program object is presented in a bundle with its corresponding Shard object. The File objects are not presented to Summa.
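
The bundling rule can be illustrated with a short Python sketch. It does not use any Summa or DOMS API: objects are represented as plain dictionaries keyed by pid, and the keys "type" and "shard", the function name and the PIDs are all hypothetical.

```python
from typing import Dict, List


def build_bundles(objects: Dict[str, dict]) -> List[dict]:
    """Return one bundle per Program, holding the Program and its Shard only."""
    bundles = []
    for pid, obj in objects.items():
        if obj["type"] != "Program":
            continue  # Shard and File objects never start a record of their own
        bundles.append({
            "record_id": pid,
            "program": obj,
            "shard": objects[obj["shard"]],  # File objects are deliberately left out
        })
    return bundles


# Hypothetical objects, for illustration only.
objects = {
    "uuid:program-1": {"type": "Program", "shard": "uuid:shard-1"},
    "uuid:shard-1": {"type": "Shard", "files": ["uuid:file-1"]},
    "uuid:file-1": {"type": "File"},
}
bundles = build_bundles(objects)
```

With the three objects above, the result is a single bundle for "uuid:program-1" that carries the Program and its Shard but not the File object.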

Documentation/DomsRadioTVDataModel (last edited 2010-12-07 12:33:07 by eab)