
DON'T PANIC

A definition: A datamodel describes the content of a collection. A content model describes the content of a data object. So a datamodel is a set of content models that together describe the collection.

The DOMS datamodel describes how the Type system underlying DOMS is realized in Fedora 3.

The DOMS datamodel is a complex system. To make it easier to understand, its components are detailed in separate documents. Firstly, it consists of a number of extensions to the Fedora system. Secondly, it consists of a number of predefined objects, which make use of these extensions. Thirdly, it consists of a number of policies for how certain tasks are achieved. And fourthly, it consists of a number of APIs.

Fedora and DOMS are big on namespaces. To ease writing the documentation, a namespace document, DomsNameSpacesAndSchemas, has been written. All namespaces should be defined there, and all shorthands refer to the namespaces defined therein.

Content Models in general

Fedora provides a repository for digital objects. All objects in the repository can, in principle, be unique, but Fedora provides a way of specifying that an object has a given type. Unfortunately, the type-definitions in Fedora, called Content Models, are rather simplistic by default. We use them as the basis of our type system, with certain enhancements.

For our purposes, there are two kinds of digital objects in Fedora:

  • Data objects
  • Content Model objects

The Content Model object, as used in DOMS, describes the compulsory and legal content of an object of this type. It contains the information necessary to verify whether a given object is indeed of this type. For more detail on this, see FedoraOntology and FedoraTypeChecking.

A data object can specify the Content Model describing its contents via a fedora-model:hasModel relation, and in DOMS we require this relation to be present. A data object is said to "subscribe" to a Content Model. Content Model inheritance, as specified in FedoraOntology, will be used.

The special Content Model object "doms:ContentModel_DOMS" is the root object. All Content Models must have a "doms-relations:extendsModel" relation to this object, possibly through a number of other Content Models. The complete description of a data object is defined as the set of the descriptions in the Content Model specified with "fedora-model:hasModel" and in all Content Models that can be reached from it by following "doms-relations:extendsModel" relations.

A Content Model can "extend" more than one other Content Model. There is no overriding of Content Models: a subscribing object must be valid with regard to all the Content Models in the inheritance tree.
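The rule that a data object is described by every Content Model reachable through "doms-relations:extendsModel" can be sketched as a plain graph traversal. This is only an illustration: the dictionary below is a hypothetical in-memory stand-in for the repository, "example:ContentModel_Track" is an invented model, and the particular extends relations shown are assumptions, not taken from the actual DOMS objects.

```python
from collections import deque

# Hypothetical stand-in for the repository: each Content Model PID maps to
# the PIDs it extends via doms-relations:extendsModel (assumed example data).
EXTENDS_MODEL = {
    "doms:ContentModel_DOMS": [],
    "doms:ContentModel_File": ["doms:ContentModel_DOMS"],
    "doms:ContentModel_Audio": ["doms:ContentModel_DOMS"],
    "example:ContentModel_Track": [
        "doms:ContentModel_Audio",
        "doms:ContentModel_File",
    ],
}

ROOT = "doms:ContentModel_DOMS"


def reachable_models(has_model, extends=EXTENDS_MODEL):
    """Return every Content Model that describes an object.

    has_model is the list of PIDs named by the object's
    fedora-model:hasModel relations. The result contains those models and
    everything reachable from them through extendsModel.
    """
    seen = set()
    queue = deque(has_model)
    while queue:
        pid = queue.popleft()
        if pid in seen:
            continue  # a model reached along two paths is only counted once
        seen.add(pid)
        queue.extend(extends.get(pid, []))
    return seen
```

Because a model may extend several others, the same ancestor can be reached along more than one path; the `seen` set keeps each model in the result exactly once.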

Two datastreams in Content Models are of particular interest: ONTOLOGY and DS-COMPOSITE. ONTOLOGY defines the allowed relations in subscribing objects, and DS-COMPOSITE defines the required datastreams and any restrictions they must adhere to.
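As a rough sketch of what checking an object against these two datastreams could look like, the example below reduces each Content Model to a set of required datastreams and a set of allowed relations. Everything here is an assumption made for illustration: the dictionary layout, the "CHARACTERISATION" datastream name, and the choice to accept a relation if any of the object's Content Models allows it. The real ONTOLOGY and DS-COMPOSITE datastreams are XML documents and carry more detail than this.

```python
# Hypothetical, simplified view of what ONTOLOGY (allowed relations) and
# DS-COMPOSITE (required datastreams) express for two Content Models.
MODELS = {
    "doms:ContentModel_DOMS": {
        "required_datastreams": {"DC"},
        "allowed_relations": {"fedora-model:hasModel"},
    },
    "doms:ContentModel_File": {
        "required_datastreams": {"CHARACTERISATION"},  # invented name
        "allowed_relations": {"isFileFormat"},
    },
}


def validate(obj, model_pids, models=MODELS):
    """Return a list of problems; an empty list means the object is valid.

    obj is a dict with the sets "datastreams" and "relations".
    model_pids is the full set of Content Models describing the object.
    """
    problems = []
    allowed = set()
    for pid in sorted(model_pids):
        model = models[pid]
        # Assumption: a relation is legal if any model in the set allows it.
        allowed |= model["allowed_relations"]
        # No overriding: every model's required datastreams must be present.
        for ds in sorted(model["required_datastreams"] - obj["datastreams"]):
            problems.append(f"{pid}: missing datastream {ds}")
    for relation in sorted(obj["relations"] - allowed):
        problems.append(f"relation not allowed by any model: {relation}")
    return problems
```

The set of model PIDs passed in is expected to be the complete inheritance set, so that a subscribing object is checked against all of its Content Models rather than only the one it names directly.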

Fedora extensions

Predefined objects

Doms policies

API documentation

Working with the Data model

DOMS contains a number of content models. These are meant to serve as the basic building blocks for the data models of new collections. A datamodel is, of course, not restricted to these content models; it can, and should, define its own. All new content models should extend doms:ContentModel_DOMS, all objects that need to reference files outside Fedora should have a content model that derives from doms:ContentModel_File, and so on. The content models that provide extra meaning are optional to use, and should at least be extended for the relevant collection.

Most data models are structured around some real-world concept, like a CD, modelled as a digital object. This object will be described by a content model that is entirely collection-specific, extending only doms:ContentModel_DOMS. It will probably have relations to other digital objects, like tracks. These will be described by a content model that extends doms:ContentModel_Audio. Each of these tracks must then reference an audio preservation file object, or some subtype of this. This is the best practice for constructing data models.

Disseminators

The disseminators named by Type_DOMS are to be defined by the individual objects. For instance, an image file object would give the actual image (possibly scaled) in IMAGE_PRESENTATION, while an object representing a CD album might have a cover it can give as its representation.

We expect to cache the result of calls to these disseminators.

Note that VIDEO/SOUND/TEXT_PRESENTATION(0, MaxInt) will give the entire stream, while IMAGE_PRESENTATION(maxInt,maxInt) will give the entire image.

IMAGE_PRESENTATION(64,64) might be used for thumbnails.
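One way to read the two IMAGE_PRESENTATION arguments is as a bounding box: the image is returned unchanged if it fits, and scaled down otherwise. The sketch below computes the resulting size under that reading. It is an assumption, not documented behaviour: the function name `presented_size` is invented, and preserving the aspect ratio and never scaling up are choices made for the example.

```python
import sys


def presented_size(width, height, max_width, max_height):
    """Return the (width, height) an image would be presented at.

    Assumed behaviour: the image is scaled down to fit inside the box
    max_width x max_height, keeping its aspect ratio, and never scaled up.
    """
    if width <= max_width and height <= max_height:
        return width, height  # fits already, e.g. for (maxInt, maxInt)
    # Compare the two scale factors using integer arithmetic only.
    if max_width * height <= max_height * width:
        # Width is the limiting dimension.
        return max_width, max(1, height * max_width // width)
    # Height is the limiting dimension.
    return max(1, width * max_height // height), max_height


# Hypothetical calls mirroring the text: the whole image, and a thumbnail.
full = presented_size(4000, 3000, sys.maxsize, sys.maxsize)
thumb = presented_size(4000, 3000, 64, 64)
```

Under these assumptions a 4000 by 3000 image requested with a 64 by 64 box comes back as 64 by 48, while the maxInt request leaves the dimensions untouched.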

INDEX_REPRESENTATION -> XML gives a unified representation of this object for indexing purposes. It is only used on indexable objects, and in those cases it will return information that might be extracted from an entire graph of objects. For instance a CD might collect information about all tracks from sub-objects.

Data Model: Unresolved Issues

PIDS

  • The current format is not final. It has been marked as such in the DataModel.

  • Namespaces and language: All relations are namespaced like the rest of the repository. We use the doms:whatever_u_like namespace for now for all objects and <collection>:whatever_u_like for relations. We want to use a standard naming convention, and we want a decision on language (the doms base collection uses English for FoxML file names, PIDs and relations; the other example collections are inconsistent).

    • Written a proposal on the naming of things in the DataModel. Not final -- abr

Validation of relations

  • In the current data model Type Image defines relation hasFile to a File Object with relation isFileFormat to a TIFF Fileformat Object. Since we allow Fileformat subtyping, we actually mean to a TIFF Fileformat object or a subtype of this. This makes validation difficult. Instead we could subtype Type File to Type TIFF File and Type Image could define relation hasFile to a TIFF File Object. This introduces more Type Objects, and it seems that the file formats are modelled twice... We need a decision.

    • I opt for the more difficult validation. When we need to validate types, we are faced with the same problem of following relations to make sure. Subclassing is just a way of hardcoding the relation into the type, so you must follow the "isObjectType" once instead of the "extendsFormat" a few times. --abr

    Decided thus
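The chosen approach, following "extendsFormat" until the wanted format is found, can be sketched as a short loop. The table below is hypothetical example data standing in for Fileformat objects in the repository; the PIDs and the subtype chain are invented for illustration.

```python
# Hypothetical example data: each Fileformat object maps to the format it
# extends via "extendsFormat", or None if it extends nothing.
EXTENDS_FORMAT = {
    "format:TIFF": None,
    "format:TIFF_6": "format:TIFF",
    "format:GeoTIFF": "format:TIFF_6",
    "format:JPEG": None,
}


def is_format_or_subtype(fmt, wanted, extends=EXTENDS_FORMAT):
    """Return True if fmt is the wanted format or one of its subtypes."""
    seen = set()
    # Walk up the chain; the seen set guards against a malformed cycle.
    while fmt is not None and fmt not in seen:
        if fmt == wanted:
            return True
        seen.add(fmt)
        fmt = extends.get(fmt)
    return False
```

A validator for Type Image would then accept a File Object whose "isFileFormat" target passes `is_format_or_subtype(target, "format:TIFF")`, at the cost of one lookup per level of format subtyping.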

Object Structure

  • We are still missing descriptions of type, rights and technical datastreams. These are necessary to define for the validation iteration.
  • Rules for preservation of existing PIDs as metadata when ingesting - where and how?
  • We have no model of ordered grouping (e.g. pages in a book)... Namespace solution / fragmentation solution ?

New:

  • We need a guide for creating the metadata model for new collections. For example in the Gentofte project, is the manufacturer a field on the revy object, or a reference on the revy object to a manufacturer project? The Kowari index (MISSING LINK) makes this somewhat irrelevant, but without that it becomes paramount -- mke 2007-11-13 09:17:07

Ideas

  • DC subject: Idea for clever use: When requesting the subject of an object, follow the chain of parent-objects to the top and concatenate all subjects. See hasConceptualParent later in the RDF section.

  • Ideas for future use: hasConceptualParent <PID for an object>
    Practical use is for indexing the subject, which is a concatenation of the current object's dc:subject and the conceptual parent's subject. The chain of conceptual parents always ends with project. Cyclic references aren't allowed.
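
The subject concatenation described in these ideas could look roughly like the sketch below. The object table, the PIDs, the separator, and the ordering (topmost parent first) are all assumptions made for the example; the text above does not fix any of them.

```python
# Hypothetical example data: each object has a dc:subject value and an
# optional hasConceptualParent PID (None at the top of the chain).
OBJECTS = {
    "example:project": {"subject": "Project", "parent": None},
    "example:revue": {"subject": "Revue", "parent": "example:project"},
    "example:song": {"subject": "Song", "parent": "example:revue"},
}


def indexed_subject(pid, objects=OBJECTS, separator=" / "):
    """Concatenate the subjects along the chain of conceptual parents."""
    subjects = []
    seen = set()
    while pid is not None:
        if pid in seen:
            # Cyclic references are not allowed, so treat one as an error.
            raise ValueError(f"cyclic hasConceptualParent chain at {pid}")
        seen.add(pid)
        obj = objects[pid]
        subjects.append(obj["subject"])
        pid = obj["parent"]
    # Assumed ordering: the topmost parent comes first.
    return separator.join(reversed(subjects))
```

Raising on a cycle makes the rule that cyclic references are not allowed explicit at indexing time, rather than letting the walk loop forever.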

MiniDOMSDataModel/FullDataModel (last edited 2010-03-17 13:08:51 by localhost)