Differences between revisions 10 and 20 (spanning 10 versions)

Action decide Data Model with respect to Fedora 3

Assigned: KFC + ABR + EK

Prev assigned

Tasks addressed: TaskA.2.2, TaskA.2.1

Time estimated: 8md

Time used: 6md

Priority: 6

Status: In progress

Iteration: 11,12

Notes

The Fedora 3 data model appears to differ from what we expected, and this affects how we should define the final DOMS data model. This needs to be decided soon, so that Mjølner knows what our final data model looks like.

The issues we have identified so far:

Inheritance is not implemented, but two equally useful approaches present themselves. Choosing the one that Fedora will use will help our system gain acceptance.
Arbitrary XML in DS-COMPOSITE, instead of SCHEMABINDINGS
rdf:type and OWL Full to describe the relationship model
Policy Relationships
The schemas for the Base objects' datastreams must be decided and formalised.

The purpose of this action is to establish how key concepts, mainly inheritance, should be implemented, and to formalise the DOMS base collection. We should start manipulating the Fedora developers into accepting our wishes, mainly through emails to their mailing lists.

The product should be a data model description and the FOXML object for the Base collection.

Progress

DOMS Data model

Much of the conceptual work has been done, but a large amount of documentation is still missing. The DataModel document should be updated to reflect the ontology advances and the changes to the content models. A proper, up-to-date description of the data model for the reel tape collection should be written.

DS-COMPOSITE

We use DS-COMPOSITE to store arbitrary metadata, as proposed. Fedora does not choke on it, and preserves it faithfully. A new schema for DS-COMPOSITE has been defined, as has a schema for the elements we embed in it.

Schema validation

Some objects in the tape collection need a subset of the qualified Dublin Core metadata set. This forced us to make concrete how we intend schema validation to work.

Until we know the precise capabilities of the engine that will parse the schemas for the GUI, we are severely restricted in what we can do. For qualified Dublin Core, admittedly a simple example, we first recreated the dcterms schema. We dropped all the substitution groups and all the qualifiers, leaving us with just a long list of unique terms. Then we defined schemas, just sequences, restricting which fields are allowed. The idea is to use such a restriction list combined with the "full" schema, which should be parsable by even simple systems.
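
The restriction-list idea can be sketched in Python as follows. This is a minimal illustration under assumptions, not the DOMS implementation: the handful of term names, the TAPE_TERMS list and both function names are hypothetical, and the check ignores the element order and cardinality that a real XML Schema sequence would also enforce.

```python
# Hypothetical flattened "full" term list (no substitution groups, no
# qualifiers). Only a few dcterms element names are shown here.
FULL_TERMS = {"title", "creator", "created", "description", "extent", "format"}

# Hypothetical restriction list for one kind of tape object.
TAPE_TERMS = ["title", "created", "extent"]


def check_restriction(restriction, full_terms):
    """Return the restriction terms that are missing from the full list."""
    return [term for term in restriction if term not in full_terms]


def validate_record(fields, restriction, full_terms):
    """Return error messages for a record given as (term, value) pairs."""
    allowed = set(restriction)
    errors = []
    for term, _value in fields:
        if term not in full_terms:
            errors.append(f"unknown term: {term}")
        elif term not in allowed:
            errors.append(f"term not allowed for this object: {term}")
    return errors
```

A record passes only if every field is a known term and also appears in the restriction list, which mirrors the "full schema plus restriction list" combination described above.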

PREMIS and other more advanced schemas have not been considered yet.

View datastream

The view datastream has not been updated. There is a serious issue with the way we have chosen to implement it:

To update a search index incrementally, we must inform the search tool each time a "post" in the index changes. A "post" is defined as the view of a main object. If one of the objects in this view is changed, that object has no way of finding its main object, and thus we cannot inform the search index that this "post" has changed.
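
The missing lookup can be pictured with a small Python sketch. It is only an illustration of the problem, not a decided solution: the object identifiers, the VIEWS mapping and both function names are hypothetical, and it assumes that the views of all main objects can be enumerated up front.

```python
# Hypothetical views: each main object maps to the objects in its view.
VIEWS = {
    "doms:program1": ["doms:program1", "doms:file1", "doms:meta1"],
    "doms:program2": ["doms:program2", "doms:file2"],
}


def build_reverse_index(views):
    """Map every object to the set of main objects whose view contains it."""
    reverse = {}
    for main, members in views.items():
        for member in members:
            reverse.setdefault(member, set()).add(main)
    return reverse


def posts_to_reindex(changed_object, reverse):
    """Return the main objects (index "posts") affected by a change, sorted."""
    return sorted(reverse.get(changed_object, set()))
```

The forward direction (main object to view members) is what the view gives us today; the reverse mapping is the part that a changed object currently cannot obtain.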

This issue is still outstanding.

DOMS owl ontology

The Fedora OWL ontology system we will use has been defined in FedoraOntology. Apart from the current lack of support for DataTypeProperties, we believe that it has no loose ends.

TODO

The TODO list from ActionDataModelTDRRequirements has been added to TaskA.2.3AnalysisDocument.

Checklist For Working On An Action

The Life Cycle of an Action:

Assign people for action definition: Done at start of iteration status meeting. Fill out Assigned
Define the action: Describe information about what is to be done and how. Fill out Tasks Addressed and Time Estimated.
Review the definition: Get another project group member to review the action definition, and update it.
Assign people for action implementation: Done by project manager, usually the same persons who wrote the definition. Fill out Assigned and Prev assigned if new persons are assigned.
Implement the action: See details below
Review the action: Get another project group member to review what is implemented (code and documentation), and update it.
Finish the action: Change the status to "Finished" and update the "time used" field on the action page.

Please make sure that you address the issues below when working on an action:

Update the state of the action to "In Progress" when you start working on it.
Check if the tasks addressed by this action have their status set to "In Progress". If that is not the case, then change the state of them.
Keep track of how much time has been spent working on the action. If it addresses more than one task, then make a note on the action page about how much of the elapsed time has been spent on the individual tasks. Hint: Continually updating the "Time used" field will make it easier for you.
Update the "Progress History" and documentation pages of each task addressed by this action when appropriate. This depends on the situation, but in general, the task pages should hold all important related information about the work done, experiences gathered, identified requirements and so on.

- Revision 10 as of 2008-08-25 15:08:52
  Size: 7466
  Editor: abr
  Comment:
+ Revision 20 as of 2010-03-17 13:12:54
  Size: 6275
  Editor: localhost
  Comment: converted to 1.6 markup
-Deletions are marked like this.
+Additions are marked like this.
 Line 21:
- Tasks adressed:: ["TaskA.2.2"],["TaskA.2.1"]
+ Tasks adressed:: [[TaskA.2.2]],[[TaskA.2.1]]
 Line 33:
- Time used:: 0.5md
+ Time used:: 6md
 Line 45:
- Status:: Description needs review
+ Status:: In progress
-Line 51:
+Line 50:
- Iteration:: 11
+ Iteration:: 11,12
-Line 86:
+Line 85:
-=== DS-COMPOSITE ===
We use DS-COMPOSITE to store arbitrary metadata, as proposed. Fedora does not choke on it, and preserves it faithfully.
+=== DOMS Data model ===

Much of the conceptual work has been done, but a large amount of documentation is still missing. The DataModel document should be updated to reflect the ontology advances and the changes to the content models.
A proper, up-to-date description of the data model for the reel tape collection should be written.
-Line 91:
+Line 92:
+=== DS-COMPOSITE ===
-Line 92:
+Line 94:
-=== Inheritance and rdf:type ===
It is a high priority to be able to describe the data model in OWL Lite. To do this, we propose the following system.

 * A data object must have fedora-model:hasModel relations to all the Content models it implements.
 * A content model must have doms:extendsModel relations to its direct parents.

In a content model, we use OWL to describe the contents of the RELS-EXT datastream of its subscribing data objects. We make no attempt to describe the contents of the RELS-EXT datastream of the content models.
The fedora-model:hasModel relations will not be expressed in this OWL schema. 

There are three typical scenarios where you want to construct the complete schema for the RELS-EXT datastream:

==== Given a content model C , make a new object ====
 1. For the content model C, follow the doms:extendsModel relations to construct the list of parent content models. Remove duplicates from this list.
 1. From each content model in the list, extract the schema, if any, for the RELS-EXT datastream. Concatenate these schemas into an XML file.
 1. Use an XSLT to transform any mention of doms:extendsModel in this schema into rdfs:subClassOf.
 1. Construct a new RDF individual of type C. Use the schema to find the required and the possible relations. Populate the individual.
 1. Transform the rdf:type relation in the individual into a fedora-model:hasModel relation to each of the classes from the schema.
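
Step 1 of this list, following doms:extendsModel and removing duplicates, can be sketched in Python. This is a minimal sketch under assumptions: the content model names and the EXTENDS_MODEL mapping are hypothetical stand-ins for relations that would really be read from the repository, and the result is assumed to include C itself, since the new object needs a fedora-model:hasModel relation to C as well.

```python
# Hypothetical doms:extendsModel relations: content model -> direct parents.
EXTENDS_MODEL = {
    "doms:ContentModel_Tape": ["doms:ContentModel_Audio"],
    "doms:ContentModel_Audio": ["doms:ContentModel_DOMS"],
    "doms:ContentModel_DOMS": [],
}


def model_closure(model, extends):
    """Return the model plus all its ancestors, each listed exactly once."""
    seen = []
    stack = [model]
    while stack:
        current = stack.pop()
        if current in seen:
            # Already visited: handles diamond inheritance and cycles.
            continue
        seen.append(current)
        stack.extend(extends.get(current, []))
    return seen
```

The returned list is the set of content models whose RELS-EXT schemas would then be concatenated in step 2.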

==== Given a data object A, examine the schema for the RELS-EXT datastream ====
 1. Construct the list of content models for A thus:
   1. Follow all the fedora-model:hasModel relations of A, and construct a list AL of content models for A.
   1. For each content model M in the list AL, follow the doms:extendsModel relations, to construct a set AG of content models for A.
   1. Compare AG and AL. If they do not contain exactly the same models, there is a problem.
 1. Construct the schema from the content models thus:
   1. For each content model in AL, find the schema, if any, for the RELS-EXT datastream.
   1. Concatenate these schemas into an XML document B.
   1. Use an XSLT to transform the doms:extendsModel object properties into rdfs:subClassOf.
   1. Take the RELS-EXT datastream from the data object, and store it in a document C.
   1. Use an XSLT to transform the fedora-model:hasModel relations into rdf:type.
Document B should now be the schema for the object described in C. Make the changes you want in C, transform the rdf:type back into fedora-model:hasModel and store it as the RELS-EXT datastream.
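
The comparison of AL and AG in step 1 can be sketched in Python. This is a hedged illustration, not the DOMS code: the model names, the EXTENDS_MODEL mapping and the function name are hypothetical. Because AG is built by starting from AL, the only possible mismatch is a model that inheritance implies but the data object does not declare.

```python
# Hypothetical doms:extendsModel relations: content model -> direct parents.
EXTENDS_MODEL = {
    "doms:ContentModel_Tape": ["doms:ContentModel_Audio"],
    "doms:ContentModel_Audio": ["doms:ContentModel_DOMS"],
    "doms:ContentModel_DOMS": [],
}


def missing_models(has_model, extends):
    """Return models implied by inheritance (AG) but not declared (AL)."""
    declared = set(has_model)  # AL: the fedora-model:hasModel relations
    implied = set()            # AG: AL plus everything reachable from it
    stack = list(has_model)
    while stack:
        current = stack.pop()
        if current in implied:
            continue
        implied.add(current)
        stack.extend(extends.get(current, []))
    return sorted(implied - declared)
```

An empty result means AL and AG contain exactly the same models; anything else is the "problem" case from the list above.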
+We use DS-COMPOSITE to store arbitrary metadata, as proposed. Fedora does not choke on it, and preserves it faithfully.
A new schema for DS-COMPOSITE has been defined, as has a schema for the elements we embed in it.
-Line 124:
+Line 98:
-==== Validate all the objects in the repository ====
 1. Use the triple store to find all content models in the system.
 1. For each content model C, follow the doms:extendsModel relation to construct the list of parent content models. Uniquify this list.
 1. From each content model in the list, extract the schema, if any, for the RELS-EXT datastream. Concatenate these schemas into an XML file.
 1. Use an XSLT to transform any mention of doms:extendsModel in this schema into rdfs:subClassOf.
+=== Schema validation ===
Some objects in the tape collection need a subset of the qualified Dublin Core metadata set. This forced us to make concrete how we intend schema validation to work.
-Line 130:
+Line 101:
-. For each data object in the repository, extract the RELS-EXT datastream in a document
 1. For each document, use an XSLT to transform the fedora-model:hasModel relations into rdf:type relations.
 1. Validate all these documents against the set of schemas.
+Until we know the precise capabilities of the engine that will parse the schemas for the GUI, we are severely restricted in what we can do. For qualified Dublin Core, admittedly a simple example, we first recreated the dcterms schema. We dropped all the substitution groups and all the qualifiers, leaving us with just a long list of unique terms. Then we defined schemas, just sequences, restricting which fields are allowed. The idea is to use such a restriction list combined with the "full" schema, which should be parsable by even simple systems.

PREMIS and other more advanced schemas have not been considered yet.


=== View datastream ===

The view datastream has not been updated. There is a serious issue with the way we have chosen to implement it:

To update a search index incrementally, we must inform the search tool each time a "post" in the index changes. A "post" is defined as the view of a main object. If one of the objects in this view is changed, that object has no way of finding its main object, and thus we cannot inform the search index that this "post" has changed.

This issue is still outstanding.


=== DOMS owl ontology ===

The Fedora OWL ontology system we will use has been defined in FedoraOntology. Apart from the current lack of support for DataTypeProperties, we believe that it has no loose ends.

=== TODO ===

The TODO list from [[ActionDataModelTDRRequirements]] has been added to [[TaskA.2.3AnalysisDocument]].