Differences between revisions 7 and 8

Action View Datastream

Assigned: ABR+KFC

Prev assigned: JRG (description review)

Tasks addressed: TaskB.2, TaskB.1

Time estimated: 2md

Time used: 0md

Priority: 6

Status: Described, lacks review

Iteration: 13

Notes

The Problem

There is one major outstanding issue with the technological foundation of the DOMS data model. For many purposes, it is useful to regard a number of objects as a combined whole, with one of them being the main object. There are a number of ways to specify this.

In the following, the combination of data from a number of objects will be called the blob.

A system to associate objects and blobs must be defined. The system must adhere to the following restrictions:

Given a main object, you must be able to get the rest of the blob.
Given an object from a blob, you must be able to get to the main object of the blob.
An object should be able to belong to any number of blobs.
The blobs must be described by information stored in the Content Models, preferably in a way that is backwards compatible with the current system.
Disseminators that output the entire blob should be defined.

Current system

The current system is described here: FedoraViewBlobs

In the content models, we have a reserved datastream called VIEW. VIEW lists the datastreams from this object to include in the blob, and the relations to be followed from this object to find other objects to include in the blob. Some objects are main objects. To make a blob, you begin from a main object and follow the listed relations until there are no more to follow.
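The traversal described above can be sketched as follows. This is a minimal illustration, not the DOMS implementation: the in-memory `VIEW` and `OBJECTS` dictionaries, the content model names, the datastream names and the `hasFile` relation are all hypothetical stand-ins for what would be stored in the repository.

```python
from collections import deque

# Hypothetical VIEW information per content model: which datastreams to
# include and which relations to follow from objects of that model.
VIEW = {
    "cm:Program": {"datastreams": ["DC", "CONTENT"], "relations": ["hasFile"]},
    "cm:File": {"datastreams": ["CONTENT"], "relations": []},
}

# Hypothetical repository objects: content model plus outgoing relations.
OBJECTS = {
    "prog:1": {"model": "cm:Program", "relations": {"hasFile": ["file:1", "file:2"]}},
    "file:1": {"model": "cm:File", "relations": {}},
    "file:2": {"model": "cm:File", "relations": {}},
}


def compute_blob(main_pid):
    """Return {pid: [datastream names]} for every object reachable from main_pid."""
    blob = {}
    queue = deque([main_pid])
    while queue:
        pid = queue.popleft()
        if pid in blob:  # already visited; guards against relation cycles
            continue
        obj = OBJECTS[pid]
        view = VIEW[obj["model"]]
        blob[pid] = list(view["datastreams"])
        # Follow only the relations that the content model's VIEW lists.
        for relation in view["relations"]:
            queue.extend(obj["relations"].get(relation, []))
    return blob
```

Calling `compute_blob("prog:1")` walks outwards from the main object. Nothing in this structure answers the reverse question of which main object a given file belongs to.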

This has the nice feature of having the combined whole defined in the content models. No individual object can redefine how its view looks.

The blobs visible from different angles (interfaces to the system) might not be identical. For example, the blobs used as the basis for the GUI will probably not be the blobs harvested by a search engine.

The problem arises when objects are changed. When an object is changed by some generic means, the contents of every blob that the object belongs to should be updated. So there must be a way to get from an object to its main object, so that the blob can be recomputed. Unfortunately, the current implementation only records the relations to follow outwards from the main object, which makes this reverse lookup problematic.

Progress

A number of different models have been proposed for how to build this system.

Model 1

Use the current system, but with a database added. This database will contain the static information about the contents of each blob. It will be recomputed from time to time. Via the database, the lookup from an object to its main object can be achieved.
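A minimal sketch of such a lookup database, assuming an in-memory SQLite table. The table name `blob_member`, its columns and the helper functions are hypothetical; the blob contents are assumed to have been computed elsewhere and are passed in as a plain dictionary.

```python
import sqlite3


def build_index(blobs):
    """Build the lookup table from {main_pid: iterable of member pids}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blob_member ("
        "main_pid TEXT, member_pid TEXT, "
        "PRIMARY KEY (main_pid, member_pid))"
    )
    rows = [(main, member) for main, members in blobs.items() for member in members]
    conn.executemany("INSERT INTO blob_member VALUES (?, ?)", rows)
    conn.commit()
    return conn


def main_objects_for(conn, member_pid):
    """Return the main objects of every blob that contains member_pid."""
    cursor = conn.execute(
        "SELECT main_pid FROM blob_member WHERE member_pid = ? ORDER BY main_pid",
        (member_pid,),
    )
    return [row[0] for row in cursor]
```

Because the table is only recomputed from time to time, a lookup may return stale results between recomputations; how often to rebuild is a design decision this sketch leaves open.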

Model 2

Current system with a restriction. There is just one kind of blob, so that the GUI and all other tools operate on the same blobs. Changes are only ever performed through a system that reads in the entire blob, and so knows the main object when performing the change.

This has the disadvantage that all tools we use must understand the concept of the blob and the VIEW datastream.

Model 3

Current system with an addition. The content model VIEW datastream is bi-directional. Not only does it list the relations to follow from this object, it also lists the possible relations to this object, which would make it part of a VIEW. So, the content model tells you which relations leading into the object should be followed in reverse, when going from an object to the main object.

This will unfortunately mean that the content models will start to define and restrict things that are under the control of other content models. This creates unfortunate interdependencies between the content models.
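The reverse walk that Model 3 implies could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `INVERSE_VIEW` table (incoming relations each content model allows to be followed backwards), the triple list, and the idea that main objects can be recognised by their content model.

```python
# Hypothetical: for each content model, the incoming relations that may be
# followed in reverse when searching for the main object.
INVERSE_VIEW = {"cm:Program": [], "cm:File": ["hasFile"], "cm:Note": []}

# Hypothetical: content model of each object.
MODELS = {
    "prog:1": "cm:Program",
    "prog:2": "cm:Program",
    "note:1": "cm:Note",
    "file:1": "cm:File",
}

# Hypothetical relation triples: (subject, relation, object).
TRIPLES = [
    ("prog:1", "hasFile", "file:1"),
    ("prog:2", "hasFile", "file:1"),
    ("note:1", "mentions", "file:1"),
]

# Assumption: objects of these content models are main objects.
MAIN_MODELS = {"cm:Program"}


def find_main_objects(pid):
    """Follow allowed incoming relations backwards and collect main objects."""
    found, seen, stack = set(), set(), [pid]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles
            continue
        seen.add(current)
        model = MODELS[current]
        if model in MAIN_MODELS:
            found.add(current)
        allowed = INVERSE_VIEW.get(model, [])
        for subject, relation, obj in TRIPLES:
            if obj == current and relation in allowed:
                stack.append(subject)
    return found
```

Note that `INVERSE_VIEW["cm:File"]` has to name a relation (`hasFile`) that is defined by another content model, which is exactly the interdependency described above.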

Model 4

Not the current system.

The VIEW datastream only lists the local datastreams to include. There are no main objects. For each blob, there exists a special aggregate object that has an "aggregates" relation to each object in the blob. Whenever an object is changed, the aggregate object is notified, so that the blob can be recomputed.

This solution has both advantages and disadvantages. An advantage is that we can automate the blob update, so that all changes to objects notify the aggregate objects automatically, and tools do not need to understand the blob concept. A disadvantage is that we introduce new objects, which perform some of the same tasks as the old main objects. In addition, objects which are not aggregated will not be visible to any blob-enabled tool. It also changes how a new object is made.
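The notification idea can be sketched as below. The class names, the `stale` flag and the `modify` hook are hypothetical; the sketch only shows that a change to any object can reach every aggregate that holds an "aggregates" relation to it, without the calling tool knowing about blobs.

```python
class AggregateObject:
    """Hypothetical aggregate object: one per blob."""

    def __init__(self, pid):
        self.pid = pid
        self.members = set()  # targets of the "aggregates" relation
        self.stale = False    # True when the blob must be recomputed

    def notify_changed(self, member_pid):
        # A real system might recompute the blob here; the sketch only flags it.
        self.stale = True


class Repository:
    """Hypothetical repository that notifies aggregates on every change."""

    def __init__(self):
        self._aggregates_of = {}  # member pid -> list of AggregateObject

    def aggregate(self, aggregate, member_pid):
        """Record an "aggregates" relation from aggregate to member_pid."""
        aggregate.members.add(member_pid)
        self._aggregates_of.setdefault(member_pid, []).append(aggregate)

    def modify(self, pid):
        """Apply a change to pid and return the pids of the notified aggregates."""
        notified = []
        for aggregate in self._aggregates_of.get(pid, []):
            aggregate.notify_changed(pid)
            notified.append(aggregate.pid)
        return notified
```

An object that no aggregate points to produces no notifications at all, which mirrors the point that unaggregated objects are invisible to blob-enabled tools.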

It will make search easier, as the search should only search in the dissemination output of the aggregation objects, which will all be of a particular type. The search output disseminator should be on the aggregation object.

The aggregation type should probably be subclassed for each of the collections, to help define the main objects.

Note: This has been heavily inspired by the OAI-ORE model for resource maps. Look at their primer LINK.

This system breaks with the principle of having the blobs defined in the content models. The blobs are now defined in special data objects, one for each blob.

Making a new object by this system

The process for making a new object with templates will be the following:

A new aggregation object is made, empty, but with the title of the blob, if any. The aggregation object is of the kind special to the collection, if any, or just a generic aggregation.
If the aggregation is subclassed, it specifies a content model for the first object. If not, one must be selected.
When the content model for the first object is selected, find the prototypes for objects of this content model. Make a new object from these.
Based on the ontology and contents of the main object, make the other objects it should relate to.
Write the data objects to the repository.
Make the aggregation object with relations to each of the objects you made. Write this object to the repository.
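The steps above can be sketched as follows. This is an illustration under stated assumptions only: the repository is a plain dictionary, and `PROTOTYPES`, `REQUIRED_RELATED`, `FIRST_MODEL_FOR`, the pid scheme and all model and relation names are hypothetical.

```python
import copy
import itertools

# Hypothetical prototype objects, one per content model.
PROTOTYPES = {
    "cm:Program": {"model": "cm:Program", "datastreams": {"DC": ""}},
    "cm:File": {"model": "cm:File", "datastreams": {"CONTENT": ""}},
}
# Hypothetical: related objects a first object of a given model should get.
REQUIRED_RELATED = {"cm:Program": [("hasFile", "cm:File")]}
# Hypothetical: subclassed aggregation types fix the first object's model.
FIRST_MODEL_FOR = {"cm:RadioTVAggregation": "cm:Program"}

_counter = itertools.count(1)


def clone_prototype(model):
    """Make a new object from the prototype of the given content model."""
    obj = copy.deepcopy(PROTOTYPES[model])
    obj["pid"] = f"obj:{next(_counter)}"
    obj["relations"] = {}
    return obj


def create_blob(repository, title, aggregation_model="cm:Aggregation", first_model=None):
    # Step 1: an empty aggregation object carrying the title of the blob.
    aggregation = {"pid": f"agg:{next(_counter)}", "model": aggregation_model,
                   "title": title, "aggregates": []}
    # Step 2: a subclassed aggregation specifies the first content model.
    first_model = FIRST_MODEL_FOR.get(aggregation_model, first_model)
    if first_model is None:
        raise ValueError("a content model for the first object must be selected")
    # Step 3: make the first object from its prototype.
    first = clone_prototype(first_model)
    objects = [first]
    # Step 4: make the other objects the first object should relate to.
    for relation, related_model in REQUIRED_RELATED.get(first_model, []):
        related = clone_prototype(related_model)
        first["relations"].setdefault(relation, []).append(related["pid"])
        objects.append(related)
    # Step 5: write the data objects to the repository.
    for obj in objects:
        repository[obj["pid"]] = obj
    # Step 6: relate the aggregation to each object made, then write it.
    aggregation["aggregates"] = [obj["pid"] for obj in objects]
    repository[aggregation["pid"]] = aggregation
    return aggregation
```

The aggregation object is written last, so a failure while writing the data objects never leaves an aggregation pointing at objects that do not exist.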

Conclusion

Checklist For Working On An Action

The Life Cycle of an Action:

Assign people for action definition: Done at start of iteration status meeting. Fill out Assigned
Define the action: Describe information about what is to be done and how. Fill out Tasks Addressed and Time Estimated.
Review the definition: Get another project group member to review the action definition, and update it.
Assign people for action implementation: Done by project manager, usually the same persons who wrote the definition. Fill out Assigned and Prev assigned if new persons are assigned.
Implement the action: See details below
Review the action: Get another project group member to review what is implemented (code and documentation), and update it.
Finish the action: Change the status to "Finished" and update the "time used" field on the action page.

Please make sure that you address the issues below when working on an action:

Update the state of the action to "In Progress" when you start working on it.
Check if the tasks addressed by this action have their status set to "In Progress". If that is not the case, then change the state of them.
Keep track of how much time has been spent working on the action. If it addresses more than one task, then make a note on the action page about how much of the elapsed time has been spent on the individual tasks. Hint: Continually updating the "Time used" field will make it easier for you.
Update the "Progress History" and documentation pages of each task addressed by this action when appropriate. This depends on the situation, but in general, the task pages should hold all important related information about the work done, experiences gathered, identified requirements and so on.

-  ⇤ ← Revision 7 as of 2008-10-06 12:12:50 → 
  Size: 8659
  Editor: jrg
  Comment:
+   ← Revision 8 as of 2010-03-17 13:09:15 → ⇥
  Size: 8659
  Editor: localhost
  Comment: converted to 1.6 markup
 Line 21:
- Tasks adressed:: ["TaskB.2"],["TaskB.1"]
+ Tasks adressed:: [[TaskB.2]],[[TaskB.1]]