Differences between revisions 11 and 13 (spanning 2 versions)
Revision 11 as of 2008-10-13 12:40:32 (size 12143, editor abr)
Revision 13 as of 2008-10-14 11:54:01 (size 10536, editor abr)
Fedora View Blobs

DOMS employs an overall atomistic data model. Atomistic data models are much more flexible than traditional compound data models, but they have one big (and largely unmet) challenge. When working with data objects you will frequently need to operate on a number of objects as if they were a single whole. The simplest use case for this is the public dissemination of data: if the data that should go into one Dissemination Information Package is distributed over several objects, the system needs to understand this.

The DOMS team has laboured long and hard to find a good way to model this in a Fedora context. This page describes the result.

Views

A view is a way of combining objects in DOMS into a domain-relevant group, so that a number of related objects can be treated as a whole.

Each view is centered around one object, which we call the main object; the ID of the view is the ID of the main object. All the other objects in the view are related to the main object by some chain of relations. Therein lies a crucial feature of this view system: rather than adding special relations from the main object to every object in the view, some of the existing structural relations are annotated as view relations. In other words, we list the relations that should be followed to find the objects in the view, instead of defining dedicated view relations. Both relations from a given object and relations to it can be annotated as view relations.
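To make the idea of annotated relations concrete, here is a minimal sketch that treats the repository as a set of RDF-style triples (subject, predicate, object) and finds the objects one step away from a given object through its annotated relations. The `Triple` record, the `ViewStep` class and the PIDs used with them are hypothetical illustrations, not part of DOMS or the Fedora API.

```java
import java.util.*;

/** Hypothetical relation "subject --predicate--> object" between two PIDs. */
record Triple(String subject, String predicate, String object) {}

final class ViewStep {
    /**
     * Objects one step away from pid: targets of outgoing relations whose
     * predicate is annotated as a view relation, plus sources of incoming
     * relations whose predicate is annotated as an inverse view relation.
     */
    static Set<String> neighbours(String pid, Collection<Triple> triples,
                                  Set<String> viewRels, Set<String> inverseViewRels) {
        Set<String> result = new LinkedHashSet<>();
        for (Triple t : triples) {
            if (t.subject().equals(pid) && viewRels.contains(t.predicate())) {
                result.add(t.object());   // follow the relation forwards
            }
            if (t.object().equals(pid) && inverseViewRels.contains(t.predicate())) {
                result.add(t.subject());  // follow the relation backwards
            }
        }
        return result;
    }
}
```

Only the annotated predicates are followed; any other structural relation (for example a hypothetical `doms:hasLicense`) stays in the repository but does not pull its target into the view.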

The view necessary for proper public dissemination of the objects might not be the same as the one required for useful GUI access, though. The solution is to define multiple views on the same objects. Each named view has its own main objects and its own set of annotated relations to follow from these main objects. The named views do not interact in any way, so we can have radically different ways of viewing the same data.

The VIEW datastream

This brings us to another crucial feature of the view system: views are defined at the content model level. A data object does not identify itself as a main object; instead, its content model states that all objects of this class are main objects. Everything is defined in the classes of objects, never in the actual data objects. As a result, it is easy to change and add views on a collection-wide basis.

To facilitate this, the "VIEW" datastream in content models has been designated as Reserved and Required. The "VIEW" datastream is, basically, a sequence of named views, each with its designated relations.
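The following sketch, assuming a simplified in-memory model, shows what "defined at the content model level" means in practice: a data object only knows which content models it subscribes to, and whether it is a main object for a given named view is looked up in those content models. `ViewDefinition`, `ContentModelViews`, `MainObjects` and the view names `GUI` and `DISSEMINATION` are hypothetical; the sketch also assumes that one subscribed content model declaring the object a main object is enough.

```java
import java.util.*;

/** One named view from a content model's VIEW datastream (simplified). */
record ViewDefinition(String name, boolean mainObject,
                      Set<String> relations, Set<String> inverseRelations) {}

/** A content model reduced to its PID and its named views. */
record ContentModelViews(String pid, Map<String, ViewDefinition> views) {}

final class MainObjects {
    /**
     * Main-object status is never stored on the data object itself; it is
     * derived from the content models the object subscribes to.
     */
    static boolean isMainObject(List<ContentModelViews> modelsOfObject, String viewName) {
        for (ContentModelViews cm : modelsOfObject) {
            ViewDefinition v = cm.views().get(viewName);
            if (v != null && v.mainObject()) {
                return true;
            }
        }
        return false;
    }
}
```

Because the answer depends only on the content models, editing one VIEW datastream changes the main-object status of every subscribing data object at once.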

The schema for the VIEW datastream is as follows:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            targetNamespace="http://doms.statsbiblioteket.dk/types/view/0/1/#"
            xmlns="http://doms.statsbiblioteket.dk/types/view/0/1/#"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

    <xsd:element name="views" type="viewsType"/>

    <xsd:complexType name="viewsType">
        <xsd:sequence>
            <xsd:element name="view" type="viewType" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="viewType">
        <xsd:sequence>
            <xsd:element name="relations" type="relationsType" minOccurs="0" maxOccurs="1"/>
            <xsd:element name="inverse-relations" type="inverse-relationsType" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
        <xsd:attribute name="name" type="xsd:string" use="required"/>
        <xsd:attribute name="mainobject" type="xsd:boolean" default="false"/>
    </xsd:complexType>

    <xsd:complexType name="relationsType">
        <xsd:sequence>
            <xsd:any namespace="##any" processContents="skip" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="inverse-relationsType">
        <xsd:sequence>
            <xsd:any namespace="##any" processContents="skip" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

</xsd:schema>
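As an illustration of how a consumer might read a VIEW datastream conforming to this schema, here is a sketch using the JDK's namespace-aware DOM parser (`javax.xml.parsers.DocumentBuilderFactory`). Representing each relation as its namespace URI followed by its local name (the usual RDF predicate form) is an assumption of this sketch, as are the `ParsedView` and `ViewDatastreamReader` names.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** One named view as read from a VIEW datastream. */
record ParsedView(String name, boolean mainObject,
                  List<String> relations, List<String> inverseRelations) {}

final class ViewDatastreamReader {
    static final String VIEW_NS = "http://doms.statsbiblioteket.dk/types/view/0/1/#";

    static List<ParsedView> read(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // needed for *NS lookups and getLocalName()
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<ParsedView> result = new ArrayList<>();
        NodeList views = doc.getElementsByTagNameNS(VIEW_NS, "view");
        for (int i = 0; i < views.getLength(); i++) {
            Element view = (Element) views.item(i);
            // getAttribute returns "" when absent; the schema default is false
            String flag = view.getAttribute("mainobject").trim();
            boolean main = flag.equals("true") || flag.equals("1");
            result.add(new ParsedView(view.getAttribute("name"), main,
                    relationNames(view, "relations"),
                    relationNames(view, "inverse-relations")));
        }
        return result;
    }

    /** The element children of the given list element, as namespace URI + local name. */
    private static List<String> relationNames(Element view, String listElement) {
        List<String> names = new ArrayList<>();
        NodeList lists = view.getElementsByTagNameNS(VIEW_NS, listElement);
        if (lists.getLength() == 0) {
            return names;  // both lists are optional (minOccurs="0")
        }
        NodeList children = lists.item(0).getChildNodes();
        for (int j = 0; j < children.getLength(); j++) {
            if (children.item(j) instanceof Element rel) {
                names.add(rel.getNamespaceURI() + rel.getLocalName());
            }
        }
        return names;
    }
}
```

Since the schema uses `xsd:any` with `processContents="skip"`, the parser cannot know which relation elements are legal; any checking of the relation names has to happen elsewhere.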

Multilevel Views


As described so far, the view of a main object is computed as follows:

  1. Start with a main object.
  2. Read the list of view relations from the main object's content model.
  3. Follow these relations to other objects.
  4. Keep following these relations until no new objects are found.

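The view system as described so far has one shortcoming, which the attentive reader may have spotted: it is not local. In the procedure above, the relations listed in the main object's content model are followed from every object reached, including objects that subscribe to entirely different content models. This conflicts with one of the fundamental design requirements for extensions to Fedora: data objects should only be described by the content models they subscribe to, and content models should only describe the objects that subscribe to them.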

For that reason, the meaning of the relations listed in the "VIEW" datastream is changed somewhat: each data object has a view, consisting of the object itself and the views of the data objects it is directly related to through the view relations of its own content model. So, if the VIEW datastream in the content model of a main object was

<view:views xmlns:view="http://doms.statsbiblioteket.dk/types/view/0/1/#">
  <view:view name="GUI" mainobject="true">
    <view:relations>
      <doms:hasFile xmlns:doms="http://doms.statsbiblioteket.dk/relations/default/0/1/#"/>
    </view:relations>
    <view:inverse-relations>
      <doms:isPartOfCollection xmlns:doms="http://doms.statsbiblioteket.dk/relations/default/0/1/#"/>
    </view:inverse-relations>
  </view:view>
</view:views>

then the view of this main object encompasses the main object itself, the views of any objects that the main object has a "doms:hasFile" relation to, and the views of any objects that have a "doms:isPartOfCollection" relation to the main object.

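The procedure for calculating the total view of a main object is shown in the following Java-like pseudocode; its types and methods are illustrative rather than part of the Fedora API. It performs a depth-first search of the objects, reading the view relations of each object from that object's own content model. The order of the objects in the view carries no meaning and should not be relied upon.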

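Set<DataObject> visitedObjects = new HashSet<DataObject>();  // clear before each new main object

// Returns the view of o: o itself plus the views of all objects
// reached through o's view relations and inverse view relations.
List<DataObject> calculateView(DataObject o) {
   List<DataObject> view = new ArrayList<DataObject>();

   // Already visited: stop here, so that cycles in the relation graph terminate
   if (visitedObjects.contains(o)) {
      return view;
   }

   visitedObjects.add(o);
   view.add(o);  // the view of an object includes the object itself

   // The view relations are read from o's own content model (locality)
   ContentModel c = o.getContentModel();

   // Follow view relations forwards: o --r--> r.getObject()
   for (String relationName : c.getViewRelations()) {
      for (Relation r : o.getRelations(relationName)) {
         view.addAll(calculateView(r.getObject()));
      }
   }

   // Follow inverse view relations backwards: r.getSubject() --r--> o
   for (String relationName : c.getInverseViewRelations()) {
      for (Relation r : o.getInverseRelations(relationName)) {
         view.addAll(calculateView(r.getSubject()));
      }
   }

   return view;
}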
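The pseudocode can be turned into a small runnable sketch against an in-memory graph. This version assumes that each object has exactly one content model and that relations are available as simple (subject, predicate, object) triples; `Rel`, `ViewRels`, `ViewCalculator` and all PIDs are hypothetical. It reuses the relations from the GUI view example: `doms:hasFile` followed forwards and `doms:isPartOfCollection` followed backwards.

```java
import java.util.*;

/** Hypothetical relation "subject --predicate--> object". */
record Rel(String subject, String predicate, String object) {}

/** The view relations one content model lists for one named view. */
record ViewRels(Set<String> forward, Set<String> inverse) {}

final class ViewCalculator {
    private static final ViewRels NONE = new ViewRels(Set.of(), Set.of());

    private final Map<String, String> contentModelOf;   // object PID -> content model PID
    private final Map<String, ViewRels> viewRelsOf;     // content model PID -> view relations
    private final List<Rel> rels;

    ViewCalculator(Map<String, String> contentModelOf,
                   Map<String, ViewRels> viewRelsOf, List<Rel> rels) {
        // HashMap copies tolerate lookups with a null key (object without a known model)
        this.contentModelOf = new HashMap<>(contentModelOf);
        this.viewRelsOf = new HashMap<>(viewRelsOf);
        this.rels = List.copyOf(rels);
    }

    /** The view of mainPid; a fresh visited set per call, so calls are independent. */
    Set<String> calculateView(String mainPid) {
        Set<String> visited = new LinkedHashSet<>();
        visit(mainPid, visited);
        return visited;
    }

    private void visit(String pid, Set<String> visited) {
        if (!visited.add(pid)) {
            return;  // already in the view: this is what stops cycles
        }
        // Locality: only pid's own content model decides which relations to follow
        ViewRels vr = viewRelsOf.getOrDefault(contentModelOf.get(pid), NONE);
        for (Rel r : rels) {
            if (r.subject().equals(pid) && vr.forward().contains(r.predicate())) {
                visit(r.object(), visited);
            }
            if (r.object().equals(pid) && vr.inverse().contains(r.predicate())) {
                visit(r.subject(), visited);
            }
        }
    }
}
```

Scanning the full triple list at every step keeps the sketch short; a real implementation would instead query the repository's relation index for the relations of one object.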

Datastream View

The described view system can designate exactly which objects are part of a view. But it is not always enough to know just the objects. For the GUI, it is necessary to know exactly which datastreams should be presented, and how. For this purpose we have designed a DS-COMPOSITE extension, which follows the system laid down in FedoraTypeChecking.

<xsd:schema
        targetNamespace="http://doms.statsbiblioteket.dk/types/dscompositeschema/guirepresentation/0/1/#"
        xmlns="http://doms.statsbiblioteket.dk/types/dscompositeschema/guirepresentation/0/1/#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">

    <xsd:element name="guirepresentation">
        <xsd:complexType>

            <xsd:attribute name="presentAs" use="required">
                <xsd:simpleType>
                    <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="importable"/>
                        <xsd:enumeration value="editable"/>
                        <xsd:enumeration value="uploadable"/>
                        <xsd:enumeration value="readonly"/>
                        <xsd:enumeration value="invisible"/>
                    </xsd:restriction>
                </xsd:simpleType>
            </xsd:attribute>

        </xsd:complexType>

    </xsd:element>

</xsd:schema>

The precise semantics of the five types are decided by the GUI, but their approximate meanings are as follows:

  • importable: The content is inline XML and should be the result of an import function. Once written, the datastream counts as "readonly".
  • editable: The content is inline XML; it should be parsed according to its schema and presented in the GUI.
  • uploadable: The content is a link to a file in bitstorage. If the datastream does not exist, the GUI should offer a way to upload a file; otherwise the link to bitstorage should be shown, read-only.
  • readonly: The content is inline XML generated by some other means. The user should be able to read it in the GUI but not change it. The GUI may hide the content by default, but it must remain accessible.
  • invisible: The GUI should disregard this datastream completely and behave as if it does not exist. This is the default if no guirepresentation is defined for a datastream.
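A GUI could map the `presentAs` attribute onto behaviour roughly as below. This is only a sketch of one reading of the list above: the `PresentAs` enum and its method names are hypothetical, and "may be written" means written through the mechanism that matches the type (import, editing or upload).

```java
import java.util.*;

/** The five presentAs values of the guirepresentation extension. */
enum PresentAs {
    IMPORTABLE, EDITABLE, UPLOADABLE, READONLY, INVISIBLE;

    /** A missing guirepresentation means INVISIBLE; unknown values are rejected. */
    static PresentAs fromAttribute(String value) {
        if (value == null) {
            return INVISIBLE;
        }
        return valueOf(value.trim().toUpperCase(Locale.ROOT));  // throws IllegalArgumentException
    }

    /** Whether the GUI should show the datastream at all. */
    boolean visible() {
        return this != INVISIBLE;
    }

    /** Whether the GUI may (still) write the datastream. */
    boolean mayBeWritten(boolean datastreamExists) {
        return switch (this) {
            case EDITABLE -> true;
            // importable and uploadable are written once; afterwards they are read-only
            case IMPORTABLE, UPLOADABLE -> !datastreamExists;
            case READONLY, INVISIBLE -> false;
        };
    }
}
```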

So, an example of a datastream entry in DS-COMPOSITE would now be:

<dsTypeModel ID="DC">
    <form MIME="text/xml"/>
    <extensions name="DOMS">
        <schema:schema type="xsd" datastream="DC_SCHEMA" object="doms:DublinCore_Schema"/>
        <gui:guirepresentation presentAs="editable"/>
    </extensions>
</dsTypeModel>

Content Model Inheritance and Views

DOMS employs inheritance for content models, as detailed in FedoraOntology. This affects the view system.

Since a relation cannot be marked as NOT being part of a view, inheritance causes few potential conflicts. The view relations of a data object are obtained by taking the view relations of each of its content models and all their ancestors, concatenating them and removing duplicates. The inverse view relations are computed the same way.
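The merging rule can be sketched as follows, assuming each content model is represented with direct references to its parents; `InheritingModel` and `EffectiveViewRelations` are hypothetical names, and how DOMS actually resolves ancestors is described in FedoraOntology, not here.

```java
import java.util.*;
import java.util.function.Function;

/** A content model with its parents and the relations its VIEW datastream lists. */
record InheritingModel(String pid, List<InheritingModel> parents,
                       List<String> viewRelations, List<String> inverseViewRelations) {}

final class EffectiveViewRelations {
    static Set<String> viewRelations(List<InheritingModel> modelsOfObject) {
        return collect(modelsOfObject, InheritingModel::viewRelations);
    }

    static Set<String> inverseViewRelations(List<InheritingModel> modelsOfObject) {
        return collect(modelsOfObject, InheritingModel::inverseViewRelations);
    }

    /** Concatenates the lists of all models and their ancestors, without duplicates. */
    private static Set<String> collect(List<InheritingModel> models,
                                       Function<InheritingModel, List<String>> listed) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> visitedModels = new HashSet<>();  // shared ancestors are read once
        Deque<InheritingModel> todo = new ArrayDeque<>(models);
        while (!todo.isEmpty()) {
            InheritingModel m = todo.pop();
            if (visitedModels.add(m.pid())) {
                result.addAll(listed.apply(m));
                todo.addAll(m.parents());
            }
        }
        return result;
    }
}
```

Because nothing can be subtracted, the order in which models are visited does not affect the resulting set.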

Previously, a content model could only mark as view relations those relations that were defined in the same content model. With inheritance this is too restrictive, since a content model will often need to follow a relation defined in one of its parents. The rule is therefore now: In the VIEW datastream, you can only mention relations that are defined in this content model or one of its ancestors. Inverse relations can still be mentioned freely.
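A validator for this rule might look like the sketch below. Where a content model's defined relations come from is outside the scope of the sketch: the parent links and the defined relations are simply passed in as maps, and `ViewDatastreamRule` is a hypothetical name. Inverse relations are deliberately not checked, matching the rule.

```java
import java.util.*;

final class ViewDatastreamRule {
    /**
     * Returns the (forward) view relations that are not defined in the
     * content model itself or in any of its ancestors.
     */
    static List<String> violations(String model,
                                   Map<String, List<String>> parentsOf,
                                   Map<String, Set<String>> definedRelationsOf,
                                   List<String> viewRelations) {
        Set<String> allowed = new HashSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(model));
        while (!todo.isEmpty()) {  // walk the model and all of its ancestors
            String m = todo.pop();
            if (seen.add(m)) {
                allowed.addAll(definedRelationsOf.getOrDefault(m, Set.of()));
                todo.addAll(parentsOf.getOrDefault(m, List.of()));
            }
        }
        List<String> bad = new ArrayList<>();
        for (String r : viewRelations) {
            if (!allowed.contains(r)) {
                bad.add(r);
            }
        }
        return bad;
    }
}
```

Note that the check only looks upwards: a parent cannot list a relation that is defined only in one of its children.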

The inheritance rules for datastream views are the same as for datastream definitions. As with the schema extension, where only the last schema takes effect, only the last guirepresentation should be considered by the GUI.
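A minimal sketch of "the last guirepresentation wins", assuming a single linear inheritance chain ordered from the most general ancestor to the most specific content model, so that "last" means the most specific declaration. Both the ordering and the `GuiRepresentationResolver` name are assumptions of this sketch.

```java
import java.util.*;

final class GuiRepresentationResolver {
    /**
     * chain: one map per content model, ordered from most general to most
     * specific (assumed); each maps datastream ID -> declared presentAs.
     */
    static String effectivePresentAs(List<Map<String, String>> chain, String datastreamId) {
        String result = "invisible";  // default when no guirepresentation is declared
        for (Map<String, String> declared : chain) {
            String value = declared.get(datastreamId);
            if (value != null) {
                result = value;  // a later declaration replaces any earlier one
            }
        }
        return result;
    }
}
```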

FedoraViewBlobs (last edited 2010-03-17 13:09:38 by localhost)