Differences between revisions 3 and 4

Fedora Repository Views

DOMS employs an overall atomistic data model. Atomistic data models are much more flexible than traditional compound data models, but they have one big (and largely unmet) challenge: when working with data objects you will frequently need to operate on a number of objects as if they were a common whole. The simplest use case is the public dissemination of data. If the data that should go into one Dissemination Information Package is distributed over several objects, the system needs to understand this. Search indexing is another use case: search services tend to use a flat index in which each record contains all its metadata.

The solution to this is the concept of repository views.

Theoretical basis

A repository contains data. This data can be separated into a number of records. A record does not necessarily correspond to a data object, but is some atomic, self-contained entry. As records are atomic, they cannot reasonably be broken down further. As they are self-contained, they are only weakly linked to other entries. A repository view is the mapping from the repository data into these records.

What constitutes an atomic, self-contained entry depends on the reason for accessing the repository. A search engine harvester might want to see one kind of record, while an export function might want another. We call such reasons "view angles". The mapping of data into records depends on the view angle.

Fedora Views

Fedora is a repository not just of data, but of digital objects. So, the view mapping should be from a number of objects into a record of some format. I assume A to be a data object.

A reasonable requirement is that for an object to be in the view of A, it must be related somehow to A. Thus, A is connected, through some chain of relations, to every other object in its view.

The second requirement, and this is very fundamental, is that A does not know it is being viewed. A is just a data object. It cannot be expected to keep up with new ways of accessing the repository, and new ways to view the data. So, A must not store any information that pertains solely to this or any other view angle. The relations of A should only be structural, with regard to the data it contains.

So, finding the view of A seems an impossible task, but it is not. For while the second requirement forbids A from knowing about the view angle, the class of A could. In Fedora, the classes of data objects are represented by content models. So, the content model(s) of A could know about this and other view angles of A. But a content model cannot say anything about A specifically; it can only describe the entire class of objects like A. So what it can do is annotate the relations of A. It could say "For this class of objects and this view angle, these structural relations denote references to other objects that are in the view."

This naturally lends itself to a recursive approach. The view of A is A plus the view of any object related to A through such an annotated relation.

But the angle from which one views the repository might also affect the number of entries seen. The recursive approach above will always lead to one entry per data object. The remedy for this is to mark some classes as Entries for a certain view angle. This means that to compute the records for a given view angle, the view of every object whose class is an Entry should be computed. Together, these views make up the view of the repository.
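
As a rough illustration of the Entry idea, the sketch below computes one record per Entry object for a given view angle. Everything in it is an assumption made for the example: the class `RepositoryView`, the plain string PIDs, the simplification that each object has exactly one content model, and the `viewOf` function that stands in for whatever computes the view of a single object.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of Fedora or DOMS.
class RepositoryView {

    /**
     * Computes the records of the repository for one view angle.
     *
     * @param contentModelOf object PID -> content model PID (assumes one model per object)
     * @param entryAngles    content model PID -> view angles it declares itself an Entry for
     * @param viewAngle      the view angle to compute, e.g. "GUI"
     * @param viewOf         computes the view (a list of PIDs) of a single object
     * @return one record per Entry object, keyed by the PID of that object
     */
    static Map<String, List<String>> records(
            Map<String, String> contentModelOf,
            Map<String, Set<String>> entryAngles,
            String viewAngle,
            Function<String, List<String>> viewOf) {
        Map<String, List<String>> records = new LinkedHashMap<>();
        for (Map.Entry<String, String> object : contentModelOf.entrySet()) {
            Set<String> angles =
                    entryAngles.getOrDefault(object.getValue(), Collections.<String>emptySet());
            // Only objects whose class is an Entry for this angle start a record.
            if (angles.contains(viewAngle)) {
                records.put(object.getKey(), viewOf.apply(object.getKey()));
            }
        }
        return records;
    }
}
```

Objects whose class is not an Entry never start a record of their own; they can only appear inside the record of an Entry object that reaches them.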

Fedora Implementation

This section describes how the above could be implemented in Fedora.

Entry Declaration

It is very simple for a content model to declare itself to be an Entry for a view angle. It only needs a literal relation named "isEntryForViewAngle", in the view namespace (see DomsNameSpacesAndSchemas), in its RELS-EXT datastream. The value of the relation is the literal name of the view angle.

Add this relation to any content models that should describe entries for the view angle named GUI.

<view:isEntryForViewAngle xmlns:view="http://doms.statsbiblioteket.dk/types/view/0/2/#">GUI</view:isEntryForViewAngle>
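
A consumer could read this declaration back with the standard Java DOM API. The following is a minimal sketch, assuming the RELS-EXT datastream is available as an XML string; the class and method names are made up for the example, and only the namespace and the relation name come from the text above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Fedora or DOMS.
class EntryDeclarations {

    static final String VIEW_NS = "http://doms.statsbiblioteket.dk/types/view/0/2/#";

    /** Returns the view angles that a content model's RELS-EXT declares it an Entry for. */
    static Set<String> entryAngles(String relsExtXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookup
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(relsExtXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RELS-EXT", e);
        }
        NodeList nodes = doc.getElementsByTagNameNS(VIEW_NS, "isEntryForViewAngle");
        Set<String> angles = new LinkedHashSet<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // The view angle name is the literal text content of the relation.
            angles.add(nodes.item(i).getTextContent().trim());
        }
        return angles;
    }
}
```

A content model that carries the relation shown above would yield a set containing only "GUI".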

Annotated Relations

To annotate relations, a special datastream called "VIEW" has been introduced. This datastream should exist in the content models, and its name has been made Reserved.

It is basically a list of view angles and, for each, the relations that should be view relations. There is a little twist, though. Above, we only required that an object be related, through some chain of relations, to every object in its view. We did not specify the direction of these relations. So, if we have the objects A and B, and B has a relation #relatesTo to A, B could still be in the view of A. And indeed, A does not have to be in the view of B, even if B is in the view of A.

To achieve this, the view datastream allows you to annotate incoming relations, as well as outgoing.

The schema for the VIEW datastream is as follows:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            targetNamespace="http://doms.statsbiblioteket.dk/types/view/0/2/#"
            xmlns="http://doms.statsbiblioteket.dk/types/view/0/2/#"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

    <xsd:element name="views" type="viewsType"/>

    <xsd:complexType name="viewsType">
        <xsd:sequence>
            <xsd:element name="viewangle" type="viewType" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="viewType">
        <xsd:sequence>
            <xsd:element name="relations" type="relationsType" minOccurs="0" maxOccurs="1"/>
            <xsd:element name="inverse-relations" type="inverse-relationsType" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
        <xsd:attribute name="name" type="xsd:string" use="required"/>
    </xsd:complexType>

    <xsd:complexType name="relationsType">
        <xsd:sequence>
            <xsd:any namespace="##any" processContents="skip" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="inverse-relationsType">
        <xsd:sequence>
            <xsd:any namespace="##any" processContents="skip" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

</xsd:schema>

Example of a VIEW datastream

This is an example of how the VIEW datastream could look for the view angle GUI.

<view:views xmlns:view="http://doms.statsbiblioteket.dk/types/view/0/2/#">
  <view:viewangle name="GUI">
    <view:relations>
      <doms:hasFile xmlns:doms="http://doms.statsbiblioteket.dk/relations/default/0/1/#"/>
    </view:relations>
    <view:inverse-relations>
      <doms:isPartOfCollection xmlns:doms="http://doms.statsbiblioteket.dk/relations/default/0/1/#"/>
    </view:inverse-relations>
  </view:viewangle>
</view:views>

The GUI view of this object encompasses the object itself, the GUI view of any object that this object has a "doms:hasFile" relation to, and the GUI view of any object that has a "doms:isPartOfCollection" relation to this object.
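
To make the structure of the datastream concrete, here is a small sketch that extracts the annotated relations for one view angle using the standard Java DOM API. The class `ViewDatastream` and its method are hypothetical. The sketch takes the view namespace from the root element instead of hard-coding it, and it assumes that a relation is identified by its namespace URI followed by its local name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Fedora or DOMS.
class ViewDatastream {

    /**
     * Lists the relations annotated for a view angle.
     *
     * @param group either "relations" (outgoing) or "inverse-relations" (incoming)
     */
    static List<String> relations(String viewXml, String viewAngle, String group) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(viewXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse VIEW datastream", e);
        }
        String viewNs = root.getNamespaceURI();
        List<String> result = new ArrayList<>();
        NodeList angles = root.getElementsByTagNameNS(viewNs, "viewangle");
        for (int i = 0; i < angles.getLength(); i++) {
            Element angle = (Element) angles.item(i);
            if (!viewAngle.equals(angle.getAttribute("name"))) {
                continue; // a different view angle
            }
            NodeList groups = angle.getElementsByTagNameNS(viewNs, group);
            for (int j = 0; j < groups.getLength(); j++) {
                NodeList children = groups.item(j).getChildNodes();
                for (int k = 0; k < children.getLength(); k++) {
                    Node child = children.item(k);
                    // Each child element names one relation; skip whitespace text nodes.
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        result.add(child.getNamespaceURI() + child.getLocalName());
                    }
                }
            }
        }
        return result;
    }
}
```

For the example datastream, asking for the "relations" group of the GUI view angle would give a single entry ending in `#hasFile`, and the "inverse-relations" group a single entry ending in `#isPartOfCollection`.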

Calculating the view

The procedure for calculating the total view of an object is detailed in the pseudo code below. It basically performs a depth-first search of the objects. The order of the objects in the view carries no meaning and should be treated as arbitrary.

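// Objects already placed in the view; this guards against cycles in the relation graph.
Set<DataObject> visitedObjects = new HashSet<DataObject>();

List<DataObject> calculateView(DataObject o) {
   List<DataObject> view = new ArrayList<DataObject>();

   if (visitedObjects.contains(o)) {
      return view;
   }

   visitedObjects.add(o);
   view.add(o);   // the view of o always contains o itself
   ContentModel c = o.getContentModel();

   // Follow outgoing relations that the content model annotates as view relations.
   List<Relation> viewRels = c.getViewRelations();
   List<Relation> objectRels = o.getRelations();
   for (Relation r : objectRels) {
     if (viewRels.contains(r)) {
       view.addAll(calculateView(r.getObject()));
     }
   }

   // Follow incoming relations that are annotated as inverse view relations.
   List<Relation> viewInvRels = c.getInverseViewRelations();
   List<Relation> objectInvRels = o.getInverseRelations();
   for (Relation r : objectInvRels) {
     if (viewInvRels.contains(r)) {
       view.addAll(calculateView(r.getSubject()));
     }
   }

   return view;
}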
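
For readers who want to experiment with the traversal, the following is a self-contained sketch of the same idea over an in-memory graph. It is not DOMS code: the class `ViewGraph`, its fields, and the use of plain strings for PIDs, content models and relation names are assumptions made for the example, and each object is assumed to have a single content model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the repository; not a Fedora or DOMS API.
class ViewGraph {

    /** One relation triple: subject --name--> object. */
    static final class Relation {
        final String subject;
        final String name;
        final String object;

        Relation(String subject, String name, String object) {
            this.subject = subject;
            this.name = name;
            this.object = object;
        }
    }

    final List<Relation> relations = new ArrayList<>();
    /** Object PID -> content model PID. */
    final Map<String, String> contentModelOf = new HashMap<>();
    /** Content model PID -> names of outgoing relations annotated as view relations. */
    final Map<String, Set<String>> viewRelations = new HashMap<>();
    /** Content model PID -> names of incoming relations annotated as view relations. */
    final Map<String, Set<String>> inverseViewRelations = new HashMap<>();

    /** Returns the PIDs in the view of the given object, in no meaningful order. */
    List<String> calculateView(String pid) {
        List<String> view = new ArrayList<>();
        collect(pid, new HashSet<String>(), view);
        return view;
    }

    private void collect(String pid, Set<String> visited, List<String> view) {
        if (!visited.add(pid)) {
            return; // already in the view; this also stops cycles
        }
        view.add(pid);
        String model = contentModelOf.get(pid);
        Set<String> outgoing =
                viewRelations.getOrDefault(model, Collections.<String>emptySet());
        Set<String> incoming =
                inverseViewRelations.getOrDefault(model, Collections.<String>emptySet());
        for (Relation r : relations) {
            if (r.subject.equals(pid) && outgoing.contains(r.name)) {
                collect(r.object, visited, view);
            }
            if (r.object.equals(pid) && incoming.contains(r.name)) {
                collect(r.subject, visited, view);
            }
        }
    }
}
```

Because the annotations are looked up through the content model of the object currently being visited, the result is asymmetric: an object B can be in the view of A through an inverse relation without A being in the view of B.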

Content Model Inheritance and Views

DOMS employs inheritance for content models, as detailed in FedoraOntology. This interacts with the view system.

As you cannot mark something as NOT being in the view, there are few potential conflicts. For a data object, take the list of view relations from each of its content models and their ancestors, concatenate them, and remove duplicates. The result is the set of view relations for this object. The same applies to the inverse view relations.
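
The merge described above amounts to a union over the content models of the object and all their ancestors. A minimal sketch, assuming the inheritance graph and the declared relations are available as plain maps (the class, the parameter names and the string identifiers are all hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Fedora or DOMS.
class InheritedViewRelations {

    /**
     * Collects the view relations that apply to an object.
     *
     * @param contentModels the content models of the object
     * @param parentsOf     content model -> the content models it extends
     * @param declared      content model -> view relations listed in its VIEW datastream
     * @return the union of the declared relations of all models and their ancestors
     */
    static Set<String> effectiveRelations(
            Collection<String> contentModels,
            Map<String, List<String>> parentsOf,
            Map<String, List<String>> declared) {
        Set<String> result = new LinkedHashSet<>(); // a set removes duplicates
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(contentModels);
        while (!pending.isEmpty()) {
            String model = pending.pop();
            if (!seen.add(model)) {
                continue; // ancestor shared by several models, already handled
            }
            result.addAll(declared.getOrDefault(model, Collections.<String>emptyList()));
            pending.addAll(parentsOf.getOrDefault(model, Collections.<String>emptyList()));
        }
        return result;
    }
}
```

The same function can be applied a second time to the inverse view relations, since they follow the same rule.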

Previously we required that only relations defined in the same content model could be marked as view relations. With inheritance, this becomes problematic. Instead, the rule now is: in the VIEW datastream, you can only mention relations that are defined in this content model or one of its parents. The inverse relations can still be freely mentioned.

The inheritance rules for datastream views are the same as for datastream definitions. So, as with the schema extension, where only the last schema takes effect, only the last guirepresentation should be considered by the GUI. The different extensions do not interfere with each other, so the SCHEMA extension could be defined at the top of the inheritance tree and the GUI extension near the bottom.

Entry declarations (called main views in earlier revisions) are inherited, since any object that has a content model also has every supertype of that content model. Such objects are therefore objects of a content model that marks them as Entries.
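
A tool that checks content models could enforce this rule along the following lines. This is only a sketch under assumed inputs: the maps describing the inheritance graph and the relations each content model defines, as well as all names, are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Fedora or DOMS.
class ViewRelationCheck {

    /**
     * Returns the outgoing view relations mentioned in a content model's VIEW
     * datastream that are defined neither in that model nor in any ancestor.
     * Inverse relations are not checked, since they may be mentioned freely.
     */
    static List<String> undefinedRelations(
            String contentModel,
            Map<String, List<String>> parentsOf,
            Map<String, Set<String>> definedRelations,
            List<String> mentioned) {
        // Gather every relation defined in the model or one of its ancestors.
        Set<String> allowed = new HashSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(contentModel);
        while (!pending.isEmpty()) {
            String model = pending.pop();
            if (!seen.add(model)) {
                continue;
            }
            allowed.addAll(definedRelations.getOrDefault(model, Collections.<String>emptySet()));
            pending.addAll(parentsOf.getOrDefault(model, Collections.<String>emptyList()));
        }
        List<String> violations = new ArrayList<>();
        for (String relation : mentioned) {
            if (!allowed.contains(relation)) {
                violations.add(relation);
            }
        }
        return violations;
    }
}
```

An empty result means the VIEW datastream respects the rule; any returned relation name points at an annotation that refers to a relation the class does not have.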

-  ⇤ ← Revision 3 as of 2009-03-03 14:51:33 → 
  Size: 14355
  Editor: abr
  Comment:
+   ← Revision 4 as of 2009-03-04 11:29:51 → ⇥
  Size: 9881
  Editor: abr
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 24:
-This naturally lends itself to a recursive approach. The view of A is A plus the view of any object related to A through such an annotated relation. But this leads to the problem of separation. ...
+This naturally lends itself to a recursive approach. The view of A is A plus the view of any object related to A through such an annotated relation.

But the angle one views the repository might also affect the number of entries seen. The above, recursive approach will always lead to one entry per data object. The remedy for this is to mark some classes as Entries for a certain view angle. This means that to compute the records for a given view angle, the view of all objects of a class that is an Entry should be computed. This is the view of the repository.
-Line 27:
+Line 29:
+== Fedora Implementation ==
-Line 28:
+Line 31:
-=== PIDs angle ===
+This section describes how the above could be implemented in Fedora.
-Line 30:
+Line 33:
-The easiest situation (angle) is when mapping to a list of PIDs. Each record is a list of the PIDs of the data object it consist of. This is the situation I will handle first.
+=== Entry Declaration ===
-Line 32:
+Line 35:
-Each view is centered around some data object, and contain a number of other objects. So, if A is a data object, view(A) = A.pid + other pids.
The view is identified by the object it centered around, i.e. the PID of the data object.
+It is very simple for a content model to declare itself to be an Entry for a view angle. All it has to do is have a literal relation in the RELS-EXT datastream, by the name "isEntryForViewAngle", in the view namespace (see DomsNameSpacesAndSchemas), to the literal name of the view angle.
-Line 35:
+Line 37:
-   All the other objects in the View are related to the center object by some chain of relations. Therein lies a crucial feature of this View system; '''Rather than having special relations between data objects to other objects in their view, some of the structural relations are annotated to be view relations.''' 

Or rather, we list the relations that should be followed to find the objects in the view, rather than define view-relations. Actually, we annotate both relations to and from a given object as view relations.

The view nessesary for a proper public dissemination of the objects might not be the same as what is required for a useful GUI access, through. So, it is desirably to define multiple views on the same objects. Each named view has its own set of annotated relations to follow. In no way do they interact, and we can therefore have radically different ways of viewing the same data. 

But the views of certain classes of object will tend to more useful than others. Each content model can declare itself to be a main content model for any named view. The exact semantic meaning of being a main view is defined by the systems using this view. 

Named views so far:
 * "GUI": The GUI view specify which objects should be opened as a combined whole, and which should be regarded as external to this whole. A whole opened in the GUI will always be centered around a object having a content model declaring itself to be main content model for the GUI view. 


== The main view declaration ==

It is very simple for a content model to declare itself to be a main object for a named view. All it has to do is have a literal relation in the RELS-EXT datastream, by the name "isMainForNamedView", in the view namespace (see DomsNameSpacesAndSchemas), to the literal name of the view.

Add this relation to any content models that should describe main views for the GUI view.
+Add this relation to any content models that should describe entries for the view angle named GUI.
-Line 56:
+Line 39:
-<view:isMainForNamedView xmlns:view="http://doms.statsbiblioteket.dk/types/view/0/2/#">GUI</view:isMainForNamedView>
+<view:isEntryForViewAngle xmlns:view="http://doms.statsbiblioteket.dk/types/view/0/2/#">GUI</view:isEntryForViewAngle>
-Line 60:
+Line 43:
+=== Annotated Relations ===
-Line 61:
+Line 45:
-== The VIEW datastream ==
+To annotate relations, a special datastream have been introduced, called "VIEW". This datastream should exist in the content models, and the name have been made Reserved.
-Line 63:
+Line 47:
-Now we come to another crucial feature of this view system; '''Views are defined on the content model level.''' The content model describes how the view, from this class of objects, should be generated. Everything is defined in the classes of objects, never in the actual data objects. As such, it is easy to change and add views on a class-wide basis.
+It is basically a list of view angles, and the relations that should be view relations for each. There is a little twist, though. Above, we only defined that an object should be related through some chain of relations to every object in it's view. We did not specify that the direction of these relations. So, if we have the objects A and B, and B have a relations #relatesTo to A, B could still be in the view of A. And indeed, A does not have to be in the view of B, even if B is in the view of A.
-Line 65:
+Line 49:
-To facilitate this, the "VIEW" datastream in content models have been designated as Reserved and Required. The "VIEW" datastream is, basicaly, a sequence of named views, each with their designated relations.
+To achieve this, the view datastream allows you to annotate incoming relations, as well as outgoing.
-Line 81:
+Line 65:
-            <xsd:element name="view" type="viewType" minOccurs="0" maxOccurs="unbounded"/>
+            <xsd:element name="viewangle" type="viewType" minOccurs="0" maxOccurs="unbounded"/>
-Line 110:
+Line 94:
-=== Multilevel Views ===
-Line 112:
+Line 95:
-Of course, it is very preferable to be able to have deeply nested views. Achiving this is easy. Above we defined the annotated relations as marking which objects should belong to the view. What we really meant was which object-views should be included in the view.
-Line 114:
+Line 96:
-The formal definition of the semantic meaning of the relations in the "VIEW" datastream is therefore: '''Each data object has a view, encompassing the object and the views of other directly related data objects'''. So, if the VIEW datastream in a object was
+=== Example of a VIEW datastream ===

This is an example of how the VIEW datastream could look for the view angle GUI.
-Line 117:
+Line 102:
-  <view:view name="GUI">
+  <view:viewangle name="GUI">
-Line 124:
+Line 109:
-  </view:view>
+  </view:viewangle>
-Line 127:
+Line 112:
-then the View of this object encompass the object itself, and the "GUI" View of any objects that the object has a "doms:hasFile" relation to and any object that has a "doms:isPartOfCollection" relation to this object.
+The GUI view angle of this object encompass the object itself, and the GUI viewangle of any objects that the object has a "doms:hasFile" relation to and any object that has a "doms:isPartOfCollection" relation to this object.
-Line 129:
+Line 114:
+=== Calculating the view ===
-Line 167:
+Line 155:
-== Datastream View ==

The described view system can designate exactly which objects are part of a view. But it is not always enough to know just the objects. For the GUI, it is nessesary to know exactly which datastreams should be presented, and how. For this purpose we have designed an DS-COMPOSITE extension, which follows the system laid down in FedoraTypeChecking.

[[Anchor(DSCompositeGUISchema)]]
{{{
<xsd:schema
        targetNamespace="http://doms.statsbiblioteket.dk/types/dscompositeschema/guirepresentation/0/1/#"
        xmlns="http://doms.statsbiblioteket.dk/types/dscompositeschema/guirepresentation/0/1/#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">

    <xsd:element name="guirepresentation">
        <xsd:complexType>

            <xsd:attribute name="presentAs" use="required">
                <xsd:simpleType>
                    <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="importable"/>
                        <xsd:enumeration value="editable"/>
                        <xsd:enumeration value="uploadable"/>
                        <xsd:enumeration value="readonly"/>
                        <xsd:enumeration value="invisible"/>
                    </xsd:restriction>
                </xsd:simpleType>
            </xsd:attribute>

        </xsd:complexType>

    </xsd:element>

</xsd:schema>
}}}

The semantic meaning of the five types are really decided by the GUI, but the approximate meaning is as follows
 * importable: The content is inline xml, and should be the result of an import function. Once written, the datastream count as "readonly"
 * editable: The contents is inline xml, and should be parsed according to their schema, and presented in the GUI.
 * uploadable: The contents is a link to a file in bitstorage. If the datastream does not exist, the GUI should present a way to upload a file. Otherwise the link to Bitstorage should appear, readonly.
 * readonly: The contents is inline xml, generated by some other means. The user should be able to read the contents in the GUI, but not change them. The GUI might hide the contents by default, but they must be accessible.
 * invisible: The GUI should totally disregard this datastream, and behave as if it is not there. This is the default, if no guirepresentation is defined for a datastream. 


So, an example of a datastream entry in DS-COMPOSITE would now be:
{{{
<dsTypeModel ID="DC">
    <form MIME="text/xml"/>
    <extensions name="SCHEMA">
        <schema:schema type="xsd" datastream="DC_SCHEMA" object="doms:DublinCore_Schema"/>
    </extensions>
    <extensions name="GUI">
        <gui:guirepresentation presentAs="editable"/>
    </extensions>
</dsTypeModel>
}}}

== Content Model Inheritance and Views ==
+=== Content Model Inheritance and Views ===