Differences between revisions 2 and 3
Revision 2 as of 2011-01-07 12:37:32
Size: 5380
Editor: abr
Revision 3 as of 2011-01-18 14:47:12
Size: 29349
Editor: abr
The original material for this page was in Danish.


== The package ==

This is the first recent attempt at handing the doms system over to operations.
What we want to hand over is located (in several versions) at
fedora@alhena:/fedora/TilTom

That folder contains
 1. A doms installer, as a zip
 2. A doms installer, unzipped
 3. A "package"
 4. An installed system, "installed"


The last two items are probably the strangest, so I will explain a bit
more about them.

The doms testbed has two important setup files.

 1. package.sh
 2. install.sh

package.sh does everything that can be done without installing a tomcat
or starting anything. install.sh then installs a tomcat, starts it and
ingests the base objects that must be present.

Usage is as follows
./package.sh /path/to/where/the/doms/should/be

Apart from temp files, which will hopefully be cleaned up again, the
scripts DO NOT WRITE TO ANY LOCATION OUTSIDE THE ONE SPECIFIED. That is,
they keep the system "clean".

install.sh is very simple: it installs a tomcat, calls package.sh to do
all the work, and finishes by ingesting the base objects into the new
doms.

package.sh is controlled from config/conf.sh, where you set the values of
the variables that are to be filled in throughout the config files. If it
is really important, I can trace for you how all the information in
conf.sh is distributed.
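
As a rough illustration of how values from a central config script could end up in the config files, the sketch below fills in placeholders of the form $NAME$, the style that appears in the fedora.fcfg excerpts on this page. The function name substitute_placeholders and the use of sed are assumptions for illustration only; this is not the actual package.sh logic.

```shell
#!/bin/bash
# Sketch only, not the real package.sh. Assumes templates mark values as
# $NAME$ and that the values contain no '|' or '&' characters.
PORTRANGE=78
POSTGRESQL_DB="doms-test$PORTRANGE"

substitute_placeholders() {
    # Reads a template on stdin and writes the filled-in text on stdout.
    sed -e 's|\$PORTRANGE\$|'"$PORTRANGE"'|g' \
        -e 's|\$POSTGRESQL_DB\$|'"$POSTGRESQL_DB"'|g'
}

# Example: fill in a JDBC URL template.
printf '%s\n' 'jdbc:postgresql://localhost/$POSTGRESQL_DB$' | substitute_placeholders
```

A real installer would need one substitution per variable and would have to cope with values containing characters that are special to sed.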


Now to the included components, and what we do with them and expect of them.


== TOMCAT ==

Doms uses a standard tomcat 6.0.20. If you use a different version, it
should not matter much, but it should preferably be a 6.x release.

We overwrite a couple of tomcat's config files, and add some others.

These are the ones we overwrite
 1. server.xml (our version attached).
 2. tomcat-users.xml (unimportant, we just needed to be manager in test situations)
 3. setenv.sh (attached. Sets an environment variable for fedora, and a bit about memory requirements)

Then we add some more, because we use tomcat/conf/ for the config
files of the deployed webservices.

The primary ones here are

 1. context.xml (attached)
 2. log4j.*.xml (log config files for each webapp)

It is context.xml that specifies which of the many log4j config files
each webapp should use.

You can see in the "packaged" folder that there is a tomcat dir. It contains
exactly the things we put into tomcat, but is otherwise empty. Use it to
see what we change in a tomcat.

== ORDINARY WEBAPPS ==

There are a number of ordinary webapps that the doms makes use of. These
have no special requirements. They live in the tomcat and are controlled
via context.xml

The list is, as far as I remember

 1. ecm
 2. surveillance
 3. pidGenerator
 4. authChecker
 5. characteriser
 6. lowlevel
 7. highlevel
 8. central
 9. updatetracker
 10. IpRoleMapper

IpRoleMapper is included, but it will go away as soon as there is a version in
real production that the doms can draw on.

== FEDORA ==

Fedora is fun, or something. It consists of two things.

 1. A webapp
 2. A data folder

In addition, fedora expects to have a database available.

After the installer has run, we have packaged a fedora.war for you, which is
set up correctly and should be freely redeployable. It is located at
fedora/install/fedora.war. It has also been copied into the tomcat's
webapps. fedora-original.war is fedora.war without the changes we have
made.

These are the relevant folders for fedora
 * fedora/server/logs (fedora's log files)
 * fedora/server/config (fedora's config files)
 * fedora/data (all data and caches that fedora itself manages are stored here)

The primary config file for fedora is fedora/server/config/fedora.fcfg. I
have attached a sample where you can see where to make changes. The
installed system also has a finished config file. Among other things, the
database configuration is specified in this config file.



== THE DATABASE ==

Fedora relies, among other things, on a database. For DOMS this can be either a built-in derby database or a postgresql database. Which type is used is controlled in conf.sh. If fedora is newly installed, it populates the database at first startup. It is therefore important that the database is empty before fedora starts for the first time. You empty and recreate the database with these commands.
{{{
fedora$ dropdb -h localhost -U doms-test78 doms-test78
fedora$ createdb -h localhost -U doms-test78 -O doms-test78 doms-test78
}}}
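
The database name and user in the commands above follow the pattern doms-test plus the PORTRANGE value used in the setenv.sh defaults further down this page. As a small sketch under that assumption, the hypothetical helper below (db_reset_commands is not part of the installer) only prints the two commands for a given port range, so they can be reviewed before anything is dropped:

```shell
#!/bin/bash
# Sketch only: prints the reset commands instead of running them.
# Assumes database name and user are both doms-test<PORTRANGE>.
db_reset_commands() {
    # $1: PORTRANGE, e.g. 78
    name="doms-test$1"
    echo "dropdb -h localhost -U $name $name"
    echo "createdb -h localhost -U $name -O $name $name"
}

db_reset_commands 78
```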

== BASE OBJECTS ==

With these things in place, the system is actually set up. All that remains is
to start the tomcat and add the base objects. This is done with this
script snippet, which is also the last part of install.sh

{{{
# Start the tomcat server
echo ""
echo "Starting the tomcat server"
$TESTBED_DIR/tomcat/bin/startup.sh > /dev/null
echo "Sleep 30"
sleep 30

#
# Ingest initial objects
#
echo "Ingesting base objects"
export FEDORA_HOME=$FEDORA_DIR
sh $FEDORA_DIR/client/bin/fedora-ingest.sh dir \
$BASEDIR/data/objects \
'info:fedora/fedora-system:FOXML-1.1' \
localhost:${PORTRANGE}80 $FEDORAADMIN $FEDORAADMINPASS http
}}}

It should report that 20 objects were ingested successfully. If that works,
the doms system usually works.

The doms system is then installed.

== The System ==

This is the DOMS, the Digital Object Management System. It is a complete
repository system, based around a Fedora instance.

These are the components that make up the system

 * Central WebService AKA DomsServer - The interface to the system
 * ECM - Enhanced Content Models, added functionality for Fedora
 * Highlevel Bitstorage - A highlevel way to work with files in Doms
 * Lowlevel bitstorage - The lowlevel, near the storage system way to work with files in doms
 * Characteriser - Characterises datafiles that are stored
 * Updatetracker - Keeps track of which records in doms have changed
 * Pidgenerator - Generates pids for new objects
 * Surveillance - Monitors that everything is working and not reporting errors

A full guide to the workings of DOMS is beyond the scope of this document.

== The release ==

Due to the nature of Fedora, it is impossible to just generate an extractable
zip. Fedora is built around an installer, and so DOMS has to work the same
way. The doms release is this installer.

The installer is invoked by the script install.sh in bin/. In bin/ there are
the following scripts
 1. install_basic_tomcat.sh
 2. package.sh
 3. ingest_base_objects.sh
 4. install.sh
 5. setenv.sh

setenv.sh is the script that controls everything. It sets all the variables
that determine the location of various folders, and port numbers.

install.sh is a very simple script: it just calls install_basic_tomcat.sh,
package.sh and ingest_base_objects.sh in sequence. It is used thus
{{{
./install.sh PATH_TO_INSTALLDIR
}}}
Where PATH_TO_INSTALLDIR is the location to install the doms system in. The DOMS
system will never modify anything outside this path (unless you change the
default values in setenv.sh)

ingest_base_objects.sh assumes that there is a working doms installation, and
ingests the basic objects that are necessary for real data to be ingested. It reads
its configuration from setenv.sh

package.sh is the real workhorse of the install process. It copies and creates
the entire installation, creates symlinks and replaces values in config files, so
the system works. It does not install a tomcat, but it does write the necessary
config files, so that if one is already installed, it will be configured.

install_basic_tomcat.sh just extracts the included tomcat server.



== setenv.sh - configuring the install ==

By default, setenv.sh looks like this
{{{
#!/bin/bash

TOMCATZIP=`basename $BASEDIR/data/tomcat/*.zip`
FEDORAJAR=`basename $BASEDIR/data/fedora/*.jar`

# Check for install-folder and potentially create it.
TESTBED_DIR=$@
if [ -z "$TESTBED_DIR" ]; then
    echo "install-dir not specified. Bailing out." 1>&2
    usage
fi
if [ -d $TESTBED_DIR ]; then
    echo ""
else
    mkdir -p $TESTBED_DIR
fi
pushd $@ > /dev/null
TESTBED_DIR=$(pwd)
popd > /dev/null

# The normal config values
PORTRANGE=78
TOMCAT_SERVERNAME=localhost

FEDORAADMIN=fedoraAdmin
FEDORAADMINPASS=fedoraAdminPass

FEDORAUSER=fedoraReadOnlyAdmin
FEDORAUSERPASS=fedoraReadOnlyPass

# The folders
LOG_DIR=$TESTBED_DIR/logs

TOMCAT_DIR=$TESTBED_DIR/tomcat

FEDORA_DIR=$TESTBED_DIR/services/fedora

DATA_DIR=$TESTBED_DIR/data

CACHE_DIR=$TESTBED_DIR/cache

TOMCAT_CONFIG_DIR=$TESTBED_DIR/services/conf

WEBAPPS_DIR=$TESTBED_DIR/services/webapps

#Database
USE_POSTGRESQL=true
POSTGRESQL_DB=doms-test$PORTRANGE
POSTGRESQL_USER=doms-test$PORTRANGE
POSTGRESQL_PASS=doms-test$PORTRANGE

#Bitstorage
BITFINDER=http://bitfinder.statsbiblioteket.dk/
BITSTORAGE_SCRIPT="ssh doms@stage01 bin/server.sh"

}}}

$BASEDIR is the root of the installer, i.e. bin/.. (the parent of bin/). It is set by
the scripts that use setenv.sh and should not be overridden.

The block about TESTBED_DIR reads the install path from the command line and
stores it, resolved to an absolute path, as the testbed dir. This is the root of
where everything will be installed.

Then comes the port range. All the doms components are configured to use a port
inside this range, so the tomcat server runs on port 7880 when PORTRANGE is
set to 78. Use this to configure which set of 100 ports doms should have.
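
The port scheme can be sketched as plain string concatenation: the PORTRANGE prefix followed by a two-digit suffix. The helper name doms_port is hypothetical and not part of the installer; the suffixes 80 and 16 are the ones that appear in the context.xml and fedora.fcfg excerpts on this page.

```shell
#!/bin/bash
# Sketch only: doms_port is a hypothetical helper, not part of setenv.sh.
PORTRANGE=78

doms_port() {
    # $1: two-digit suffix inside the range, e.g. 80 for tomcat
    printf '%s%s\n' "$PORTRANGE" "$1"
}

echo "tomcat:   $(doms_port 80)"
echo "activemq: $(doms_port 16)"
```

With the default PORTRANGE of 78 every port falls between 7800 and 7899.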

The tomcat servername can safely be kept as localhost. It is not used for much,
but when fedora gives back an url, it will use the servername as prefix. At the
moment fedora is completely shielded behind the services, so this will not be used.

We have defined two standard users for fedora. Both are allowed to view everything
the repository contains, but only one of them is allowed to change data. I am not
entirely sure if the system would work if you change the FEDORAADMIN username to
something else, but the other 3 parameters can be changed freely.

Then comes the big set of important directories for doms. Notice that all of them
are located below the TESTBED_DIR. This is what prevents DOMS from modifying anything
outside the TESTBED_DIR.

 * LOG_DIR is where the webservices and fedora write their logs. It does not control where tomcat logs.
 * TOMCAT_DIR is where tomcat should be installed (install_basic_tomcat.sh) or where we expect a tomcat to be (package.sh). package.sh will create this folder if it does not exist, and populate it with only the tomcat config files we have changed.
 * FEDORA_DIR is where fedora is to be installed. This is just the basic fedora instance, with config and client, not any of the dynamic content such as data. Repository policies are stored in the FEDORA_DIR.
 * DATA_DIR is where fedora should store its objects and datastreams.
 * CACHE_DIR is where activeMQ and the mulgara triple store should store their files. This is not a cache in the sense that it can just be deleted, but it is a cache in the sense that it can be regenerated from the contents of the DATA_DIR.
 * TOMCAT_CONFIG_DIR is where the configuration files for services running in tomcat are placed. This is mostly used for the log4j config files.
 * WEBAPPS_DIR is where the webapps should go. Tomcat is configured to load the webapps from this location.
 * USE_POSTGRESQL is a boolean, i.e. true or false. If true, fedora will attempt to connect to a postgresql database running on localhost with the credentials below. If false, fedora will use a java derby database. The derby database will be stored in CACHE_DIR.
 * BITFINDER is the url that is prepended to the filename when lowlevel bitstorage stores a file and returns a url.
 * BITSTORAGE_SCRIPT is the script that is invoked to interact with the lowlevel bitstorage backend.
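
The claim that everything stays below TESTBED_DIR can be checked mechanically. The sketch below recomputes the default directory layout for an example install root and verifies that each path is below that root. The install root /tmp/doms-example and the function all_inside_testbed are assumptions made for illustration; nothing is created on disk.

```shell
#!/bin/bash
# Sketch only: recomputes the default layout from setenv.sh for a
# hypothetical install root and checks that every path stays below it.
TESTBED_DIR=/tmp/doms-example

LOG_DIR=$TESTBED_DIR/logs
TOMCAT_DIR=$TESTBED_DIR/tomcat
FEDORA_DIR=$TESTBED_DIR/services/fedora
DATA_DIR=$TESTBED_DIR/data
CACHE_DIR=$TESTBED_DIR/cache
TOMCAT_CONFIG_DIR=$TESTBED_DIR/services/conf
WEBAPPS_DIR=$TESTBED_DIR/services/webapps

all_inside_testbed() {
    # Returns 0 if every argument is a path below $TESTBED_DIR.
    for dir in "$@"; do
        case "$dir" in
            "$TESTBED_DIR"/*) ;;                      # below the install root
            *) echo "outside: $dir"; return 1 ;;      # anything else is rejected
        esac
    done
    return 0
}

all_inside_testbed "$LOG_DIR" "$TOMCAT_DIR" "$FEDORA_DIR" "$DATA_DIR" \
    "$CACHE_DIR" "$TOMCAT_CONFIG_DIR" "$WEBAPPS_DIR" \
    && echo "all directories are inside $TESTBED_DIR"
```

If you point one of the variables somewhere else in setenv.sh, a check like this is a quick way to see which guarantee you have given up.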

These are, for now, the config parameters that control the doms, and which can
be set before the install.


== An installed system ==

We now assume that you have installed the DOMS system by using install.sh
or package.sh. This section is a guide to what is controlled where, if you want
to make changes. The guide is based on the defaults from setenv.sh shown
above. If you changed anything, it is fairly simple to figure out where
the files will be instead.

The directory structure following an install is shown here. I have left out folders
that do not contain anything to change, as they are not relevant to the current discussion

 * cache/
 * data/
 * services/
  * conf/
     log4j.*.xml
     context.xml.default (symlinked to tomcat/conf/Catalina/localhost/)
     setenv.sh (symlinked to tomcat/bin)
  * fedora/
   * server
    * config/
      akubra-llstore.xml
      beSecurity.xml
      fedora.fcfg
      fedora-users.xml
      jaas.conf
      logback.xml
  * webapps/
 * tomcat/
  * conf/
    server.xml
    web.xml
    context.xml

=== log4j.*.xml ===
These are the log4j config files for each of the deployed webservices.
As can be seen, they include a specific log appender, developed in dk.statsbiblioteket.
This log appender collects the "bad" messages, and is part of the
surveillance system. Other than that, each is a standard config file that logs to
the LOG_DIR.

=== context.xml.default ===
This is the big configuration file for all the doms services. The doms services
are all built so that they take all their configuration from context params.
The way this file is linked into tomcat enables the values to be overridden
from the file tomcat/conf/context.xml
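
When checking what a service is actually configured with, it can help to pull a single value out of the file. The sketch below does this with tr and sed on a small inline sample. The function name param_value is hypothetical, and the approach assumes the name attribute comes directly before the value attribute, as it does in the excerpts on this page; a proper XML tool would be more robust.

```shell
#!/bin/bash
# Sketch only: extracts the value of one <Parameter> from context XML.
# Assumes name="..." is immediately followed by value="...".
# Dots in the parameter name are treated as regex wildcards by sed.
SAMPLE_CONTEXT='<Context>
    <Parameter
     name="fedora.home"
     value="services/fedora"
     override="false"/>
</Context>'

param_value() {
    # $1: parameter name; the XML is read from stdin.
    tr '\n' ' ' | sed -n "s|.*name=\"$1\"[[:space:]]*value=\"\([^\"]*\)\".*|\1|p"
}

printf '%s\n' "$SAMPLE_CONTEXT" | param_value fedora.home
```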

By default, the values in context.xml will be something like this

==== fedora ====
{{{
    <!--fedora-->
    <Parameter
     name="fedora.home"
     value="services/fedora"
     override="false"/>
}}}
This setting controls where the fedora instance is installed


==== highlevelbitstorage ====
{{{
    <!--highlevelbitstorage-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.characteriserlocation"
            value="http://localhost:7880/characteriser/characterise/?wsdl"
            override="false"/>
}}}
This value controls where the highlevel bitstorage service expects to be able to
contact the characteriser service

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.lowlevellocation"
            value="http://localhost:7880/lowlevelbitstorage/lowlevel/?wsdl"
            override="false"/>
}}}
This value controls where the highlevel bitstorage service expects to be able to
contact the lowlevel bitstorage service

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.server"
            value="http://localhost:7880/fedora"
            override="false"/>
}}}
This value controls where the highlevel bitstorage service expects to be able to
contact the fedora webservice

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.characstream"
            value="CHARACTERISATION" override="false"/>
}}}
This is the datastream in a fedora object to use for characterisation information

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.contentstream"
            value="CONTENTS" override="false"/>
}}}
This is the datastream in a fedora object to use for binary content

{{{
    <Parameter name="dk.statsbiblioteket.doms.bitstorage.highlevel.log4jconfig"
               value="${user.home}/services/conf/log4j.highlevelbitstorage.xml"
               override="false"/>
}}}
This is where to find the log4j config file for this service.

==== lowlevelbitstorage ====

{{{
    <!--lowlevelbitstorage-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.scriptimpl.script"
            value="ssh doms@stage01 bin/server.sh" override="false"/>
}}}
This is set to the value of BITSTORAGE_SCRIPT by the install process. It is the
way to contact the bitstorage server

{{{
    <Parameter name="dk.statsbiblioteket.doms.bitstorage.lowlevel.bitfinder"
               value="http://bitfinder.statsbiblioteket.dk/" override="false"/>
}}}
This is set to the value of BITFINDER by the install process. It is the prefix
added to filenames when turning them into permanent urls.

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.preferredBytesLeft"
            value="1000000" override="false"/>
}}}
If the backend bitstorage has fewer than this number of bytes left, a
warning is issued via the surveillance system

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.requiredBytesLeft"
            value="100000" override="false"/>
}}}
If the backend bitstorage system has fewer than this number of bytes left,
an error is issued via the surveillance system.
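
The two thresholds together give three states. As a sketch of the logic as described here (the real check is implemented inside the lowlevel bitstorage service, and the function name space_status is made up for this example):

```shell
#!/bin/bash
# Sketch only: mirrors the described thresholds, not the service code.
PREFERRED_BYTES_LEFT=1000000
REQUIRED_BYTES_LEFT=100000

space_status() {
    # $1: number of bytes left on the bitstorage backend
    if [ "$1" -lt "$REQUIRED_BYTES_LEFT" ]; then
        echo "ERROR"
    elif [ "$1" -lt "$PREFERRED_BYTES_LEFT" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

space_status 5000000
space_status 500000
space_status 50000
```

Note that the default values are about 1 MB and 100 kB, so for a real storage backend you will probably want to raise both.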

{{{
    <Parameter name="dk.statsbiblioteket.doms.bitstorage.lowlevel.log4jconfig"
               value="${user.home}/services/conf/log4j.lowlevelbitstorage.xml"
               override="false"/>
}}}
log4j config for this service

==== characteriser ====

{{{
    <!--characteriser-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.characteriser.log4jconfig"
            value="${user.home}/services/conf/log4j.characteriser.xml"
            override="false"/>
}}}
log4j config for this service

==== ecm ====
{{{
    <!--ecm-->
    <Parameter name="dk.statsbiblioteket.doms.ecm.fedora.connector"
               value="dk.statsbiblioteket.doms.ecm.repository.fedoraclient.FedoraClientConnector"
               override="false"/>
}}}
Which implementation of a fedora connector to use. Do not change

{{{
    <Parameter name="dk.statsbiblioteket.doms.ecm.fedora.location"
               value="http://localhost:7880/fedora"
               override="false"/>
}}}
Location of the fedora, in regards to the ECM service.

{{{
    <Parameter name="dk.statsbiblioteket.doms.ecm.pidGenerator.client"
               value="dk.statsbiblioteket.doms.ecm.repository.PidGeneratorImpl"
               override="false"/>
}}}
Which implementation of the pidgenerator client to use. Do not change

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.ecm.pidgenerator.client.wsdllocation"
            value="http://localhost:7880/pidgenerator/pidGenerator/?wsdl"
            override="false"/>
}}}
Location of the pidGenerator service

{{{
    <Parameter name="dk.statsbiblioteket.doms.ecm.log4jconfig"
               value="${user.home}/services/conf/log4j.ecm.xml"
               override="false"/>
}}}
log4j config for the ecm service


==== centralDomsWebservice ====
{{{
    <!--centralDomsWebservice-->
    <Parameter name="dk.statsbiblioteket.doms.central.fedoraLocation"
               value="http://localhost:7880/fedora"
               override="false"/>
}}}
Where the fedora webservice resides

{{{
    <Parameter name="dk.statsbiblioteket.doms.central.ecmLocation"
               value="http://localhost:7880/ecm"
               override="false"/>
}}}
Where the ecm webservice resides

{{{
    <Parameter name="dk.statsbiblioteket.doms.central.bitstorageWSDL"
               value="http://localhost:7880/highlevelbitstorage/highlevel/?wsdl"
               override="false"/>
}}}
Where the highlevel bitstorage webservice resides

{{{
    <Parameter name="dk.statsbiblioteket.doms.central.updateTrackerLocation"
               value="http://localhost:7880/updatetrackerWebservice/updatetracker/?wsdl"
               override="false"/>
}}}
Where the update tracker webservice resides

{{{
    <Parameter name="dk.statsbiblioteket.doms.central.log4jconfig"
               value="${user.home}/services/conf/log4j.centralDomsWebservice.xml"
               override="false"/>
}}}
log4j config for the centralDomsWebservice service

==== authchecker ====
{{{
    <!--authchecker-->
    <Parameter name="dk.statsbiblioteket.doms.authchecker.tickets.timeToLive"
               value="1200000" override="false"/>
}}}
The authchecker is also the ticketissuer (relevant for summa). This param controls
how long issued tickets should live, in ms.

{{{
    <Parameter name="dk.statsbiblioteket.doms.authchecker.users.timeToLive"
               value="1200000" override="false"/>
}}}
When summa tells doms that a given user is allowed to view something, a
temp user account is created. This is the time before this user account is
removed again
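
Both timeToLive values are given in milliseconds, which is easy to misread. A small conversion sketch (ms_to_minutes is just an illustrative helper):

```shell
#!/bin/bash
# Sketch only: converts a timeToLive value from milliseconds to minutes.
ms_to_minutes() {
    # $1: duration in milliseconds
    echo $(( $1 / 1000 / 60 ))
}

echo "default timeToLive: $(ms_to_minutes 1200000) minutes"
```

So the default of 1200000 corresponds to 20 minutes for both tickets and temporary users.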

{{{
    <Parameter name="dk.statsbiblioteket.doms.authchecker.fedoralocation"
               value="http://localhost:7880/fedora"
               override="false"/>
}}}
This is the location of Fedora, for the authchecker webservice

{{{
    <Parameter name="dk.statsbiblioteket.doms.authchecker.log4jconfig"
               value="${user.home}/services/conf/log4j.authchecker.xml"
               override="false"/>
}}}
This is the log4j config for the authchecker webservice


==== updatetrackerWebservice ====
{{{
    <!--updatetrackerWebservice-->
    <Parameter name="dk.statsbiblioteket.doms.updatetracker.fedoralocation"
               value="http://localhost:7880/fedora"
               override="false"/>
}}}
This is the fedora location for the update tracker webservice

{{{
    <Parameter name="dk.statsbiblioteket.doms.updatetracker.log4jconfig"
               value="${user.home}/services/conf/log4j.updatetrackerWebservice.xml"
               override="false"/>
}}}
This is the log4j config for the updatetracker webservice.


==== pidgenerator ====
{{{
    <!--pidgenerator-->
    <Parameter name="dk.statsbiblioteket.doms.pidgenerator.log4jconfig"
               value="${user.home}/services/conf/log4j.pidgenerator.xml"
               override="false"/>
}}}
This is the log4j config for the pidgenerator webservice.

==== surveillance-fedorasurveyor ====

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraUser"
            value="fedoraReadOnlyAdmin" override="false"/>
}}}
This is the user account the surveillance system should use, when testing fedora.
This should of course correspond to a user account from fedora-users.xml

{{{
    <!--surveillance-fedorasurveyor-->
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraPassword"
            value="2ZeMA1bN" override="false"/>
}}}
The password that goes with the user account above

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraUrl"
            value="http://localhost:7880/fedora"
            override="false"/>
}}}
Yet another specification of where to find fedora

{{{
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.log4jconfig"
            value="${user.home}/services/conf/log4j.surveillance-fedorasurveyor.xml"
            override="false"/>
}}}
log4j config for this webservice

==== surveillance-surveyor ====

{{{
    <!--surveillance-surveyor-->
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.surveyor.ignoredMessagesFile"
            value="ignored.txt" override="false"/>
}}}
File listing the messages that the surveillance system should ignore (it is not yet
clear what the path is relative to). For example, every time fedora cannot find a
datastream it logs this as an ERROR, despite it being a common occurrence in normal workflows.

{{{
    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.restUrls"
               value="
               http://localhost:7880/surveillance-surveyor/surveyable/getStatusSince/{date};
               http://localhost:7880/surveillance-fedorasurveyor/surveyable/getStatusSince/{date};
               http://localhost:7880/authchecker/surveyable/getStatusSince/{date};
               http://localhost:7880/ecm/surveyable/getStatusSince/{date};
               http://localhost:7880/fedora/surveyable/getStatusSince/{date}"
               override="false"/>
}}}
All the doms rest webservices have a little extra servlet, that allows the surveillance
system to poll them. This is the list of such servlets that can be contacted by rest.
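
The value is a semicolon-separated list of url templates. As a sketch of how such a list can be split and the {date} part filled in, see below. The function name expand_urls is hypothetical, and the date format the services expect is not stated in this guide, so the value 0 used here is only a placeholder.

```shell
#!/bin/bash
# Sketch only: splits a semicolon-separated url list and fills in {date}.
# The replacement value 0 is a placeholder, not a documented date format.
REST_URLS="
  http://localhost:7880/ecm/surveyable/getStatusSince/{date};
  http://localhost:7880/fedora/surveyable/getStatusSince/{date}"

expand_urls() {
    # $1: url list, $2: replacement for {date}
    printf '%s\n' "$1" | tr ';' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
              -e '/^$/d' -e "s|{date}|$2|"
}

expand_urls "$REST_URLS" 0
```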

{{{
    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.soapUrls"
               value="
               http://localhost:7880/lowlevelbitstorage/surveyable/?wsdl;
               http://localhost:7880/highlevelbitstorage/surveyable/?wsdl;
               http://localhost:7880/characteriser/surveyable/?wsdl;
               http://localhost:7880/pidgenerator/surveyable/?wsdl;
               http://localhost:7880/updatetrackerWebservice/surveyable/?wsdl;
               http://localhost:7880/centralDomsWebservice/surveyable/?wsdl"
               override="false"/>
}}}
Same as above, but for the soap webservices.

{{{
    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.log4jconfig"
               value="${user.home}/services/conf/log4j.surveillance-surveyor.xml"
               override="false"/>
}}}
log4j config for this webservice




=== setenv.sh (in tomcat, not the one in the installer) ===

This setenv.sh is used to set a few variables that, for some reason, need to
be set this way rather than from context.xml.

{{{
export CATALINA_OPTS="-Dorg.apache.activemq.default.directory.prefix=$HOME/cache/ -XX:+HeapDumpOnOutOfMemoryError"
export FEDORA_HOME=$HOME/services/fedora/
}}}
There are only two important variables here.
org.apache.activemq.default.directory.prefix controls where activeMQ stores its
temp files. FEDORA_HOME needs to be set, I believe, but the context.xml fedora.home
param might be sufficient. In older versions, FEDORA_HOME had to be set.


=== akubra-llstore.xml ===

This file controls where the fedora objects and datastreams are stored. For
now, we do not store datastreams managed, so only objects are relevant. Still,
this is the file to change, if the data dir should be moved

=== beSecurity.xml ===

When one fedora invocation makes fedora call itself, it needs to do so with
a certain set of credentials. Due to the nature of network communication, it is
not possible to reuse the credentials of the original call to fedora. Instead
these backend calls use the credentials defined in this file.
Note that this file does not define user accounts, so the credentials specified here
must correspond to an account in fedora-users.xml

=== fedora-users.xml ===

This is the simple file that handles fedora users. It should only be used for
backend user accounts. Users that just want to access material will be given
temporary user accounts that time out. Users of the future GUI, will use their
account from the LDAP. So, this is for those users (ie. ingester, summa) which
fall in neither category.

=== jaas.conf ===

JAAS is the Java Authentication and Authorization Service. This is the config file
for part of fedora's authentication framework. Only the top section matters. What we
have done is add the authchecker webservice to the process. When summa needs to get
credentials for a temporary user, it requests these from the authchecker. The
authchecker then creates a new user account in memory. When the user tries to use the
given credentials, he is subjected to the fedora authentication system, where we have
injected the authchecker so it can respond that it knows the user, and let him in.

When LDAP should be enabled with the doms, this would also be the file to edit
for fedora to forward authentication to the LDAP server.

=== logback.xml ===

Fedora controls logging with the logback system. This is the config file.
Like the log4j config files for the webservices, we have defined a special
appender, that collects the logmessages for the surveillance system.


=== fedora.fcfg ===

Last, and most certainly longest. This is the config file that controls fedora.

The top section, before the first <module> tag, controls params that are
global for all of fedora. The most relevant of these are the port numbers and the
tomcat address. Fedora needs to know where it lives.

Those of interest to us (i.e. those where you can change something meaningful) are
{{{
     <module role="org.fcrepo.server.security.Authorization" class="org.fcrepo.server.security.DefaultAuthorization">
        ....
        <param name="REPOSITORY-POLICIES-DIRECTORY"
               value="$FEDORA_DIR$/fedora-xacml-policies/repository-policies"
               isFilePath="true"/>
        ...
    </module>
}}}


{{{
    <module role="org.fcrepo.server.storage.DOManager" class="org.fcrepo.server.storage.DefaultDOManager">
        ....
        <param name="storagePool" value="localPostgreSQLPool">
            <comment>The named connection pool from which read/write database
                connections are to be provided for the storage subsystem (see the
                ConnectionPoolManager module). Default is the default provided by the
                ConnectionPoolManager.</comment>
        </param>
        ....
    </module>
}}}
Choose between the two database systems defined below. The valid values are
 localPostgreSQLPool and localDerbyPool.

{{{
    <module role="org.fcrepo.server.management.Management" class="org.fcrepo.server.management.ManagementModule">
        ....
        <param name="decorator2"
               value="dk.statsbiblioteket.doms.bitstorage.highlevel.HookApprove">
            <comment>This is the hook that ensures that when a file object is
                marked active, the corresponding file is approved in bitstorage
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.filecmodel"
               value="doms:ContentModel_File">
            <comment>This is the content model an object must have to be considered
                a file object
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.webservicelocation"
               value="http://localhost:7880/highlevelbitstorage/highlevel/?wsdl">
            <comment>This specifies the location of the highlevel webservice
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.username"
               value="fedoraReadOnlyAdmin">
            <comment>This is the username used for publishing files
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.password"
               value="fedoraReadOnlyPass">
            <comment>This is the password used for publishing files
            </comment>
        </param>

        ...
        <param name="decorator3" value="dk.statsbiblioteket.doms.ecm.fedoravalidatorhook.FedoraModifyObjectHook"/>
    </module>
}}}

This defines the two hooks we use inside fedora. The first (decorator2) is the bitstorage
approve hook. When a file object is approved/published/set active (many names,
same thing), the corresponding datafile should be moved from temporary
bitstorage to permanent bitstorage. To do this, the hook needs the location of the
highlevelbitstorage webservice, and some credentials to invoke it with. It does
not need write access to fedora for anything, but it does require read rights
to whatever object is being published. The user account is, of course, defined
in fedora-users.xml

Decorator3 is the ecm hook. When an object is published, it should be validated.
If it fails validation, it should not be published.



{{{
    <module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule">
        ....
        <param name="java.naming.provider.url" value="vm:(broker:(tcp://localhost:$PORTRANGE$16)?useJmx=false)"/>
        ....
    </module>
}}}
This controls the portrange for the activeMQ system, so that it stays inside the
prescribed range. Jmx is disabled, as it cannot be pushed inside the
prescribed range, and is not currently needed

{{{
    <module role="org.fcrepo.server.storage.ConnectionPoolManager" class="org.fcrepo.server.storage.ConnectionPoolManagerImpl">
        <comment>This module facilitates obtaining ConnectionPools</comment>
        <param name="defaultPoolName" value="localPostgreSQLPool"/>
        <param name="poolNames" value="localPostgreSQLPool"/>
    </module>
}}}

This controls which database connection pool fedora uses. Legal values are
localPostgreSQLPool and localDerbyPool.

{{{
    <datastore id="localDerbyPool">
        ....
        <param name="dbUsername" value="fedoraAdmin">
            <comment>The database user name.</comment>
        </param>
        <param name="dbPassword" value="fedoraAdmin">
            <comment>The database password.</comment>
        </param>
        <param name="jdbcURL" value="jdbc:derby:$CACHE_DIR$/derby/fedora3;create=true">
            <comment>The JDBC connection URL.</comment>
        </param>
        ....
    </datastore>
}}}

This is the config for the derby database system. If you do not use the derby
system, ignore this section. Note, I have had problems with changing the
username and password.



{{{
    <datastore id="localPostgreSQLPool">
        ....
        <param name="dbUsername" value="$POSTGRESQL_USER$">
            <comment>The database user name.</comment>
        </param>
        <param name="dbPassword" value="$POSTGRESQL_PASS$">
            <comment>The database password.</comment>
        </param>
        <param name="jdbcURL" value="jdbc:postgresql://localhost/$POSTGRESQL_DB$">
            <comment>The JDBC connection URL.</comment>
        </param>
        ....
    </datastore>
}}}

This is the config for the postgresql database. Fedora needs a database for its
own bookkeeping, so either this one or the derby one must be available. The referenced
database (in postgresql) must exist, but fedora will populate it with the right tables upon
startup.


=== server.xml ===

This is the tomcat config file. For our purposes, we use it to define the
ports tomcat will use, and where it will look for war files.


=== web.xml ===

Totally default, except for this bit:
{{{
    <session-config>
        <session-timeout>1</session-timeout>
    </session-config>
}}}
We do not use sessions, so the timeout is set to 1 minute. With the default timeout
of 30 minutes, a great big pile of sessions would accumulate and grab all the memory.

Install Guide

The System

This is the DOMS system (Digital Object Management System). It is a complete repository system, built around a Fedora instance.

These are the components that make up the system:

  • Central WebService AKA DomsServer - The interface to the system
  • ECM - Enhanced Content Models, added functionality for Fedora
  • Highlevel Bitstorage - A highlevel way to work with files in Doms
  • Lowlevel bitstorage - The lowlevel, near-the-storage-system way to work with files in doms
  • Characteriser - Characterises datafiles that are stored
  • Updatetracker - Keeps track of which records in doms have changed
  • Pidgenerator - Generates pids for new objects
  • Surveillance - Monitors that everything is working and not reporting errors

A full guide to the workings of DOMS is beyond the scope of this document.

The release

Due to the nature of Fedora, it is impossible to just generate an extractable zip. Fedora is built around an installer, and DOMS therefore has to work the same way. The doms release is this installer.

The installer is invoked by the script install.sh in bin/. In bin/, there are the following scripts:

  • install_basic_tomcat.sh
  • package.sh
  • ingest_base_objects.sh
  • install.sh
  • setenv.sh

setenv.sh is the script that controls everything. It sets all the variables that determine the location of various folders, and port numbers.

install.sh is a very simple script, in that it just calls install_basic_tomcat.sh, package.sh and ingest_base_objects.sh in sequence. It is used thus:

./install.sh PATH_TO_INSTALLDIR

Here PATH_TO_INSTALLDIR is the location to install the doms system in. The DOMS system will never modify anything outside this path (unless you change the default values in setenv.sh).
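
The order in which install.sh calls the other scripts can be sketched like this. This is only an illustration of the sequence described above, not the shipped script: the run_step helper and the DRY_RUN variable are made up for the example, and passing the install dir on to each script is an assumption.

```shell
#!/bin/sh
# Sketch of the install order; this is NOT the shipped install.sh.
# DRY_RUN and run_step are hypothetical helpers for this example.
DRY_RUN=${DRY_RUN:-true}

run_step() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "true" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

install_doms() {
    installdir=$1
    if [ -z "$installdir" ]; then
        echo "install-dir not specified" 1>&2
        return 1
    fi
    # Assumption: each script takes the install dir as its only argument.
    # The && chain stops at the first step that fails.
    run_step ./install_basic_tomcat.sh "$installdir" &&
    run_step ./package.sh "$installdir" &&
    run_step ./ingest_base_objects.sh "$installdir"
}
```

Calling install_doms /path/to/doms with DRY_RUN left at true only prints the three commands, which is a cheap way to see what would happen before anything is written.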

ingest_base_objects.sh assumes that there is a working doms installation, and ingests the basic objects that are necessary for real data to be ingested. It reads its configuration from setenv.sh.

package.sh is the real workhorse of the install process. It copies and creates the entire installation, creates symlinks and replaces placeholders in config files, so the system works. It does not install a tomcat, but it does write the necessary config files, so that if one is already installed, it will be configured.

install_basic_tomcat.sh just extracts the included tomcat server.

setenv.sh - configuring the install

By default, setenv.sh looks like this:

TOMCATZIP=`basename $BASEDIR/data/tomcat/*.zip`
FEDORAJAR=`basename $BASEDIR/data/fedora/*.jar`

#
# Check for install-folder and potentially create it.
#
TESTBED_DIR=$@
if [ -z "$TESTBED_DIR" ]; then
    echo "install-dir not specified. Bailing out." 1>&2
    usage
fi
if [ -d $TESTBED_DIR ]; then
    echo ""
else
    mkdir -p $TESTBED_DIR
fi
pushd $@ > /dev/null
TESTBED_DIR=$(pwd)
popd > /dev/null

# The normal config values
PORTRANGE=78
TOMCAT_SERVERNAME=localhost

FEDORAADMIN=fedoraAdmin
FEDORAADMINPASS=fedoraAdminPass

FEDORAUSER=fedoraReadOnlyAdmin
FEDORAUSERPASS=fedoraReadOnlyPass

# The folders
LOG_DIR=$TESTBED_DIR/logs

TOMCAT_DIR=$TESTBED_DIR/tomcat

FEDORA_DIR=$TESTBED_DIR/services/fedora

DATA_DIR=$TESTBED_DIR/data

CACHE_DIR=$TESTBED_DIR/cache

TOMCAT_CONFIG_DIR=$TESTBED_DIR/services/conf

WEBAPPS_DIR=$TESTBED_DIR/services/webapps

#Database
USE_POSTGRESQL=true
POSTGRESQL_DB=doms-test$PORTRANGE
POSTGRESQL_USER=doms-test$PORTRANGE
POSTGRESQL_PASS=doms-test$PORTRANGE

#Bitstorage
BITFINDER=http://bitfinder.statsbiblioteket.dk/
BITSTORAGE_SCRIPT="ssh doms@stage01 bin/server.sh"

$BASEDIR is the root of the installer, i.e. the parent of bin/. It is set by the scripts using setenv.sh and should not be overridden.

The big blob about TESTBED_DIR reads the install dir from the command line, creates it if it does not exist, and stores its absolute path as the testbed dir. This is the root of where everything will be installed.
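
The same steps can be written more defensively. The following is a minimal sketch, not the code shipped in setenv.sh; resolve_install_dir is a hypothetical name. It quotes every expansion, so that an install path containing spaces survives, and it resolves the absolute path in a subshell instead of using pushd/popd.

```shell
#!/bin/sh
# Sketch: resolve the install dir to an absolute path.
# resolve_install_dir is a hypothetical name, not part of setenv.sh.
resolve_install_dir() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "install-dir not specified. Bailing out." 1>&2
        return 1
    fi
    # Create the folder if it is missing (no-op if it exists).
    mkdir -p "$dir" || return 1
    # cd in a subshell so the caller's working directory is untouched.
    (cd "$dir" && pwd)
}
```

It would be used as TESTBED_DIR=$(resolve_install_dir "$1") || exit 1.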

Then comes the portrange. All the doms components are configured to use a port inside this range, so the tomcat server will run on 7880 when PORTRANGE is set to 78. Use this to configure which set of 100 ports doms should have.
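
As a small illustration of how the range works: a port is formed by appending two digits to PORTRANGE. The suffixes 80 (tomcat) and 16 (activeMQ) are taken from the URLs and the fedora.fcfg excerpt in this guide; the doms_port helper is made up for the example.

```shell
#!/bin/sh
# Sketch: derive port numbers from PORTRANGE by appending two digits.
# doms_port is a hypothetical helper, not part of the installer.
PORTRANGE=${PORTRANGE:-78}

doms_port() {
    # $1 is the two-digit suffix inside the range.
    echo "${PORTRANGE}$1"
}

echo "tomcat:   $(doms_port 80)"
echo "activeMQ: $(doms_port 16)"
```

With the default PORTRANGE of 78 from setenv.sh this gives 7880 for tomcat and 7816 for activeMQ.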

The tomcat servername can safely be kept as localhost. It is not used for much, but when fedora gives back a URL, it will use the servername as prefix. At the moment fedora is completely shielded behind the services, so this will not be used.

We have defined two standard users for fedora. Both are allowed to view everything the repository contains, but only one of them is allowed to change data. I am not entirely sure if the system would work if you change the FEDORAADMIN username to something else, but the other 3 parameters can be changed freely.

Then comes the big set of important directories for doms. Notice that all of them are relative to the TESTBED_DIR. This is what prevents DOMS from modifying anything outside the TESTBED_DIR.

  • LOG_DIR is where the webservices and fedora write their logs. It does not control where tomcat logs.
  • TOMCAT_DIR is where tomcat should be installed (install_basic_tomcat.sh) or where we expect a tomcat to be (package.sh). package.sh will create this folder if it does not exist, and populate it with only the tomcat config files we have changed.
  • FEDORA_DIR is where fedora is to be installed. This is just the basic fedora instance, with config and client, not any of the dynamic content such as data. Repository policies are stored in the FEDORA_DIR.
  • DATA_DIR is where fedora should store its objects and datastreams.
  • CACHE_DIR is where activeMQ and the mulgara triple store should store their files. This is not a cache in the sense that it can just be deleted, but it is a cache in the sense that it can be regenerated from the contents of the DATA_DIR.
  • TOMCAT_CONFIG_DIR is where the configuration files for services running in tomcat are placed. This is mostly used for the log4j config files.
  • WEBAPPS_DIR is where the webapps should go. Tomcat is configured to load the webapps from this location.
  • USE_POSTGRESQL is a boolean, i.e. true or false. If true, fedora will attempt to connect to a postgresql database running on localhost with the credentials below. If false, fedora will use a java derby database. The derby database will be stored in CACHE_DIR.
  • BITFINDER is the url that should be prepended to the filenames when lowlevel bitstorage stores a file and returns a url.
  • BITSTORAGE_SCRIPT is the script that should be invoked to interact with the lowlevel bitstorage backend.

These are, for now, the config parameters that control the doms, and which can be set before the install.

An installed system

We now assume that you have installed the DOMS system, by using install.sh or package.sh. This section is a guide to what is controlled where, if you desire to make changes. I will write this guide based on the above defaults from setenv.sh. If you changed anything, it is fairly simple to figure out where the files will be instead.

After an install, the directory structure will be as follows. I have left out folders that do not contain anything to change, as they are not relevant to the current discussion.

  • cache/
  • data/
  • services/
    • conf/
      • log4j.*.xml
      • context.xml.default (symlinked to tomcat/conf/Catalina/localhost/)
      • setenv.sh (symlinked to tomcat/bin)
    • fedora/
      • server
        • config/
          • akubra-llstore.xml
          • beSecurity.xml
          • fedora.fcfg
          • fedora-users.xml
          • jaas.conf
          • logback.xml
    • webapps/
  • tomcat/
    • conf/
      • server.xml
      • web.xml
      • context.xml

log4j.*.xml

These are the respective log4j config files for each of the deployed webservices. As can be seen, they have a specific log appender added, developed in dk.statsbiblioteket. This log appender collects the "bad" messages, and is part of the surveillance system. Other than that, each is a standard config file that logs to the LOG_DIR.

context.xml.default

This is the big configuration file for all the doms services. The doms services are all built so that they take all their configuration from context params. The way this file is linked into tomcat enables the values to be overridden from the file tomcat/conf/context.xml.

By default, the values in context.xml.default will be something like this:

fedora

    <!--fedora-->
    <Parameter
            name="fedora.home"
            value="services/fedora"
            override="false"/>

This setting controls where the fedora instance is installed

highlevelbitstorage

    <!--highlevelbitstorage-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.characteriserlocation"
            value="http://localhost:7880/characteriser/characterise/?wsdl"
            override="false"/>

This value controls where the highlevel bitstorage service expects to be able to contact the characteriser service

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.lowlevellocation"
            value="http://localhost:7880/lowlevelbitstorage/lowlevel/?wsdl"
            override="false"/>

This value controls where the highlevel bitstorage service expects to be able to contact the lowlevel bitstorage service

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.server"
            value="http://localhost:7880/fedora"
            override="false"/>

This value controls where the highlevel bitstorage service expects to be able to contact the fedora webservice

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.characstream"
            value="CHARACTERISATION" override="false"/>

This is the datastream in a fedora object to use for characterisation information

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.contentstream"
            value="CONTENTS" override="false"/>

This is the datastream in a fedora object to use for binary content

    <Parameter name="dk.statsbiblioteket.doms.bitstorage.highlevel.log4jconfig"
               value="${user.home}/services/conf/log4j.highlevelbitstorage.xml"
               override="false"/>

This is where to find the log4j config file for this service.

lowlevelbitstorage

    <!--lowlevelbitstorage-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.scriptimpl.script"
            value="ssh doms@stage01 bin/server.sh" override="false"/>

This is set to the value of BITSTORAGE_SCRIPT by the install process. It is the way to contact the bitstorage server

    <Parameter name="dk.statsbiblioteket.doms.bitstorage.lowlevel.bitfinder"
               value="http://bitfinder.statsbiblioteket.dk/" override="false"/>

This is set to the value of BITFINDER by the install process. It is the prefix to add to filenames when turning them into permanent urls.

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.preferredBytesLeft"
            value="1000000" override="false"/>

If the backend bitstorage has less than this number of bytes left, issue a warning via the surveillance system.

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.requiredBytesLeft"
            value="100000" override="false"/>

If the backend bitstorage system has less than this number of bytes left, issue an error via the surveillance system.
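
The interplay of the two thresholds can be sketched as follows. The real check lives inside the lowlevel bitstorage service; this only illustrates the logic with the default values shown here, and space_status is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the two-threshold check; not the service's actual code.
# space_status is a hypothetical name.
PREFERRED_BYTES_LEFT=1000000
REQUIRED_BYTES_LEFT=100000

space_status() {
    bytes_left=$1
    if [ "$bytes_left" -lt "$REQUIRED_BYTES_LEFT" ]; then
        # Below the hard limit: reported as an error.
        echo "error"
    elif [ "$bytes_left" -lt "$PREFERRED_BYTES_LEFT" ]; then
        # Between the two limits: reported as a warning.
        echo "warning"
    else
        echo "ok"
    fi
}
```

For the sketch to make sense, requiredBytesLeft has to be the smaller of the two values, as it is in the defaults.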

    <Parameter name="dk.statsbiblioteket.doms.bitstorage.lowlevel.log4jconfig"
               value="${user.home}/services/conf/log4j.lowlevelbitstorage.xml"
               override="false"/>

log4j config for this service

characteriser

    <!--characteriser-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.characteriser.log4jconfig"
            value="${user.home}/services/conf/log4j.characteriser.xml"
            override="false"/>

log4j config for this service

ecm

    <!--ecm-->
    <Parameter name="dk.statsbiblioteket.doms.ecm.fedora.connector"
               value="dk.statsbiblioteket.doms.ecm.repository.fedoraclient.FedoraClientConnector"
               override="false"/>

Which implementation of a fedora connector to use. Do not change

    <Parameter name="dk.statsbiblioteket.doms.ecm.fedora.location"
               value="http://localhost:7880/fedora"
               override="false"/>

Location of the fedora, in regards to the ECM service.

    <Parameter name="dk.statsbiblioteket.doms.ecm.pidGenerator.client"
               value="dk.statsbiblioteket.doms.ecm.repository.PidGeneratorImpl"
               override="false"/>

Which implementation of the pidgenerator client to use. Do not change

    <Parameter
            name="dk.statsbiblioteket.doms.ecm.pidgenerator.client.wsdllocation"
            value="http://localhost:7880/pidgenerator/pidGenerator/?wsdl"
            override="false"/>

Location of the pidGenerator service

    <Parameter name="dk.statsbiblioteket.doms.ecm.log4jconfig"
               value="${user.home}/services/conf/log4j.ecm.xml"
               override="false"/>

log4j config for the ecm service

centralDomsWebservice

    <!--centralDomsWebservice-->
    <Parameter name="dk.statsbiblioteket.doms.central.fedoraLocation"
               value="http://localhost:7880/fedora"
               override="false"/>

Where the fedora webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.ecmLocation"
               value="http://localhost:7880/ecm"
               override="false"/>

Where the ecm webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.bitstorageWSDL"
               value="http://localhost:7880/highlevelbitstorage/highlevel/?wsdl"
               override="false"/>

Where the highlevel bitstorage webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.updateTrackerLocation"
               value="http://localhost:7880/updatetrackerWebservice/updatetracker/?wsdl"
               override="false"/>

Where the update tracker webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.log4jconfig"
               value="${user.home}/services/conf/log4j.centralDomsWebservice.xml"
               override="false"/>

log4j config for the centralDomsWebservice service

authchecker

    <!--authchecker-->
    <Parameter name="dk.statsbiblioteket.doms.authchecker.tickets.timeToLive"
               value="1200000" override="false"/>

The authchecker is also the ticketissuer (relevant for summa). This param controls how long issued tickets should live, in ms.

    <Parameter name="dk.statsbiblioteket.doms.authchecker.users.timeToLive"
               value="1200000" override="false"/>

When summa tells doms that a given user is allowed to view something, a temp user account is created. This is the time before this user account is removed again
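
Both timeToLive values default to 1200000. The ticket value is in milliseconds, and assuming the user value uses the same unit, that is 20 minutes. A minimal sketch for converting back and forth (the helper names are made up for the example):

```shell
#!/bin/sh
# Sketch: convert between minutes and the millisecond values used by
# the timeToLive parameters. Both helpers are hypothetical.
ms_to_minutes() { echo $(( $1 / 60000 )); }
minutes_to_ms() { echo $(( $1 * 60000 )); }

ms_to_minutes 1200000   # prints 20
minutes_to_ms 30        # prints 1800000
```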

    <Parameter name="dk.statsbiblioteket.doms.authchecker.fedoralocation"
               value="http://localhost:7880/fedora"
               override="false"/>

This is the location of Fedora, for the authchecker webservice

    <Parameter name="dk.statsbiblioteket.doms.authchecker.log4jconfig"
               value="${user.home}/services/conf/log4j.authchecker.xml"
               override="false"/>

This is the log4j config for the authchecker webservice

updatetrackerWebservice

    <!--updatetrackerWebservice-->
    <Parameter name="dk.statsbiblioteket.doms.updatetracker.fedoralocation"
               value="http://localhost:7880/fedora"
               override="false"/>

This is the fedora location for the update tracker webservice

    <Parameter name="dk.statsbiblioteket.doms.updatetracker.log4jconfig"
               value="${user.home}/services/conf/log4j.updatetrackerWebservice.xml"
               override="false"/>

This is the log4j config for the updatetracker webservice.

pidgenerator

    <!--pidgenerator-->
    <Parameter name="dk.statsbiblioteket.doms.pidgenerator.log4jconfig"
               value="${user.home}/services/conf/log4j.pidgenerator.xml"
               override="false"/>

This is the log4j config for the pidgenerator webservice.

surveillance-fedorasurveyor

    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraUser"
            value="fedoraReadOnlyAdmin" override="false"/>

This is the user account the surveillance system should use, when testing fedora. This should of course correspond to a user account from fedora-users.xml

    <!--surveillance-fedorasurveyor-->
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraPassword"
            value="2ZeMA1bN" override="false"/>

The password that goes with the user account above

    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraUrl"
            value="http://localhost:7880/fedora"
            override="false"/>

Yet another specification of where to find fedora

    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.log4jconfig"
            value="${user.home}/services/conf/log4j.surveillance-fedorasurveyor.xml"
            override="false"/>

log4j config for this webservice

surveillance-surveyor

    <!--surveillance-surveyor-->
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.surveyor.ignoredMessagesFile"
            value="ignored.txt" override="false"/>

File listing the messages that the surveillance system should ignore (it is not yet clear what the path is relative to). For example, every time fedora cannot find a datastream it logs this as an ERROR, despite it being a common occurrence in normal workflows.

    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.restUrls"
               value="
               http://localhost:7880/surveillance-surveyor/surveyable/getStatusSince/{date};
               http://localhost:7880/surveillance-fedorasurveyor/surveyable/getStatusSince/{date};
               http://localhost:7880/authchecker/surveyable/getStatusSince/{date};
               http://localhost:7880/ecm/surveyable/getStatusSince/{date};
               http://localhost:7880/fedora/surveyable/getStatusSince/{date}"
               override="false"/>

All the doms rest webservices have a little extra servlet, that allows the surveillance system to poll them. This is the list of such servlets that can be contacted by rest.

    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.soapUrls"
               value="
               http://localhost:7880/lowlevelbitstorage/surveyable/?wsdl;
               http://localhost:7880/highlevelbitstorage/surveyable/?wsdl;
               http://localhost:7880/characteriser/surveyable/?wsdl;
               http://localhost:7880/pidgenerator/surveyable/?wsdl;
               http://localhost:7880/updatetrackerWebservice/surveyable/?wsdl;
               http://localhost:7880/centralDomsWebservice/surveyable/?wsdl"
               override="false"/>

Same as above, but for the soap webservices.

    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.log4jconfig"
               value="${user.home}/services/conf/log4j.surveillance-surveyor.xml"
               override="false"/>

log4j config for this webservice
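
Many of the parameters above repeat the same host and port. A quick way to check which endpoints a context file actually points at, for instance after installing with a different PORTRANGE, could look like this. It is a sketch under the assumption that all endpoints are written as plain http:// urls; list_endpoints is a hypothetical helper, not part of the installer.

```shell
#!/bin/sh
# Sketch: list every distinct http://host:port a context file refers to.
# list_endpoints is a hypothetical helper, not part of the installer.
list_endpoints() {
    # Cut each url at the first '/' or '"' after the scheme,
    # then remove duplicates.
    grep -o 'http://[^/"]*' "$1" | sort -u
}
```

It would be used as list_endpoints services/conf/context.xml.default, run from the install dir.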

setenv.sh (in tomcat, not the one in the installer)

This setenv.sh is used to set a few variables that, for some reason, need to be set this way rather than from context.xml.

export CATALINA_OPTS="-Dorg.apache.activemq.default.directory.prefix=$HOME/cache/ -XX:+HeapDumpOnOutOfMemoryError"
export FEDORA_HOME=$HOME/services/fedora/

There are only two important variables here. org.apache.activemq.default.directory.prefix controls where activeMQ stores its temp files. FEDORA_HOME needs to be set, I believe, but the context.xml fedora.home param might be sufficient. In older versions, FEDORA_HOME had to be set.

akubra-llstore.xml

This file controls where the fedora objects and datastreams are stored. For now, we do not store datastreams as managed content, so only objects are relevant. Still, this is the file to change if the data dir should be moved.

beSecurity.xml

When one fedora invocation makes fedora call itself, it needs to do so with a certain set of credentials. Due to the nature of net communication, it is not possible to reuse the credentials of the original call to fedora. Instead these backend calls use the credentials defined in this file. Note that this file does not define user accounts, so the credentials specified here must correspond to an account in fedora-users.xml.

fedora-users.xml

This is the simple file that handles fedora users. It should only be used for backend user accounts. Users that just want to access material will be given temporary user accounts that time out. Users of the future GUI, will use their account from the LDAP. So, this is for those users (ie. ingester, summa) which fall in neither category.

jaas.conf

JAAS is the Java Authentication and Authorization Service. This is the config file for part of fedora's authentication framework. Only the top section matters. What we have done is to add the authchecker webservice to the process. When summa needs to get credentials for a temporary user, it requests these from the authchecker. The authchecker then creates a new user account in memory. When the user tries to use the given credentials, he is subjected to the fedora authentication system, where we have injected the authchecker so it can respond that it knows the user, and let him in.

When LDAP should be enabled with the doms, this would also be the file to edit for fedora to forward authentication to the LDAP server.

logback.xml

Fedora controls logging with the logback system. This is the config file. Like the log4j config files for the webservices, we have defined a special appender, that collects the logmessages for the surveillance system.

fedora.fcfg

Last, and most certainly longest. This is the config file that controls fedora.

The top section, before the first <module> tag, controls params that are global for all of fedora. The most relevant of these are the port numbers and the tomcat address. Fedora needs to know where it lives.

The modules of interest to us (i.e. those where you can change something meaningful) are:

     <module role="org.fcrepo.server.security.Authorization" class="org.fcrepo.server.security.DefaultAuthorization">
        ....
        <param name="REPOSITORY-POLICIES-DIRECTORY"
               value="$FEDORA_DIR$/fedora-xacml-policies/repository-policies"
               isFilePath="true"/>
        ...
    </module>

    <module role="org.fcrepo.server.storage.DOManager" class="org.fcrepo.server.storage.DefaultDOManager">
        ....
        <param name="storagePool" value="localPostgreSQLPool">
            <comment>The named connection pool from which read/write database
                connections are to be provided for the storage subsystem (see the
                ConnectionPoolManager module). Default is the default provided by the
                ConnectionPoolManager.</comment>
        </param>
        ....
    </module>

Choose between the two database systems defined below. The valid values are localPostgreSQLPool and localDerbyPool.

    <module role="org.fcrepo.server.management.Management" class="org.fcrepo.server.management.ManagementModule">
        ....
        <param name="decorator2"
               value="dk.statsbiblioteket.doms.bitstorage.highlevel.HookApprove">
            <comment>This is the hook that ensures that when a file object is
                marked active, the corresponding file is approved in bitstorage
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.filecmodel"
               value="doms:ContentModel_File">
            <comment>This is the content model an object must have to be considered
                a file object
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.webservicelocation"
               value="http://localhost:7880/highlevelbitstorage/highlevel/?wsdl">
            <comment>This specifies the location of the highlevel webservice
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.username"
               value="fedoraReadOnlyAdmin">
            <comment>This is the username used for publishing files
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.password"
               value="fedoraReadOnlyPass">
            <comment>This is the password used for publishing files
            </comment>
        </param>

        ...
        <param name="decorator3" value="dk.statsbiblioteket.doms.ecm.fedoravalidatorhook.FedoraModifyObjectHook"/>
    </module>

This defines the two hooks we use inside fedora. The first (decorator2) is the bitstorage approve hook. When a file object is approved/published/set active (many names, same thing), the corresponding datafile should be moved from temporary bitstorage to permanent bitstorage. To do this, it needs the location of the highlevelbitstorage webservice, and some credentials to invoke it with. It does not need write access to fedora for anything, but it will require read rights to whatever object is set published. The user account is, of course, defined in fedora-users.xml.

Decorator3 is the ecm hook. When an object is published, it should be validated. If it fails validation, it should not be published.

    <module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule">
        ....
        <param name="java.naming.provider.url" value="vm:(broker:(tcp://localhost:$PORTRANGE$16)?useJmx=false)"/>
        ....
    </module>

This controls the port for the activeMQ system, so that it stays inside the prescribed range. JMX is disabled, as it cannot be pushed inside the prescribed range, and is not currently needed.

    <module role="org.fcrepo.server.storage.ConnectionPoolManager" class="org.fcrepo.server.storage.ConnectionPoolManagerImpl">
        <comment>This module facilitates obtaining ConnectionPools</comment>
        <param name="defaultPoolName" value="localPostgreSQLPool"/>
        <param name="poolNames" value="localPostgreSQLPool"/>
    </module>

This controls which database connection pool fedora uses. Legal values are localPostgreSQLPool and localDerbyPool.

    <datastore id="localDerbyPool">
        ....
        <param name="dbUsername" value="fedoraAdmin">
            <comment>The database user name.</comment>
        </param>
        <param name="dbPassword" value="fedoraAdmin">
            <comment>The database password.</comment>
        </param>
        <param name="jdbcURL" value="jdbc:derby:$CACHE_DIR$/derby/fedora3;create=true">
            <comment>The JDBC connection URL.</comment>
        </param>
        ....
    </datastore>

This is the config for the derby database system. If you do not use the derby system, ignore this section. Note, I have had problems with changing the username and password.

    <datastore id="localPostgreSQLPool">
        ....
        <param name="dbUsername" value="$POSTGRESQL_USER$">
            <comment>The database user name.</comment>
        </param>
        <param name="dbPassword" value="$POSTGRESQL_PASS$">
            <comment>The database password.</comment>
        </param>
        <param name="jdbcURL" value="jdbc:postgresql://localhost/$POSTGRESQL_DB$">
            <comment>The JDBC connection URL.</comment>
        </param>
        ....
    </datastore>

This is the config for the postgresql database. Fedora needs a database for its own bookkeeping, so either this one or the derby one must be available. The referenced database (in postgresql) must exist, but fedora will populate it with the right tables upon startup.
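
Since the database has to exist before fedora starts, it must be created by hand. The following sketch prints SQL that a postgresql superuser could run to create the role and an empty database. The default names mirror setenv.sh; print_db_setup_sql is a hypothetical helper, and how your postgresql server handles logins is not covered here.

```shell
#!/bin/sh
# Sketch: print the SQL needed to create the role and the empty
# database. The defaults mirror setenv.sh; print_db_setup_sql is a
# hypothetical helper, not part of the installer.
PORTRANGE=${PORTRANGE:-78}
POSTGRESQL_DB=${POSTGRESQL_DB:-doms-test$PORTRANGE}
POSTGRESQL_USER=${POSTGRESQL_USER:-doms-test$PORTRANGE}
POSTGRESQL_PASS=${POSTGRESQL_PASS:-doms-test$PORTRANGE}

print_db_setup_sql() {
    # Names containing '-' must be double-quoted identifiers in SQL.
    printf "CREATE ROLE \"%s\" LOGIN PASSWORD '%s';\n" \
        "$POSTGRESQL_USER" "$POSTGRESQL_PASS"
    printf "CREATE DATABASE \"%s\" OWNER \"%s\";\n" \
        "$POSTGRESQL_DB" "$POSTGRESQL_USER"
}

print_db_setup_sql
```

The output can be reviewed first and then piped into psql as a superuser, for instance print_db_setup_sql | psql -U postgres. Fedora creates the tables itself on first startup.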

server.xml

This is the tomcat config file. For our purposes, we use it to define the ports tomcat will use, and where it will look for war files.

web.xml

Totally default, except for this bit:

    <session-config>
        <session-timeout>1</session-timeout>
    </session-config>

We do not use sessions, so the timeout is set to 1 minute. With the default timeout of 30 minutes, a great big pile of sessions would accumulate and grab all the memory.

Install Guide (last edited 2011-01-28 11:06:53 by abr)