Differences between revisions 5 and 6
Revision 5 as of 2011-01-18 14:48:39
Size: 29362
Editor: abr
Comment:
Revision 6 as of 2011-01-18 14:49:19
Size: 29222
Editor: abr
Comment:
Line 296: Line 225:
This is set to the value of BITSTORAGE_SCRIPT by the install process. It is the command used to contact the bitstorage server.
Line 303: Line 231:
This is set to the value of BITFINDER by the install process. It is the prefix added to filenames when turning them into permanent URLs.
Line 311: Line 238:
If the backend bitstorage has fewer than this number of bytes left, issue a warning via the surveillance system.
Line 319: Line 245:
If the backend bitstorage system has fewer than this number of bytes left, issue an error via the surveillance system.
Line 330: Line 255:
Line 378: Line 302:
Line 422: Line 345:
The authchecker is also the ticket issuer (relevant for summa). This param controls how long issued tickets should live, in milliseconds.
Line 429: Line 351:
When summa tells doms that a given user is allowed to view something, a temporary user account is created. This is the time before this user account is removed again.
Line 446: Line 366:
Line 464: Line 383:
Line 475: Line 393:
Line 481: Line 398:
This is the user account the surveillance system should use when testing fedora. It must correspond to a user account from fedora-users.xml.
Line 509: Line 425:
Line 516: Line 431:
The file listing messages that the surveillance system should ignore (path relative to WHAT??). For example, every time fedora cannot find a datastream it logs this as an ERROR, despite it being a common occurrence in normal workflows.
Line 530: Line 443:
All the doms REST webservices have a small extra servlet that allows the surveillance system to poll them. This is the list of such servlets that can be contacted via REST.
Line 553: Line 465:


Line 557: Line 466:

setenv.sh is used to set a few variables that, for some reason, need to be overridden this way rather than from context.xml.
Line 565: Line 472:
There are only two important variables here. org.apache.activemq.default.directory.prefix controls where activeMQ stores its temp files. FEDORA_HOME needs to be set, I believe, but the context.xml fedora.home param might be sufficient. In the past, FEDORA_HOME had to be set.
Line 572: Line 475:

This file controls where the fedora objects and datastreams are stored. For now, we do not store datastreams as managed content, so only objects are relevant. Still, this is the file to change if the data dir should be moved.
Line 578: Line 478:

When one fedora invocation makes fedora call itself, it needs to do so with a certain set of credentials. Due to the nature of network communication, it is not possible to reuse the credentials of the original call to fedora. Instead, these backend calls use the credentials defined in this file. Note that this file does not define user accounts, so the credentials specified here must correspond to an account in fedora-users.xml.
Line 587: Line 481:

This is the simple file that handles fedora users. It should only be used for backend user accounts. Users that just want to access material will be given temporary user accounts that time out. Users of the future GUI will use their account from the LDAP. So this file is for those users (e.g. ingester, summa) which fall in neither category.
Line 595: Line 484:

JAAS is the Java Authentication and Authorization Service. This is the config file for part of fedora's authentication framework. Only the top section matters. What we have done is add the authchecker webservice to the process. When summa needs credentials for a temporary user, it requests these from the authchecker. The authchecker then creates a new user account in memory. When the user tries to use the given credentials, he is subjected to the fedora authentication system, where we have injected the authchecker so that it can respond that it knows the user and let him in.

When LDAP is to be enabled with doms, this is also the file to edit for fedora to forward authentication to the LDAP server.
Line 608: Line 489:

Fedora controls logging with the logback system. This is the config file. As in the log4j config files for the webservices, we have defined a special appender that collects the log messages for the surveillance system.
Line 615: Line 492:
Line 618: Line 494:
The top section, before the first <module> tag, controls params that are global for all of fedora. The most relevant of these are the port numbers and the tomcat address. Fedora needs to know where it lives.
Line 623: Line 497:
Line 632: Line 507:

Line 647: Line 520:
localPostgreSQLPool and localDerbyPool.
Line 684: Line 558:

This defines the two hooks we use inside fedora. The first (decorator2) is the bitstorage approve hook. When a file object is approved/published/set active (many names, same thing), the corresponding datafile should be moved from temporary bitstorage to permanent bitstorage. To do this, it needs the location of the highlevelbitstorage webservice, and some credentials to invoke it with. It does not need write access to fedora for anything, but it requires read rights to whatever object is being published. The user account is, of course, defined in fedora-users.xml.

Decorator3 is the ecm hook. When an object is published, it should be validated. If it fails validation, it should not be published.
Line 706: Line 569:
This controls the port range for the activeMQ system, so that it stays inside the prescribed range. JMX is disabled, as it cannot be kept inside the prescribed range and is not currently needed.
Line 717: Line 578:

This controls which database system to use. Legal values are localPostgreSQLPool and localDerbyPool.
Line 736: Line 595:

This is the config for the derby database system. If you do not use the derby system, ignore this section. Note that I have had problems with changing the username and password.
Line 758: Line 612:

This is the config for the postgresql database. Fedora needs this database to keep track of things along the way, so it (or derby) must be there. The referenced postgresql database must exist, but fedora will populate it with the right tables upon startup.
Line 766: Line 615:

This is the tomcat config file. For our purposes, we use it to define the ports tomcat will use, and where it will look for war files.
Line 772: Line 618:
Line 774: Line 619:
Line 779: Line 625:
We do not use sessions, and with a timeout of 30 minutes, a great big pile of sessions would accumulate and grab all the memory.

Install Guide


The System

This is DOMS, the Digital Object Management System. It is a complete repository system, based around a Fedora instance.

These are the components that make up the system:

  • Central WebService AKA DomsServer - The interface to the system

  • ECM - Enhanced Content Models, added functionality for Fedora
  • Highlevel Bitstorage - A highlevel way to work with files in Doms
  • Lowlevel bitstorage - The lowlevel, close-to-the-storage-system way to work with files in doms
  • Characteriser - Characterises datafiles that are stored
  • Updatetracker - Keeps track of which records in doms have changed
  • Pidgenerator - Generates pids for new objects
  • Surveillance - Monitors that everything is working and not reporting errors

A full guide to the workings of DOMS is beyond the scope of this document.

The release

Due to the nature of Fedora, it is impossible to just generate an extractable zip. Rather, Fedora is based around an installer, and thus DOMS is required to work the same way. The doms release is this installer.

The installer is invoked by the script install.sh in bin/. In bin/ there are the following scripts:

  • install_basic_tomcat.sh
  • package.sh
  • ingest_base_objects.sh
  • install.sh
  • setenv.sh

setenv.sh is the script that controls everything. It sets all the variables that determine the location of various folders, and port numbers.

install.sh is a very simple script, in that it just calls install_basic_tomcat.sh, package.sh and ingest_base_objects.sh in sequence. It is used like this:

./install.sh PATH_TO_INSTALLDIR

Here PATH_TO_INSTALLDIR is the location to install the doms system in. The DOMS system will never modify anything outside this path (unless you change the default values in setenv.sh).
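As a rough sketch of the sequence install.sh runs, something like the following could be used. The function name run_install is made up for illustration, and passing the install directory to each step as its only argument is an assumption; the real install.sh may differ in its details.

```shell
#!/bin/bash
# Sketch only, not the real install.sh.
# Run the three installer steps in order, stopping at the first failure.
# Assumption: every step takes the install directory as its only argument.
run_install() {
    local bindir="$1" installdir="$2"
    local step
    for step in install_basic_tomcat.sh package.sh ingest_base_objects.sh; do
        echo "Running $step"
        bash "$bindir/$step" "$installdir" || return 1
    done
}
```

It would be called as run_install bin PATH_TO_INSTALLDIR from the root of the installer.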

ingest_base_objects.sh assumes that there is a working doms installation, and ingests the basic objects that are necessary for real data to be ingested. It reads its configuration from setenv.sh.

package.sh is the real workhorse of the install process. It copies and creates the entire installation, creates symlinks and substitutes values in config files, so that the system works. It does not install a tomcat, but it does write the necessary config files, so that if one is already installed, it will be configured.

install_basic_tomcat.sh just extracts the included tomcat server.

setenv.sh - configuring the install

By default, setenv.sh looks like this:

{{{#!/bin/bash

TOMCATZIP=`basename $BASEDIR/data/tomcat/*.zip`
FEDORAJAR=`basename $BASEDIR/data/fedora/*.jar`

#
# Check for install-folder and potentially create it.
#
TESTBED_DIR=$@
if [ -z "$TESTBED_DIR" ]; then
    echo "install-dir not specified. Bailing out." 1>&2
    usage
fi
if [ -d $TESTBED_DIR ]; then
    echo ""
else
    mkdir -p $TESTBED_DIR
fi
pushd $@ > /dev/null
TESTBED_DIR=$(pwd)
popd > /dev/null

# The normal config values
PORTRANGE=78
TOMCAT_SERVERNAME=localhost

FEDORAADMIN=fedoraAdmin
FEDORAADMINPASS=fedoraAdminPass

FEDORAUSER=fedoraReadOnlyAdmin
FEDORAUSERPASS=fedoraReadOnlyPass

# The folders
LOG_DIR=$TESTBED_DIR/logs

TOMCAT_DIR=$TESTBED_DIR/tomcat

FEDORA_DIR=$TESTBED_DIR/services/fedora

DATA_DIR=$TESTBED_DIR/data

CACHE_DIR=$TESTBED_DIR/cache

TOMCAT_CONFIG_DIR=$TESTBED_DIR/services/conf

WEBAPPS_DIR=$TESTBED_DIR/services/webapps

#Database
USE_POSTGRESQL=true
POSTGRESQL_DB=doms-test$PORTRANGE
POSTGRESQL_USER=doms-test$PORTRANGE
POSTGRESQL_PASS=doms-test$PORTRANGE

#Bitstorage
BITFINDER=http://bitfinder.statsbiblioteket.dk/
BITSTORAGE_SCRIPT="ssh doms@stage01 bin/server.sh"
}}}

$BASEDIR is the root of the installer, i.e. bin/.. (the parent of bin/). It is set by the scripts that use setenv.sh and should not be overridden.

The big blob about TESTBED_DIR reads the install directory from the command line and stores it as the testbed dir, creating the directory if it does not exist. This is the root of where everything will be installed.

Then comes the port range. All the doms components are configured to use a port inside this range, so the tomcat server runs on port 7880 when PORTRANGE is set to 78, the default. Use this to configure which set of 100 ports doms should have.
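To illustrate the arithmetic: with the default PORTRANGE=78 and the default service URLs on port 7880, the tomcat port appears to be the port range followed by the two digits 80. The helper doms_port below is hypothetical and only shows this concatenation; the suffixes used by the other components are not listed here.

```shell
#!/bin/bash
# Sketch: a port is the two-digit PORTRANGE followed by a two-digit suffix.
# Assumption: the suffix 80 for tomcat is inferred from the default URLs.
doms_port() {
    local portrange="$1" suffix="$2"
    printf '%s%s\n' "$portrange" "$suffix"
}

doms_port 78 80   # tomcat port with the default PORTRANGE
```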

The tomcat servername can safely be kept as localhost. It is not used for much, but when fedora gives back a URL, it will use the servername as the prefix. At the moment fedora is completely shielded behind the services, so this is not used.

We have defined two standard users for fedora. Both are allowed to view everything the repository contains, but only one of them is allowed to change data. I am not entirely sure the system would work if you changed the FEDORAADMIN username to something else, but the other three parameters (FEDORAADMINPASS, FEDORAUSER and FEDORAUSERPASS) can be changed freely.

Then comes the big set of important directories for doms. Notice that all of them are relative to the TESTBED_DIR. This is what prevents DOMS from modifying anything outside the TESTBED_DIR.

  • LOG_DIR is where the webservices and fedora write their logs. It does not control where tomcat logs.
  • TOMCAT_DIR is where tomcat should be installed (install_basic_tomcat.sh) or where we expect a tomcat to be (package.sh). package.sh will create this folder if it does not exist, and populate it with only the tomcat config files we have changed.
  • FEDORA_DIR is where fedora is to be installed. This is just the basic fedora instance, with config and client, not any of the dynamic content such as data. Repository policies are stored in the FEDORA_DIR.
  • DATA_DIR is where fedora should store its objects and datastreams.
  • CACHE_DIR is where activeMQ and the mulgara triple store should store their files. This is not a cache in the sense that it can just be deleted, but it is a cache in the sense that it can be regenerated from the contents of the DATA_DIR.
  • TOMCAT_CONFIG_DIR is where the configuration files for services running in tomcat are placed. This is mostly used for the log4j config files.
  • WEBAPPS_DIR is where the webapps should go. Tomcat is configured to load the webapps from this location.
  • USE_POSTGRESQL is a boolean, i.e. true or false. If true, fedora will attempt to connect to a postgresql database running on localhost with the credentials given in POSTGRESQL_DB, POSTGRESQL_USER and POSTGRESQL_PASS. If false, fedora will use a java derby database. The derby database will be stored in CACHE_DIR.
  • BITFINDER is the URL that should be prepended to the filenames when lowlevel bitstorage stores a file and returns a URL.
  • BITSTORAGE_SCRIPT is the script that should be invoked to interact with the lowlevel bitstorage backend.

These are, for now, the config parameters that control the doms, and which can be set before the install.
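As an example of setting a parameter before the install, the sketch below rewrites one NAME=value line in a copy of setenv.sh. The helper set_param is hypothetical and not part of the installer; it assumes each parameter sits on its own line starting with NAME=, and that the new value contains no "|" or "&" characters.

```shell
#!/bin/bash
# Sketch: replace the value of one NAME=value line in a config script.
# Assumptions: one assignment per line, value free of '|' and '&'.
set_param() {
    local file="$1" name="$2" value="$3"
    local tmp rc
    tmp=$(mktemp) || return 1
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For instance, set_param bin/setenv.sh USE_POSTGRESQL false would switch the install to the derby database, and set_param bin/setenv.sh PORTRANGE 80 would move doms to the 80xx ports.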

An installed system

We now assume that you have installed the DOMS system, by using install.sh or package.sh. This section is a guide to what is controlled where, if you desire to make changes. I will write this guide based on the above defaults from setenv.sh. If you changed anything, it is fairly simple to figure out where the files will be instead.

The directory structure following an install will be as shown here. I have left out folders that do not contain anything to change, as they are not relevant to the current discussion.

  • cache/
  • data/
  • services/
    • conf/
      • log4j.*.xml
      • context.xml.default (symlinked to tomcat/conf/Catalina/localhost/)
      • setenv.sh (symlinked to tomcat/bin)
    • fedora/
      • server
        • config/
          • akubra-llstore.xml
          • beSecurity.xml
          • fedora.fcfg
          • fedora-users.xml
          • jaas.conf
          • logback.xml
    • webapps/
  • tomcat/
    • conf/
      • server.xml
      • web.xml
      • context.xml

log4j.*.xml

These are the respective log4j config files for each of the deployed webservices. As can be seen, they have a specific logappender, developed in dk.statsbiblioteket, added. This logappender collects the "bad" messages, and is part of the surveillance system. Other than that, each is a standard config file that logs to the LOG_DIR.

context.xml.default

This is the big configuration file for all the doms services. The doms services are all built so that they take all their configuration from context params. The way this file is linked into tomcat enables the values to be overridden from the file tomcat/conf/context.xml

The values in context.xml will by default be something like this

fedora

    <!--fedora-->
    <Parameter
            name="fedora.home"
            value="services/fedora"
            override="false"/>

This setting controls where the fedora instance is installed

highlevelbitstorage

    <!--highlevelbitstorage-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.characteriserlocation"
            value="http://localhost:7880/characteriser/characterise/?wsdl"
            override="false"/>

This value controls where the highlevel bitstorage service expects to be able to contact the characteriser service

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.lowlevellocation"
            value="http://localhost:7880/lowlevelbitstorage/lowlevel/?wsdl"
            override="false"/>

This value controls where the highlevel bitstorage service expects to be able to contact the lowlevel bitstorage service

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.server"
            value="http://localhost:7880/fedora"
            override="false"/>

This value controls where the highlevel bitstorage service expects to be able to contact the fedora webservice

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.characstream"
            value="CHARACTERISATION" override="false"/>

This is the datastream in a fedora object to use for characterisation information

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.highlevel.fedora.contentstream"
            value="CONTENTS" override="false"/>

This is the datastream in a fedora object to use for binary content

    <Parameter name="dk.statsbiblioteket.doms.bitstorage.highlevel.log4jconfig"
               value="${user.home}/services/conf/log4j.highlevelbitstorage.xml"
               override="false"/>

This is where to find the log4j config file for this service.

lowlevelbitstorage

    <!--lowlevelbitstorage-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.scriptimpl.script"
            value="ssh doms@stage01 bin/server.sh" override="false"/>

This is set to the value of BITSTORAGE_SCRIPT by the install process. It is the way to contact the bitstorage server

    <Parameter name="dk.statsbiblioteket.doms.bitstorage.lowlevel.bitfinder"
               value="http://bitfinder.statsbiblioteket.dk/" override="false"/>

This is set to the value of BITFINDER by the install process. It is the prefix to add to filenames when turning them into permanent urls.

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.preferredBytesLeft"
            value="1000000" override="false"/>

If the backend bitstorage has fewer than this number of bytes left, issue a warning via the surveillance system

    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.lowlevel.requiredBytesLeft"
            value="100000" override="false"/>

If the backend bitstorage system has fewer than this number of bytes left, issue an error via the surveillance system.
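
The interplay of the two thresholds can be sketched as follows. This is not the service's actual code, just an illustration of the rule as described: below requiredBytesLeft is an error, below preferredBytesLeft is a warning. The function name and the OK/WARNING/ERROR labels are made up for the example.

```shell
# Sketch: classify the free space of the bitstorage backend.
# Arguments: free bytes, preferredBytesLeft, requiredBytesLeft.
bitstorage_status() {
    free=$1
    preferred=$2
    required=$3
    if [ "$free" -lt "$required" ]; then
        echo ERROR      # fewer bytes left than requiredBytesLeft
    elif [ "$free" -lt "$preferred" ]; then
        echo WARNING    # fewer bytes left than preferredBytesLeft
    else
        echo OK
    fi
}
```

With the default values above, bitstorage_status 500000 1000000 100000 reports a warning, since 500000 is below the preferred limit but not below the required one.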

    <Parameter name="dk.statsbiblioteket.doms.bitstorage.lowlevel.log4jconfig"
               value="${user.home}/services/conf/log4j.lowlevelbitstorage.xml"
               override="false"/>

log4j config for this service

characteriser

    <!--characteriser-->
    <Parameter
            name="dk.statsbiblioteket.doms.bitstorage.characteriser.log4jconfig"
            value="${user.home}/services/conf/log4j.characteriser.xml"
            override="false"/>

log4j config for this service

ecm

    <!--ecm-->
    <Parameter name="dk.statsbiblioteket.doms.ecm.fedora.connector"
               value="dk.statsbiblioteket.doms.ecm.repository.fedoraclient.FedoraClientConnector"
               override="false"/>

Which implementation of a fedora connector to use. Do not change

    <Parameter name="dk.statsbiblioteket.doms.ecm.fedora.location"
               value="http://localhost:7880/fedora"
               override="false"/>

Location of the fedora, in regards to the ECM service.

    <Parameter name="dk.statsbiblioteket.doms.ecm.pidGenerator.client"
               value="dk.statsbiblioteket.doms.ecm.repository.PidGeneratorImpl"
               override="false"/>

Which implementation of the pidgenerator client to use. Do not change

    <Parameter
            name="dk.statsbiblioteket.doms.ecm.pidgenerator.client.wsdllocation"
            value="http://localhost:7880/pidgenerator/pidGenerator/?wsdl"
            override="false"/>

Location of the pidGenerator service

    <Parameter name="dk.statsbiblioteket.doms.ecm.log4jconfig"
               value="${user.home}/services/conf/log4j.ecm.xml"
               override="false"/>

log4j config for the ecm service

centralDomsWebservice

    <!--centralDomsWebservice-->
    <Parameter name="dk.statsbiblioteket.doms.central.fedoraLocation"
               value="http://localhost:7880/fedora"
               override="false"/>

Where the fedora webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.ecmLocation"
               value="http://localhost:7880/ecm"
               override="false"/>

Where the ecm webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.bitstorageWSDL"
               value="http://localhost:7880/highlevelbitstorage/highlevel/?wsdl"
               override="false"/>

Where the highlevel bitstorage webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.updateTrackerLocation"
               value="http://localhost:7880/updatetrackerWebservice/updatetracker/?wsdl"
               override="false"/>

Where the update tracker webservice resides

    <Parameter name="dk.statsbiblioteket.doms.central.log4jconfig"
               value="${user.home}/services/conf/log4j.centralDomsWebservice.xml"
               override="false"/>

log4j config for the centralDomsWebservice service

authchecker

    <!--authchecker-->
    <Parameter name="dk.statsbiblioteket.doms.authchecker.tickets.timeToLive"
               value="1200000" override="false"/>

The authchecker is also the ticketissuer (relevant for summa). This param controls how long issued tickets should live, in ms.
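
Since the value is given in milliseconds, a small conversion helps when reading or choosing it. The helper names below are hypothetical; the arithmetic is plain integer division and multiplication.

```shell
# Convert a timeToLive value in milliseconds to whole minutes.
ms_to_minutes() {
    echo $(( $1 / 60000 ))
}

# Convert a number of minutes to milliseconds, for writing into the param.
minutes_to_ms() {
    echo $(( $1 * 60000 ))
}
```

The default of 1200000 ms thus corresponds to 20 minutes.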

    <Parameter name="dk.statsbiblioteket.doms.authchecker.users.timeToLive"
               value="1200000" override="false"/>

When summa tells doms that a given user is allowed to view something, a temp user account is created. This is the time before this user account is removed again

    <Parameter name="dk.statsbiblioteket.doms.authchecker.fedoralocation"
               value="http://localhost:7880/fedora"
               override="false"/>

This is the location of Fedora, for the authchecker webservice

    <Parameter name="dk.statsbiblioteket.doms.authchecker.log4jconfig"
               value="${user.home}/services/conf/log4j.authchecker.xml"
               override="false"/>

This is the log4j config for the authchecker webservice

updatetrackerWebservice

    <!--updatetrackerWebservice-->
    <Parameter name="dk.statsbiblioteket.doms.updatetracker.fedoralocation"
               value="http://localhost:7880/fedora"
               override="false"/>

This is the fedora location for the update tracker webservice

    <Parameter name="dk.statsbiblioteket.doms.updatetracker.log4jconfig"
               value="${user.home}/services/conf/log4j.updatetrackerWebservice.xml"
               override="false"/>

This is the log4j config for the updatetracker webservice.

pidgenerator

    <!--pidgenerator-->
    <Parameter name="dk.statsbiblioteket.doms.pidgenerator.log4jconfig"
               value="${user.home}/services/conf/log4j.pidgenerator.xml"
               override="false"/>

This is the log4j config for the pidgenerator webservice.

surveillance-fedorasurveyor

    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraUser"
            value="fedoraReadOnlyAdmin" override="false"/>

This is the user account the surveillance system should use, when testing fedora. This should of course correspond to a user account from fedora-users.xml

    <!--surveillance-fedorasurveyor-->
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraPassword"
            value="2ZeMA1bN" override="false"/>

The password that goes with the user account above

    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.fedoraUrl"
            value="http://localhost:7880/fedora"
            override="false"/>

Yet another specification of where to find fedora

    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.fedorasurveyor.log4jconfig"
            value="${user.home}/services/conf/log4j.surveillance-fedorasurveyor.xml"
            override="false"/>

log4j config for this webservice

surveillance-surveyor

    <!--surveillance-surveyor-->
    <Parameter
            name="dk.statsbiblioteket.doms.surveillance.surveyor.ignoredMessagesFile"
            value="ignored.txt" override="false"/>

File to use for messages that the surveillance system should ignore (it is unclear what the path is relative to). For example, every time fedora cannot find a datastream it logs this as an ERROR, despite it being a common occurrence in normal workflows.

    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.restUrls"
               value="
               http://localhost:7880/surveillance-surveyor/surveyable/getStatusSince/{date};
               http://localhost:7880/surveillance-fedorasurveyor/surveyable/getStatusSince/{date};
               http://localhost:7880/authchecker/surveyable/getStatusSince/{date};
               http://localhost:7880/ecm/surveyable/getStatusSince/{date};
               http://localhost:7880/fedora/surveyable/getStatusSince/{date}"
               override="false"/>

All the doms rest webservices have a little extra servlet, that allows the surveillance system to poll them. This is the list of such servlets that can be contacted by rest.
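
The value is a semicolon-separated list with arbitrary whitespace around each entry. A sketch of how such a list can be split into one url per line, for instance to test each url by hand, is shown here. The function name is hypothetical, and the {date} placeholder is left untouched because its expected format is not described here.

```shell
# Print one url per line from a semicolon-separated list, dropping
# surrounding whitespace and empty entries.
split_urls() {
    printf '%s\n' "$1" | tr ';' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```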

    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.soapUrls"
               value="
               http://localhost:7880/lowlevelbitstorage/surveyable/?wsdl;
               http://localhost:7880/highlevelbitstorage/surveyable/?wsdl;
               http://localhost:7880/characteriser/surveyable/?wsdl;
               http://localhost:7880/pidgenerator/surveyable/?wsdl;
               http://localhost:7880/updatetrackerWebservice/surveyable/?wsdl;
               http://localhost:7880/centralDomsWebservice/surveyable/?wsdl"
               override="false"/>

Same as above, but for the soap webservices.

    <Parameter name="dk.statsbiblioteket.doms.surveillance.surveyor.log4jconfig"
               value="${user.home}/services/conf/log4j.surveillance-surveyor.xml"
               override="false"/>

log4j config for this webservice
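
To look up what a given parameter is currently set to, without opening the file, something like the following can be used. It is a rough sketch, not a real XML parser: it assumes every Parameter element looks like the ones above (name and value in double quotes, no ">" inside a value), and the function name is hypothetical.

```shell
# Print the value of the named Parameter in a tomcat context file.
# Usage: context_param FILE NAME
# Returns non-zero if no Parameter with that name is found.
context_param() {
    awk -v want="$2" '
        # Split the file at every ">", so each element is one record,
        # even when it spans several lines.
        BEGIN { RS = ">"; found = 0 }
        index($0, "<Parameter") {
            name = ""
            if (match($0, /name="[^"]*"/))
                name = substr($0, RSTART + 6, RLENGTH - 7)
            if (name == want && match($0, /value="[^"]*"/)) {
                print substr($0, RSTART + 7, RLENGTH - 8)
                found = 1
                exit
            }
        }
        END { exit !found }
    ' "$1"
}
```

For example, context_param services/conf/context.xml.default fedora.home prints the configured fedora home.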

setenv.sh (in tomcat, not the one in the installer)

The setenv.sh is used to set a few variables that, for some reason, need to be set this way rather than from context.xml.

    export CATALINA_OPTS="-Dorg.apache.activemq.default.directory.prefix=$HOME/cache/ -XX:+HeapDumpOnOutOfMemoryError"
    export FEDORA_HOME=$HOME/services/fedora/

There are only two important variables here. org.apache.activemq.default.directory.prefix controls where activeMQ stores its temp files. I believe FEDORA_HOME needs to be set, although the fedora.home param in context.xml might be sufficient. In the past, FEDORA_HOME had to be set.

akubra-llstore.xml

This file controls where the fedora objects and datastreams are stored. For now, we do not store datastreams as managed content, so only objects are relevant. Still, this is the file to change if the data dir should be moved.

beSecurity.xml

When one fedora invocation makes fedora call itself, it needs to do so with a certain set of credentials. Due to the nature of net communication, it is not possible to reuse the credentials of the original call to fedora. Instead, these backend calls use the credentials defined in this file. Note that this file does not define user accounts, so the credentials specified here must correspond to an account in fedora-users.xml

fedora-users.xml

This is the simple file that handles fedora users. It should only be used for backend user accounts. Users that just want to access material will be given temporary user accounts that time out. Users of the future GUI will use their account from the LDAP. So, this is for those users (e.g. ingester, summa) which fall in neither category.

jaas.conf

JAAS is the Java Authentication and Authorization Service. This is the config file for part of fedora's authentication framework. Only the top section matters. What we have done is add the authchecker webservice to the process. When summa needs to get credentials for a temporary user, it requests these from the authchecker. The authchecker then creates a new user account in memory. When the user tries to use the given credentials, he is subjected to the fedora authentication system, where we have injected the authchecker so it can respond that it knows the user, and let him in.

If LDAP is to be enabled for the doms, this is also the file to edit so that fedora forwards authentication to the LDAP server.

logback.xml

Fedora controls logging with the logback system. This is the config file. As in the log4j config files for the webservices, we have defined a special appender that collects the log messages for the surveillance system.

fedora.fcfg

Last, and most certainly longest. This is the config file that controls fedora.

The top section, before the first <module> tag, controls params that are global for all of fedora. The most relevant of these are the port numbers and the tomcat address. Fedora needs to know where it lives.

Those of interest to us (i.e. those where you can change something meaningful) are

     <module role="org.fcrepo.server.security.Authorization" class="org.fcrepo.server.security.DefaultAuthorization">
        ....
        <param name="REPOSITORY-POLICIES-DIRECTORY"
               value="$FEDORA_DIR$/fedora-xacml-policies/repository-policies"
               isFilePath="true"/>
        ...
    </module>

    <module role="org.fcrepo.server.storage.DOManager" class="org.fcrepo.server.storage.DefaultDOManager">
        ....
        <param name="storagePool" value="localPostgreSQLPool">
            <comment>The named connection pool from which read/write database
                connections are to be provided for the storage subsystem (see the
                ConnectionPoolManager module). Default is the default provided by the
                ConnectionPoolManager.</comment>
        </param>
        ....
    </module>

Choose between the two database systems defined below. The valid values are localPostgreSQLPool and localDerbyPool.
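
The choice corresponds to the USE_POSTGRESQL variable from setenv.sh. How the installer performs the substitution is not shown here, so the following is only a sketch of the mapping, with a hypothetical function name.

```shell
# Sketch: map the USE_POSTGRESQL setting to the matching pool name.
# Prints the pool name, or fails for anything other than true/false.
storage_pool_name() {
    case "$1" in
        true)  echo localPostgreSQLPool ;;
        false) echo localDerbyPool ;;
        *)     echo "USE_POSTGRESQL must be true or false, got: $1" >&2
               return 1 ;;
    esac
}
```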

    <module role="org.fcrepo.server.management.Management" class="org.fcrepo.server.management.ManagementModule">
        ....
        <param name="decorator2"
               value="dk.statsbiblioteket.doms.bitstorage.highlevel.HookApprove">
            <comment>This is the hook that ensures that when a file object is
                marked active, the corresponding file is approved in bitstorage
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.filecmodel"
               value="doms:ContentModel_File">
            <comment>This is the content model an object must have to be considered
                a file object
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.webservicelocation"
               value="http://localhost:7880/highlevelbitstorage/highlevel/?wsdl">
            <comment>This specifies the location of the highlevel webservice
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.username"
               value="fedoraReadOnlyAdmin">
            <comment>This is the username used for publishing files
            </comment>
        </param>
        <param name="dk.statsbiblioteket.doms.bitstorage.highlevel.hookapprove.password"
               value="fedoraReadOnlyPass">
            <comment>This is the password used for publishing files
            </comment>
        </param>

        ...
        <param name="decorator3" value="dk.statsbiblioteket.doms.ecm.fedoravalidatorhook.FedoraModifyObjectHook"/>
    </module>

This defines the two hooks we use inside fedora. The first (decorator2) is the bitstorage approve hook. When a file object is approved/published/set active (many names, same thing), the corresponding datafile should be moved from temporary bitstorage to permanent bitstorage. To do this, it needs the location of the highlevelbitstorage webservice, and some credentials to invoke it with. It does not need write access to fedora for anything, but it will require read rights to whatever object is set published. The user account is, of course, defined in fedora-users.xml.

Decorator3 is the ecm hook. When an object is published, it should be validated. If it fails, it should not be published.

    <module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule">
        ....
        <param name="java.naming.provider.url" value="vm:(broker:(tcp://localhost:$PORTRANGE$16)?useJmx=false)"/>
        ....
    </module>

This controls the port used by the activeMQ system, so that it stays inside the prescribed port range. Jmx is disabled, as it cannot be pushed inside the prescribed range, and is not currently needed.
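
Tokens such as $PORTRANGE$ are placeholders that get a concrete value when the system is installed. The installer's own substitution code is not shown here; the sketch below merely illustrates the idea with sed. The function name is hypothetical, and the value 78 is an assumption chosen to match the 7880 port seen in the urls above.

```shell
# Replace every $NAME$ token read from stdin with VALUE.
# Usage: fill_placeholder NAME VALUE < template > output
# Limitation of this sketch: VALUE must not contain "|", "&" or "\".
fill_placeholder() {
    # Escape the dollar signs so sed treats them as literal characters.
    token='\$'"$1"'\$'
    sed "s|$token|$2|g"
}
```

With PORTRANGE set to 78, the broker address tcp://localhost:$PORTRANGE$16 becomes tcp://localhost:7816.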

    <module role="org.fcrepo.server.storage.ConnectionPoolManager" class="org.fcrepo.server.storage.ConnectionPoolManagerImpl">
        <comment>This module facilitates obtaining ConnectionPools</comment>
        <param name="defaultPoolName" value="localPostgreSQLPool"/>
        <param name="poolNames" value="localPostgreSQLPool"/>
    </module>

This controls which database connection pool fedora uses. Legal values are localPostgreSQLPool and localDerbyPool.

    <datastore id="localDerbyPool">
        ....
        <param name="dbUsername" value="fedoraAdmin">
            <comment>The database user name.</comment>
        </param>
        <param name="dbPassword" value="fedoraAdmin">
            <comment>The database password.</comment>
        </param>
        <param name="jdbcURL" value="jdbc:derby:$CACHE_DIR$/derby/fedora3;create=true">
            <comment>The JDBC connection URL.</comment>
        </param>
        ....
    </datastore>

This is the config for the derby database system. If you do not use the derby system, ignore this section. Note, I have had problems with changing the username and password.

    <datastore id="localPostgreSQLPool">
        ....
        <param name="dbUsername" value="$POSTGRESQL_USER$">
            <comment>The database user name.</comment>
        </param>
        <param name="dbPassword" value="$POSTGRESQL_PASS$">
            <comment>The database password.</comment>
        </param>
        <param name="jdbcURL" value="jdbc:postgresql://localhost/$POSTGRESQL_DB$">
            <comment>The JDBC connection URL.</comment>
        </param>
        ....
    </datastore>

This is the config for the postgresql database. Fedora needs a database for its internal bookkeeping, so either this one or the derby one must be there. The referenced database (in postgresql) must exist, but fedora will populate it with the right tables upon startup.

server.xml

This is the tomcat config file. For our purposes, we use it to define the ports tomcat will use, and where it will look for war files.

web.xml

This is the tomcat default, except for this bit

    <session-config>
        <session-timeout>1</session-timeout>
    </session-config>

We do not use sessions, and with the default timeout of 30 minutes a great big pile of sessions would accumulate and grab all the memory. The timeout is therefore set to 1 minute.

Install Guide (last edited 2011-01-28 11:06:53 by abr)