## Action
= Action Yearbook Collection =

## Targeted WBS Tasks and Assigned Developers can be found on the iteration page for this action.

The Collection formerly known as Gentofte needs to be finalised for the DOMS system. This involves the changes detailed below:

=== FoxML ===
 * (trivial) In order to prevent name collisions, all objects from the collection must use the PID prefix doms:Aarbog_ rather than doms:
   . DONE 2008-04-01 (fix in {{{doms/gentofte_preingest/data/transformations/PIDs.xsl}}})
   . Catch: The base objects may also need renaming, and references to the base objects, and reference namespace, and what about the disseminators? -- [[bam]] <<DateTime(2008-04-09T07:33:21Z)>>
   . DONE 2008-04-16 :)
 * (trivial) The PIDs of the revyItems will be visible to the users. For this reason, remove the _revyItem postfix from the PIDs of these. Verify that this does not cause name collisions.
   . DONE 2008-04-01 (fix in {{{doms/gentofte_preingest/data/transformations/PIDs.xsl}}}; the PIDs of all other object types have a postfix, so this does not cause name collisions)
 * (trivial) The manufacturer objects should not have the creator dc field initialised. Fix this.
   . DONE 2008-04-01 (fix in {{{doms/gentofte_preingest/data/templates/manufacturer_template.xml}}})
 * (trivial) Encode the type of document as a field in DomsDC, for display in Summa. Punt this until the discussion with Gitte is complete.
1 md in total

=== DataFiles ===
 * (1 md) The default image format of DOMS is tiff, but the CfkaG uses a mix of gif, png and jpg. Tiff files need to be created from each image in the collection. !ImageMagick convert should probably be used.
 * (0.5 md) The DataModel assumes that all articles have a UTF-8 representation. This is not the case for this collection, so in order to satisfy the DataModel, empty text files must be created for each article.
 * (trivial) Make sure the filenames in the entire collection are in utf-8, not latin-1. See convmv.
2 md total

In progress. The files have been copied to my machine from zeus (exact location on the [[Årbøger]] page) using scp, the filenames have been converted using the exact commands on the [[convmv]] page, and there is now a {{{convert.sh}}} script, which converts all gif, png and jpg files to tiff and creates empty txt files for all pdf files in a given directory. Any existing tiff or txt files with the same names will be overwritten! This should probably be fixed, but there are no tiff or txt files in the original material, so it is not important. The created tiff images also have meaningless resolution units; this should be fixed! -- [[bam]] <<DateTime(2008-04-03T11:16:28Z)>>

I have run the convert script locally. It takes approx. 47 minutes to run. I have checked the result, and found and corrected a bug (the script produced the wrong target file name if there was more than one dot in the original file name). The created tiff images still have meaningless resolution units; this should be fixed! -- [[bam]] <<DateTime(2008-04-07T07:27:23Z)>>
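The multiple-dot bug is easy to reintroduce, so here is a minimal sketch of one way to derive the target name safely. It is an illustration only: {{{target_name}}} is a hypothetical helper and is not taken from the actual {{{convert.sh}}}. It relies on the POSIX parameter expansion {{{${name%.*}}}}, which removes only the shortest trailing match of {{{.*}}}, i.e. the last extension.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the real convert.sh.
# Replace only the LAST extension of a file name with a new one.
# "${1%.*}" strips the shortest suffix matching ".*", so earlier dots
# in the name are left untouched.
# Assumes the file name actually has an extension.
target_name() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

target_name "aarbog.1931.side.2.gif" tiff
target_name "artikler/forord.pdf" txt
```

The two example calls print {{{aarbog.1931.side.2.tiff}}} and {{{artikler/forord.txt}}}; the example file names are made up.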

The resolution units are undefined in the original images, so we will leave the image conversion as is. In the conversion script we will change the file access permissions of the created files from {{{rwxr-xr-x}}} to {{{rw-r--r--}}}. We note that !ImageMagick is not good for conversion to tiff, and we need to find good conversion tools for putting collections into DOMS. -- [[bam]] <<DateTime(2008-04-08T09:02:58Z)>>

If you want to know the metadata in an image, the following commands may be helpful: {{{identify -verbose}}}, jhove?, {{{tiffinfo}}}, {{{exiv2}}}.

DONE. To get the wanted yearbook collection files:
 * copy the files from zeus (exact location on the [[Årbøger]] page)
 * convert the filenames using the commands on the [[convmv]] page
 * run the {{{convert.sh}}} script with the command {{{
./convert.sh <YEARBOOK-COLLECTION-FOLDER> >convert.log 2>&1
}}} (The script is located in the [[http://merkur/viewvc/trunk/aarbog_preingest/bin/?root=doms|preingest module]]; the log is just because the script produces quite a bit of output) -- [[bam]] <<DateTime(2008-04-09T07:14:28Z)>>

Catch: We need to read up on our own tiff requirements and make sure that the converted files meet them. The convert script has been updated to __not__ compress the tiff files. We also note that updating !ImageMagick to version 6.3.5 takes care of the 'Unknown pseudo-tag 65537. `TIFFSetField'.' errors.
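Taken together, the notes above describe a script that converts gif, png and jpg files to uncompressed tiff, creates an empty txt file for each pdf, and sets the permissions of the created files to {{{rw-r--r--}}}. The following is a rough sketch of such a loop, not the actual {{{convert.sh}}} from the preingest module. The function name {{{convert_collection}}} and the {{{CONVERT}}} override variable are assumptions made for illustration. It also assumes lower-case extensions and file names without newlines.

```shell
#!/bin/sh
# Rough sketch only; NOT the real convert.sh from the preingest module.
# CONVERT is an assumed override hook so that another converter (or a stub)
# can be swapped in; it defaults to ImageMagick's "convert".
CONVERT="${CONVERT:-convert}"

convert_collection() {
    dir="$1"

    # Convert every gif, png and jpg file to an uncompressed tiff placed
    # next to the original. Only the last extension is replaced.
    find "$dir" -type f \( -name '*.gif' -o -name '*.png' -o -name '*.jpg' \) |
    while IFS= read -r img; do
        out="${img%.*}.tiff"
        # "-compress None" asks ImageMagick not to compress the output.
        "$CONVERT" "$img" -compress None "$out" && chmod 644 "$out"
    done

    # Create an empty txt file for every pdf, standing in for the missing
    # UTF-8 representation. Like the script described above, this
    # overwrites an existing txt file of the same name.
    find "$dir" -type f -name '*.pdf' |
    while IFS= read -r pdf; do
        txt="${pdf%.*}.txt"
        : > "$txt" && chmod 644 "$txt"
    done
}
```

With the function loaded, {{{convert_collection <YEARBOOK-COLLECTION-FOLDER>}}} would process one folder recursively. Whether the resulting files meet our tiff requirements still has to be checked separately, e.g. with {{{tiffinfo}}}.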