Differences between revisions 1 and 2
Revision 1 as of 2008-06-26 12:26:09
Size: 2711
Editor: kfc
Comment: Created by the PackagePages action.
Revision 2 as of 2010-03-17 13:12:48
Size: 2711
Editor: localhost
Comment: converted to 1.6 markup

A rough estimate of the number of objects in a repository for Statsbiblioteket is 4-5 million at this time. The bulk of these objects is a collection of Radioavisen manuscripts with 3 million TIFF files.

The number of relations between objects will be much higher. Related projects (primarily Wales National Library) put the number of relations per object at 50-100. This number includes general metadata like Dublin Core.
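As a rough illustration of the scale this implies, the two ranges above can be multiplied together. This is a back-of-the-envelope sketch only; the function name is made up for the example, and the inputs are the estimates from the text, not measured values.

```python
def estimate_relations(objects, relations_per_object):
    """Return the total number of relations for a given object count."""
    return objects * relations_per_object


# Ranges taken from the text: 4-5 million objects, 50-100 relations per object.
low = estimate_relations(4_000_000, 50)
high = estimate_relations(5_000_000, 100)

print(f"Estimated relations: {low:,} to {high:,}")
```

With these estimates the repository would hold somewhere between 200 and 500 million relations.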

Fedora creates a file for each object in the repository. This is not a problem for standard access, as they are cached in a database, but it is potentially a showstopper for backup and replication.

Quick test

The test was run on RedHat with ext3 and on Windows XP with NTFS 5, each using a single hard disk with a block size of 4 KB. The ext3 filesystem had 7.5 million free inodes (according to dumpe2fs).
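A test of this kind could be reproduced roughly as sketched here. The original test scripts are not described on this page, so everything in the sketch is an assumption: the 1 KB payload, the layout of 1000 files per subdirectory, the file names, and the use of Python's zipfile module for the ZIP step.

```python
import os
import tempfile
import time
import zipfile


def create_files(root, count, per_dir=1000, payload=b"x" * 1024):
    """Create `count` small files under `root`; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        subdir = os.path.join(root, f"{i // per_dir:06d}")
        if i % per_dir == 0:
            # Start a new subdirectory every `per_dir` files.
            os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"{i:08d}.xml"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def zip_files(root, archive_path):
    """Pack every file under `root` into one ZIP; return (seconds, files).

    `archive_path` should lie outside `root`, otherwise the walk could
    pick up the archive itself.
    """
    start = time.perf_counter()
    packed = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, root))
                packed += 1
    return time.perf_counter() - start, packed


if __name__ == "__main__":
    # Small demo run; raise `count` to approach the sizes used in the test.
    count = 2000
    with tempfile.TemporaryDirectory() as work:
        root = os.path.join(work, "files")
        create_secs = create_files(root, count)
        zip_secs, packed = zip_files(root, os.path.join(work, "files.zip"))
        print(f"created {count} files in {create_secs:.2f} s")
        print(f"zipped {packed} files in {zip_secs:.2f} s")
```

Timings from a sketch like this are not directly comparable to the numbers on this page, since hardware, file size and directory layout all differ.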

||System ||Files ||Creation timing ||ZIP timing ||
||RedHat, ext3 ||200,000 ||3.3 min (~1000 files/s) ||1.5 min (~2200 files/s) ||
||RedHat, ext3 ||1,000,000 ||13.2 min (~1250 files/s) ||9.3 min (~1800 files/s) ||
||RedHat, ext3 ||5,000,000 ||72.9 min [1] (~1150 files/s) ||103.1 min [2] (~800 files/s) ||
||Windows XP, NTFS 5 ||200,000 ||10.5 min (~320 files/s) ||10.5 min (~320 files/s) ||
||Windows XP, NTFS 5 ||1,000,000 ||51.2 min (~320 files/s) ||53.5 min (~310 files/s) ||
||Windows XP, NTFS 5 ||5,000,000 ||243.1 min (~340 files/s) ||246.6 min (~340 files/s) ||
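The files-per-second figures follow from the file count divided by the elapsed time in seconds. The small sketch below recomputes them from the timings in the table; the function and variable names are only illustrative.

```python
def files_per_second(files, minutes):
    """Throughput in files per second for `files` handled in `minutes`."""
    return files / (minutes * 60)


# (system, files, creation minutes, ZIP minutes), copied from the table.
rows = [
    ("RedHat, ext3", 200_000, 3.3, 1.5),
    ("RedHat, ext3", 1_000_000, 13.2, 9.3),
    ("RedHat, ext3", 5_000_000, 72.9, 103.1),
    ("Windows XP, NTFS 5", 200_000, 10.5, 10.5),
    ("Windows XP, NTFS 5", 1_000_000, 51.2, 53.5),
    ("Windows XP, NTFS 5", 5_000_000, 243.1, 246.6),
]

for system, files, create_min, zip_min in rows:
    create_rate = files_per_second(files, create_min)
    zip_rate = files_per_second(files, zip_min)
    print(f"{system:20} {files:>9,}  create {create_rate:6.0f}/s  zip {zip_rate:6.0f}/s")
```

The recomputed values agree with the rounded figures in the table.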

Test conclusion

A backup time of under two hours on a single hard disk, with a filesystem that is not optimized for small files, is acceptable (Toke). A bigger problem is the inconsistencies that arise when files change during the backup.
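One way to at least detect such inconsistencies is to record the size and modification time of every file before and after the backup and compare the two snapshots. The sketch below is an illustration of that idea only, not a procedure used at Statsbiblioteket; the function names are invented for the example, and it detects changes rather than preventing them.

```python
import os


def snapshot(root):
    """Map each file path under `root` to its (size, mtime in ns)."""
    state = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[full] = (st.st_size, st.st_mtime_ns)
    return state


def changed_during_backup(before, after):
    """Return paths that were added, removed or modified between snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

A backup script would call snapshot() before starting, run the backup, call snapshot() again, and then re-copy or report whatever changed_during_backup() returns.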

Windows XP with NTFS 5 does not perform well: it managed only about 310-340 files/s in every test. One possibility is to use another filesystem. A freeware ext2/3 driver for Windows XP is available at http://fs-driver.org/

Problem

There are political and technical problems with backing up millions of small files. The web development group had this problem with Horizon and implemented a database-driven file handling system to reduce the number of files. Talk to Hans about this.
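The web development group's system is not described here, so the following is only a generic sketch of the idea: many small files are stored as rows in a single database file, which a backup then sees as one large file. SQLite, the table layout and the function names are all assumptions made for the example.

```python
import sqlite3


def open_store(db_path):
    """Open (or create) a database that holds small files as BLOB rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )
    return conn


def put_file(conn, name, content):
    """Store `content` (bytes) under `name`, replacing any earlier version."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, content) VALUES (?, ?)",
            (name, content),
        )


def get_file(conn, name):
    """Return the stored bytes for `name`, or None if it does not exist."""
    row = conn.execute(
        "SELECT content FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

The trade-off is that the files are no longer directly visible in the filesystem, and the database itself must be backed up in a consistent state.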

The Royal Library solves the problem by using XMLTapes for metadata (the XML files are concatenated and indexed), but that solution works best for very static metadata.
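The concatenate-and-index idea can be sketched as follows. This is a simplified illustration and not the actual XMLtape format or the Royal Library's implementation: the JSON index, the newline separator and the function names are assumptions made for the example.

```python
import json


def write_tape(records, tape_path, index_path):
    """Concatenate (identifier, xml_bytes) records into one tape file.

    The index maps each identifier to the (offset, length) of its record.
    """
    index = {}
    with open(tape_path, "wb") as tape:
        for identifier, xml_bytes in records:
            index[identifier] = (tape.tell(), len(xml_bytes))
            tape.write(xml_bytes)
            tape.write(b"\n")  # separator, not counted in the length
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def read_record(identifier, tape_path, index_path):
    """Look up one record through the index and read it from the tape."""
    with open(index_path, encoding="utf-8") as f:
        offset, length = json.load(f)[identifier]
    with open(tape_path, "rb") as tape:
        tape.seek(offset)
        return tape.read(length)
```

The sketch also shows why the approach suits static metadata: changing a record in place would shift the offsets of every later record, so an update means rewriting the tape or appending a new version and rebuilding the index.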

See also

Scalability tests on several different machines: http://fedora.statsbiblioteket.dk/fedoraWiki/Fedora_performance

  1. The machine was used for light work.

  2. The machine was left alone with the screen locked. No demanding screensaver was active.

Millions of objects (last edited 2010-03-17 13:12:48 by localhost)