A rough estimate of the number of objects in a repository for Statsbiblioteket is currently 4-5 million. The bulk of these objects is a collection of Radioavisen manuscripts with 3 million TIFF files.

The number of relations between objects will be much higher. Related projects (primarily the National Library of Wales) put the number of relations per object at 50-100. This number includes general metadata such as Dublin Core.

Fedora creates a file for each object in the repository. This does not present a problem for standard access, as the objects are cached in a database, but it is potentially a showstopper for backup and replication.

= Quick test =

The test ran on Red Hat with ext3 and Windows XP with NTFS 5, each using a single hard disk with a block size of 4 KB. The ext3 filesystem had 7.5 million free inodes (as reported by dumpe2fs).

||'''System''' ||'''Files''' ||'''Creation timing''' ||'''ZIP timing''' ||
||Red Hat, ext3 ||200,000 ||3.3 min<<BR>> ~1000 files/s ||1.5 min<<BR>> ~2200 files/s ||
||Red Hat, ext3 ||1,000,000 ||13.2 min<<BR>> ~1250 files/s ||9.3 min<<BR>> ~1800 files/s ||
||Red Hat, ext3 ||5,000,000 ||72.9 min<<BR>> ~1150 files/s ||103.1 min<<BR>> ~800 files/s ||
||Windows XP, NTFS 5 ||200,000 ||10.5 min<<BR>> ~320 files/s ||10.5 min<<BR>> ~320 files/s ||
||Windows XP, NTFS 5 ||1,000,000 ||51.2 min<<BR>> ~320 files/s ||53.5 min<<BR>> ~310 files/s ||
||Windows XP, NTFS 5 ||5,000,000 ||243.1 min<<BR>> ~340 files/s ||246.6 min<<BR>> ~340 files/s ||

== Test conclusion ==

A backup time below two hours (103.1 min for 5 million files on ext3) on a single hard disk, with a filesystem that is not optimized for small files, is acceptable (Toke). A bigger problem is the inconsistency that arises when files change during the backup.

Windows XP with NTFS 5 performs noticeably worse, at roughly 310-340 files/s for both creation and ZIP. One possibility is to use another filesystem. There is a freeware ext2/3 driver for Windows XP at http://fs-driver.org/

= Problem =

There are political and technical problems with the backup of millions of small files.

The web development group had this problem with Horizon. They have implemented a database-driven file handling system in order to reduce the number of files. Talk to Hans about this.

The Royal Library solves the problem by using XMLTapes for metadata (concatenating the XML files and indexing them), but that solution works best for very static metadata.

= See also =

Scalability tests on several different machines: http://fedora.statsbiblioteket.dk/fedoraWiki/Fedora_performance
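
The quick test above (create many small files, then ZIP them, timing both steps) can be approximated with a short script. This is only a sketch under stated assumptions: the original test script, file sizes and directory layout are not recorded on this page, so the payload, the 1000-files-per-directory layout and all function names below are hypothetical.

```python
import os
import tempfile
import time
import zipfile


def create_small_files(root, count, per_dir=1000, payload=b"<object/>\n"):
    """Create `count` small files under `root`, `per_dir` files per subdirectory.

    The payload and directory fan-out are assumptions, not the original setup.
    """
    for i in range(count):
        subdir = os.path.join(root, f"{i // per_dir:06d}")
        if i % per_dir == 0:
            os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"{i:09d}.xml"), "wb") as handle:
            handle.write(payload)


def zip_tree(root, zip_path):
    """Pack every file under `root` into one archive and return the file count."""
    packed = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as archive:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                archive.write(full, os.path.relpath(full, root))
                packed += 1
    return packed


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run_benchmark(count):
    """Time creation and zipping of `count` files in a temporary directory."""
    with tempfile.TemporaryDirectory() as work:
        data = os.path.join(work, "data")
        os.makedirs(data)
        _, create_s = timed(create_small_files, data, count)
        # The archive is written outside `data`, so it does not pack itself.
        packed, zip_s = timed(zip_tree, data, os.path.join(work, "backup.zip"))
    return {"files": packed, "create_s": create_s, "zip_s": zip_s}


if __name__ == "__main__":
    stats = run_benchmark(2000)
    for step in ("create_s", "zip_s"):
        rate = stats["files"] / max(stats[step], 1e-9)
        print(f"{step}: {stats[step]:.2f} s, ~{rate:.0f} files/s")
```

Absolute numbers from such a run depend heavily on the disk cache, the file size and the filesystem, so they are only comparable to the table when run with millions of files on similar hardware.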
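
The "concatenate the XML files and index them" idea mentioned under Problem can be illustrated as follows. This is a minimal sketch and not the actual XMLTape format used by the Royal Library: the plain byte concatenation, the JSON index and the names `build_tape` and `read_record` are all assumptions made for illustration.

```python
import json
import os
import tempfile


def build_tape(xml_paths, tape_path, index_path):
    """Concatenate the files into one tape file.

    Writes a JSON index mapping file name -> [byte offset, byte length]
    and returns the same mapping.
    """
    index = {}
    with open(tape_path, "wb") as tape:
        for path in xml_paths:
            with open(path, "rb") as source:
                data = source.read()
            index[os.path.basename(path)] = [tape.tell(), len(data)]
            tape.write(data)
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)
    return index


def read_record(tape_path, index, name):
    """Return the bytes of one record by seeking to its indexed offset."""
    offset, length = index[name]
    with open(tape_path, "rb") as tape:
        tape.seek(offset)
        return tape.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        paths = []
        for number in range(3):
            path = os.path.join(work, f"object{number}.xml")
            with open(path, "wb") as handle:
                handle.write(f"<object id='{number}'/>\n".encode("utf-8"))
            paths.append(path)
        tape = os.path.join(work, "metadata.tape")
        index = build_tape(paths, tape, os.path.join(work, "metadata.index"))
        print(read_record(tape, index, "object1.xml").decode("utf-8"), end="")
```

With this layout a backup only has to copy two files instead of millions. Changing a single record, however, means appending to or rewriting the tape and updating the index, which is one way to see why the approach suits static metadata best.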