Hardware

Quick notes... More to come.

See also IndexBuilding.

Appetizer

SSD_Appetizer2.png

The test:

The index:

Machines:

Legend:

Threading

Performance of threaded searches with a single searcher shared between threads. This test differs from the one above in two ways: the index is larger and the searches are threaded. The labels t1, t2, t3 and t4 give the number of concurrent threads used for searching.

threads_shared_searcher.png

Performance of threaded searches with a unique searcher for each thread.

threads_harddisk_unique.png

threads_ssd_unique.png

Observations

A separate searcher for each thread performs significantly better than a single searcher shared between all threads, at least in the long run.

The sweet spot for the number of threads seems to be the number of CPU-cores or one more than the number of CPU-cores.
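As a rough illustration of these two observations, the sketch below gives every worker thread its own searcher and sizes the pool to the number of CPU cores. It is not the code used for the tests: the Searcher interface, the class name and the toy searcher are all hypothetical stand-ins, and a real setup would open one Lucene searcher per thread instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch: one searcher per worker thread, pool sized to the core count.
class PerThreadSearchSketch {

    // Stand-in for a real searcher; this is NOT a Lucene type.
    interface Searcher {
        int search(String query);
    }

    // Runs all queries on a fixed pool and returns the hit counts in query order.
    static List<Integer> searchAll(List<String> queries, int threads,
                                   Supplier<Searcher> searcherFactory) throws Exception {
        // Each worker thread lazily creates its own searcher on first use.
        ThreadLocal<Searcher> perThread = ThreadLocal.withInitial(searcherFactory);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String q : queries) {
                Callable<Integer> task = () -> perThread.get().search(q);
                pending.add(pool.submit(task));
            }
            List<Integer> hits = new ArrayList<>();
            for (Future<Integer> f : pending) {
                hits.add(f.get()); // blocks until that query has finished
            }
            return hits;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Thread count taken from the observation above: the number of CPU cores.
        int cores = Runtime.getRuntime().availableProcessors();
        // Toy searcher for illustration: the "hit count" is just the query length.
        List<Integer> hits = searchAll(List.of("lucene", "ssd"), cores, () -> q -> q.length());
        System.out.println(hits);
    }
}
```

Whether the number of cores or one more than that works best would have to be measured on the machine in question; the sketch only makes the thread count a parameter.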

Warming up

Warming the searchers means running realistic queries against them before giving users access. Lucene is known to benefit a lot from warming. This is partly due to Lucene's internal structures being initialized and values being cached, and partly due to the system disk-cache being populated.

warming.png
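A minimal sketch of the warming idea, assuming a hypothetical Searcher interface rather than any real Lucene class: the fresh searcher first runs a list of warm-up queries, and only then replaces the searcher that serves users. All names here are illustrative.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: warm a fresh searcher before it becomes visible to users.
class WarmingSketch {

    // Stand-in for a real searcher; this is NOT a Lucene type.
    interface Searcher {
        int search(String query);
    }

    // The searcher currently serving user queries (null until the first publish).
    private final AtomicReference<Searcher> active = new AtomicReference<>();

    // Runs the warm-up queries on the fresh searcher, then publishes it.
    // Returns the previously active searcher (or null) so the caller can close it.
    Searcher warmAndPublish(Searcher fresh, List<String> warmupQueries) {
        for (String q : warmupQueries) {
            fresh.search(q); // results are discarded; only the side effects matter
        }
        return active.getAndSet(fresh);
    }

    // Returns the searcher that user queries should use right now.
    Searcher current() {
        return active.get();
    }
}
```

The warm-up queries should resemble real traffic, for example a sample taken from the query log, since the text above stresses that the queries must be realistic.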

RAM vs SSD vs Harddisks

Turns out we could cram 14GB of index into RAM on our machine (see the page history for the previous test with 9GB). We loaded the index into RAM and pitted it against the same index on SSD and on conventional harddisks, with the available memory reduced to 3GB for the latter two.

ram_2threads.png

Observations

Using SSD with multiple independent searchers and just 3 GB of RAM can give performance nearly on par with the pure RAM setup using 24 GB of RAM. It's very interesting to see that the RAM-based searchers also need some time to ramp up in speed. This shows that a substantial part of the warm-up time is due to Lucene's internal structure initialization, where faster storage doesn't help.

CPU-core scaling

We got 3 new machines in mid-August 2008. They are quad-core Xeon machines with 6MB of level 2 cache, 16GB RAM and 4 * 64GB SSD in RAID 0. We're in love.

In the graph below, metis is our "old" dual-core machine with 2 * 32GB MTRON SSDs in RAID 0, while prod is one of the new machines. As can be seen, metis is a bit faster than prod for 1-2 threads, after which prod pulls ahead. Not surprising. What's interesting is that speed continues to increase at a great pace up to 4-5 threads: the CPU is the bottleneck, not the SSDs.

Looking at the performance increase, we can calculate how many more raw queries/sec we can deliver for each extra thread:

250SSD_HRAID_VS_MTRON.png
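The gain per extra thread is simply the difference between consecutive throughput measurements. The sketch below shows that arithmetic; the class and method names are hypothetical, and the numbers in main are made-up placeholders, not measurements from metis or prod.

```java
import java.util.Arrays;

// Hypothetical sketch: extra queries/sec gained by each additional thread.
class ThreadGainSketch {

    // qps[i] is the measured throughput with i + 1 threads.
    // Returns gains where gains[i] = qps[i + 1] - qps[i].
    static double[] gainPerExtraThread(double[] qps) {
        double[] gains = new double[Math.max(0, qps.length - 1)];
        for (int i = 0; i < gains.length; i++) {
            gains[i] = qps[i + 1] - qps[i];
        }
        return gains;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only; substitute real measurements.
        double[] qps = {100.0, 180.0, 240.0};
        System.out.println(Arrays.toString(gainPerExtraThread(qps)));
    }
}
```

A gain that shrinks towards zero as threads are added would indicate that a bottleneck has been reached.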

Closer to the real world

Work in progress

What we're aiming for with Summa is running updates of the index. Luckily the low warm-up time for Solid State Drives means that sub-minute update times are possible with a plain index. Lucene 2.3 promises better performance when re-opening an index though, so the lead by Solid State Drives might not be that huge in The Real World.

Scenario 1: An index of approximately the same size and layout as our current one. Additions are made every minute and we accept 10 seconds from a commit until the changes are reflected (the warm-up time). The warm-up will take place in the background, affecting the performance of the active searcher, but we'll ignore that for now. What we can't ignore is that it takes approximately 5 seconds just to open the index (pending experiments with re-open).

Scenario 2: An index of approximately the same size and layout as our current one. Additions are made every 10 minutes and we accept 10 minutes from a commit until the changes are reflected (the warm-up time). As before, the warm-up will take place in the background.

Scenario 3: An index of approximately the same size and layout as our current one. Additions are made every day and we accept 1 hour from a commit until the changes are reflected (the warm-up time).
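In each scenario, the time left for warming is the accepted delay minus the time spent opening the index. For Scenario 1 that is 10 seconds minus roughly 5 seconds, leaving about 5 seconds. The sketch below shows one way such a budget could be enforced; the class, the method names and the injectable clock are assumptions made for illustration and are not part of Summa or Lucene.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: warm a searcher only for as long as the budget allows.
class WarmupBudgetSketch {

    // Time left for warming once the index has been opened; never negative.
    static long warmupBudgetMillis(long acceptedDelayMillis, long openMillis) {
        return Math.max(0L, acceptedDelayMillis - openMillis);
    }

    // Runs warm-up queries in order until they run out or the budget is spent.
    // The deadline is checked before each query, so the last query may overrun it.
    // Returns the number of queries that were executed.
    static int warmWithinBudget(List<String> queries, Consumer<String> runQuery,
                                long budgetMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        int executed = 0;
        for (String q : queries) {
            if (clockMillis.getAsLong() >= deadline) {
                break;
            }
            runQuery.accept(q);
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        // Scenario 1 from the text: 10 s accepted delay, about 5 s to open the index.
        System.out.println(warmupBudgetMillis(10_000, 5_000) + " ms left for warming");
    }
}
```

In production the clock argument would typically be System::currentTimeMillis; it is a parameter here only so the behaviour can be checked without waiting.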

Upcoming production system

We got 3 Quad-core Xeon 6MB cache, 16GB RAM machines. Each one equipped with

Test-code

Due to legal issues, the package below contains only a subset of the files needed to compile and run the tests. However, it should be enough for a quick review.

SBLucenePerformance0.1.zip

Hardware (last edited 2010-03-11 13:18:13 by localhost)