If you have a ‘readonly’ Btree database in BDB, you might benefit by this small trick that has multiple benefits. The trick is to reload the database. You’ll get a compact file with blocks appearing in order.
Perhaps you’re using BDB as a cached front end to your slower SQL database, and you dump an important view and import it into BDB. Maybe you’ve written a custom importer program. You’ve got this down to a process, that runs, say, hourly. Here’s the thing – if you happened to import the data in the same order that it will appear in BDB (i.e. the key is sorted ascending) you’ll get some great optimizations.
When a page is filled up and is being split, and BDB recognizes that the new item is at the end of the tree, then the new item is placed on its own page. That means that data inserted in order will ‘leave behind’ leaf pages that are almost completely filled. BDB also recognizes the opposite case, when a key is inserted at the beginning of the database, and makes the uneven split in the other direction.
Here are the benefits to such a compaction. More key/data pairs per page means fewer pages. Your cache is more effective since you can squeeze more key/data pairs into memory at a time. Viewed from another angle, less of your BDB cache is effectively wasted space. (Nobody wants to say it this way but a fill factor of 72% is 28% wasted space). More keys on a page means fewer internal pages in the level above. There is a cascading effect – the btree may be shallower in a compact database than one uncompacted. Fewer levels means faster access.
One more benefit to reloading is that pages will be allocated in the underlying file in order. Sequential (cursor) scans through the database are going to appear as accesses to sequential file pages. If the OS performs readahead, your IO will be already done when you advance to the next block. Your OS may not have readahead, but your file system may try to group sequential disk blocks closely on the physical disk. If the disk’s cache can accommodate it, sequential read requests may be satisfied in advance there. The takeaway is that even if BDB’s cache is not large enough to hold your nicely ordered file, you will probably get better read performance during cursor scans due to an optimization or cache at some other level.
If you’ve inserted all your key/data pairs in key-order then you’ve already getting these great benefits. But if you were unable to that, here’s the quick and easy way to get there. Remember, we’re talking about a readonly database, so the right time to do this is right after creating the db file and before your application opens it.
$ db_dump x.db | db_load new.x.db
$ mv new.x.db x.db
That’s it. I’ve written this in UNIX shell-ese, but it works similarly on other systems. One could easily write a C/C++/Java/C#/etc. function to do this for embedded installations.
And speaking of a function to do this, there is already the DB->compact function, which can be used on a running system to get many of the same effects we were touting for compacted data. DB->compact won’t try to optimize the page ordering. But that’s okay, it’s really solving a harder problem since it can be used on databases that are not readonly.
If your database is not strictly readonly, there’s a slight downside to a fully compacted database. Since all the pages are filled to the brim with key/data pairs, a new entry, any new entry, will split a page. You’re pretty much guaranteed that your first put and some subsequent ones will be more expensive. There’s a cost to splitting pages that needs to be weighed against the benefits you’ll get. Fortunately, DB->compact has an input fill factor; with an access pattern with higher proportion of scattered writes, you may want to lower the fill factor.
There you have it. Two lines of shell code, and gobs of verbiage to beat it to death. Good thing I don’t get paid by lines of code.