Keys to a performance win

You might think that you don’t have much of a choice in selecting keys for a database. Sometimes you have more choices than you think. And those choices can make a whale of a difference when it comes to performance.

Let’s start with a simple example using an online shop. Sure, you say, another online store example. Yeah, but this store deals exclusively with ultimate frisbee supplies! Or dog booties. Maybe lepidopterist hats? Now that I’ve roped in a few random google surfers, let’s get started :-).

Our order numbers are plain old integers, and we want to store the orders in a BDB database. Assuming we’re coding this in C/C++ we might start with a simple model like this:

 

struct OrderKey {
    int orderNumber;
};
struct Order {
    int orderItemList;    // a key into another data store
    int accountNumber;
    char shippingAddress[100];
    ......
};

 

The data for an order’s going to be a lot more complex than this, but we’re focusing on the key here. What could be simpler, just an integer, right?

If you use this key/value arrangement, things will function just fine. But when you start looking at what’s happening performance-wise, you might be surprised. Since we generate our order keys sequentially, we want our key/value pairs to appear sequentially in the Btree. But that won’t necessarily happen. If your processor is a ‘LittleEndian’ one, like that x86 that dominates commodity hardware, the order number appears with the least significant byte first. BDB processes keys as sequences of bytes — it has no clue that those bytes made up a meaningful integer. So order number 256 (0×00000100) is stored as these bytes: 00 01 00 00.  With its least significant ’00′ byte, it appears before order number 1 (and 2 and 3…) in the database. 257 appears between 1 and 2. To drive home the point, here’s the first chunk of keys you’d see in your database after storing order numbers 1 through 1000 as integer keys.

 

db bytes           interpreted as 'int' order number
00 01 00 00        256
00 02 00 00        512
00 03 00 00        768
01 00 00 00          1
01 01 00 00        257
01 02 00 00        513
01 03 00 00        769
02 00 00 00          2
02 01 00 00        258
02 02 00 00        514
02 03 00 00        770
03 00 00 00          3
03 01 00 00        259
03 02 00 00        515
03 03 00 00        771
04 00 00 00          4

The gap between order numbers 1 and 2, and between 2 and 3, etc., gets wider and wider as we add more and more orders.

Why should we care?  We care because if our access pattern is that the most recently allocated order numbers are accessed most recently, and those orders are scattered all over the btree, well, we just might be stressing the BDB cache. That is, we’d need more, more, more cache as the database gets bigger, bigger, bigger. Even if my application’s data set is entirely in memory, I still may be missing out on other cache benefits at the hardware level. We’re pretty much losing all the Locality of Reference inherent in our access pattern.

Okay, if scattered data is the disease, let’s look at the cures. DB->set_bt_compare() does the job, but with caveats. The bt_compare function is a custom function that allows you to do the comparison any way you want. Here’s the sort of compare function we’d want to have with this sort of integer key:

int my_compare(DB *db, const DBT *dbt1, const DBT *dbt2) {
    // I know what my keys look like!
    int key1 = *(int *)dbt1->data;
    int key2 = *(int *)dbt2->data;
    if (key1 < key2)
        return -1;
    else if (key1 > key2)
        return 1;
    else
        return 0;
}

There are two downsides to using this approach. First, the btree compare function is called a lot. It’s got to be really fast or else your performance will suffer. Even though the above code is pretty tight, it’s hard to get faster than BDB’s default memcmp. The more important issue is that introducing a btree compare function creates a maintenance headache for you. If you ever want to use the db_load utility, you’ll need to modify the sources to know about your keys. db_verify will no longer be able to check key ordering without source modification. Who knows what other future tools won’t be available to you. Since db_dump and db_load are useful for various reasons (data transfer, upgrades, ultimate db salvage) that should be enough for you.

So the better approach is to fix the key. Here’s one way:

 

struct OrderKey {
    char bytesForOrderNumber[4];  //   Store MSB of int first.
};

You can marshall the bytes yourself or call intrinsic byte swap routines (try __builtin_bswap32 for GCC, and _byteswap_ulong for VC++). If you’re using C++, you could make a nifty class (let’s call it BdbOrderedInt) that hides the marshalling and enforces the ordering. Our example suddenly becomes much more readable:

 

struct OrderKey {
    BdbOrderedInt orderNumber;
};

 

Oh what the heck, here’s an implementation of such a class – partially tested, which should be pretty close for demonstration purposes.

Another way that makes the actual data readable, but is less space efficient, would be to store the bytes as a string: “0000000123″.  For good or bad, this goes a little down the path of string typing your data as opposed to strong typing.

Both of these approaches will get us to the locality we are looking for.
Those of you that use the Java API are probably yawning. That API has a little bit different model of how you present keys to BDB that does the marshaling for you (for example, using IntegerBinding). They even go a little bit further, as negative values for signed quantities are ordered before zero and positive values. As for other BDB languages: C# – I don’t see any marshaling in the API; PHP, Perl, Python, I frankly don’t know if this is an issue.

There’s more to say about key selection, that will wait for another post.

 

About these ads

About ddanderson

Berkeley DB, Java, C, C , C# consultant and jazz trumpeter
This entry was posted in Uncategorized. Bookmark the permalink.

One Response to Keys to a performance win

  1. Jesús Cea says:

    In python, using pybsddb (the Berkeley DB bindings I maintain – http://www.jcea.es/programacion/pybsddb.htm ), you have the same issue. You basically store key -> value where both key and values are strings. So you must decide how to “serialize” whatever data/key you have to strings.

    The trivial way would be to serialize the number to a string (with “str()” or “repr()”), but it waste some space (in disk and in BDB cache). Other option would be to use “struct.pack(“>L”, key)” if your key fit in an unsigned long.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s