Schema evolution, or joke driven development?

In a previous post we talked about a fast and loose way to clean up your zoo, that is, how to evolve the structs you store with BDB. I stated that the fast and loose Perl script couldn’t be used transactionally. Baskhar’s comment ‘What if the zoo needs to keep operational even while eliminating peanuts?’ directs this post. And if you’re using the offline-upgrade route anyway, Alexandr Ciornii offered some improvements to the Perl script. See, I do read those comments!

Somehow this question reminds me of the old joke about three envelopes. A new manager is appointed to a position and, on the way out, the old manager hands her three envelopes. The outgoing manager says, “In times of crisis, open these one at a time.” Well, after a short time, the new manager finds herself in hot water and she opens the first envelope. It says, “Blame your predecessor.” She does that, and things cool off for a while. But when the going gets tough again, she opens the second envelope. “Reorganize.” She promptly shuffles the organization structure and somehow that makes things better. Some time later, she is faced with yet another crisis. She doesn’t have any choice but to open envelope #3. It says, “Prepare three envelopes.”

Now, suppose you were new to the zoo project, and you’ve been told that the zoo needs to stay up and running. If you’re lucky, you received those three envelopes from your predecessor.

Here’s our original struct:

    struct zoo_supplies {
       int n_bananas;
       int n_peanuts;
       int n_bamboo_stalks;
    };

and where we want to get to:

    struct downsized_zoo_supplies {
       int n_bananas;
       int n_bamboo_stalks;
    };

Envelope #1 tells us to blame our predecessor. Indeed, here’s what he could have done: put a version number in every struct to be stored. It should come first so that it never changes position:

    struct zoo_supplies_versioned {
       int version_num;
       int n_bananas;
       int n_peanuts;
       int n_bamboo_stalks;
    };

version_num would be zeroed initially. Then, to downsize our struct, we’d have this:

    struct downsized_zoo_supplies_versioned {
       int version_num;
       int n_bananas;
       int n_bamboo_stalks;
    };

with version_num always being 1. When you read the struct from the database, look at version_num first so you know which one to cast it to. If you’re using C++/Java, you might want to inherit from a common class containing version_num.
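Here is a minimal sketch of that read path, assuming the record arrives as a raw buffer plus a length (with BDB, the data and size fields of the DBT you got back). The helper name decode_supplies is hypothetical, and it copies with memcpy instead of casting, so it makes no assumption about how the buffer is aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct zoo_supplies_versioned {
    int version_num;              /* always 0 */
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

struct downsized_zoo_supplies_versioned {
    int version_num;              /* always 1 */
    int n_bananas;
    int n_bamboo_stalks;
};

/*
 * Hypothetical helper: decode a stored record into the downsized layout.
 * Returns 0 on success, -1 if the version is unknown or the size does
 * not match what that version should have.
 */
int decode_supplies(const void *data, size_t size,
                    struct downsized_zoo_supplies_versioned *out)
{
    int version;

    assert(data != NULL && out != NULL);
    if (size < sizeof version)
        return -1;

    /* version_num is the first field in every layout, so read it first. */
    memcpy(&version, data, sizeof version);

    if (version == 0 && size == sizeof(struct zoo_supplies_versioned)) {
        struct zoo_supplies_versioned old;
        memcpy(&old, data, sizeof old);
        out->version_num = 1;
        out->n_bananas = old.n_bananas;
        out->n_bamboo_stalks = old.n_bamboo_stalks;   /* peanuts dropped */
        return 0;
    }
    if (version == 1 && size == sizeof *out) {
        memcpy(out, data, sizeof *out);
        return 0;
    }
    return -1;
}
```

The size check alongside the version check is an extra guard I added; the scheme itself only needs version_num.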

This is all elementary stuff. Our predecessor really missed the boat!

While blaming your predecessor might feel good, it didn’t solve the problem. The heat’s still on. So let’s go to envelope #2, and reorganize. All your data starts out in this format, which you’ve renamed:

    struct zoo_supplies_version_0 {
       int n_bananas;
       int n_peanuts;
       int n_bamboo_stalks;
    };

Before we get to the final version, let’s introduce this one:

    struct zoo_supplies_version_1 {
       int version_num;
       int n_bananas;
       int n_peanuts;
       int n_bamboo_stalks;
    };

Every new insert or update in the database uses zoo_supplies_version_1 (with version_num set to 1). When you get a record from the database, you can’t rely on the version_num field yet, because the old records don’t have one. Instead, look at the size returned from DB: if it is sizeof(zoo_supplies_version_0), cast the data to that struct; otherwise cast it to zoo_supplies_version_1.
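The size test might look something like the sketch below. The function name load_as_version_1 is made up for illustration, and data and size stand in for whatever your get call handed back. It is a bit stricter than the rule above: instead of treating everything that isn’t version 0 as version 1, it rejects any size it doesn’t recognize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct zoo_supplies_version_0 {
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

struct zoo_supplies_version_1 {
    int version_num;
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

/*
 * Hypothetical helper: normalize a stored record to version 1, telling
 * the two layouts apart by size alone. Returns 0 on success, -1 if the
 * record is neither layout.
 */
int load_as_version_1(const void *data, size_t size,
                      struct zoo_supplies_version_1 *out)
{
    assert(data != NULL && out != NULL);

    if (size == sizeof(struct zoo_supplies_version_0)) {
        /* Old record: there is no version_num to look at. */
        struct zoo_supplies_version_0 old;
        memcpy(&old, data, sizeof old);
        out->version_num = 1;
        out->n_bananas = old.n_bananas;
        out->n_peanuts = old.n_peanuts;
        out->n_bamboo_stalks = old.n_bamboo_stalks;
        return 0;
    }
    if (size == sizeof(struct zoo_supplies_version_1)) {
        memcpy(out, data, sizeof *out);
        return out->version_num == 1 ? 0 : -1;
    }
    return -1;
}
```

This only works because the two layouts differ in size. It would not distinguish two versions that happen to be the same length.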

This approach alters the database a little at a time, but we really need a push to get it all done. How about a little background utility that marches through the database, converting one record at a time? We’ll want to turn on the DB_READ_COMMITTED flag for its cursor to make sure it’s not holding on to any locks it doesn’t need.
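To show the shape of that march without dragging in the BDB API, here is a sketch that walks a plain in-memory array instead of a cursor. The stored_record type and the upgrade_record and march functions are all hypothetical stand-ins; a real utility would fetch each record through its cursor and write the converted record back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct zoo_supplies_version_0 {
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

struct zoo_supplies_version_1 {
    int version_num;
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

/* Hypothetical in-memory stand-in for one stored record's data. */
struct stored_record {
    size_t size;
    unsigned char bytes[sizeof(struct zoo_supplies_version_1)];
};

/* Returns 1 if the record was rewritten, 0 if it was already
 * version 1, and -1 if its size matches neither layout. */
int upgrade_record(struct stored_record *rec)
{
    struct zoo_supplies_version_0 old;
    struct zoo_supplies_version_1 upgraded;

    assert(rec != NULL);
    if (rec->size == sizeof upgraded)
        return 0;
    if (rec->size != sizeof old)
        return -1;

    memcpy(&old, rec->bytes, sizeof old);
    upgraded.version_num = 1;
    upgraded.n_bananas = old.n_bananas;
    upgraded.n_peanuts = old.n_peanuts;
    upgraded.n_bamboo_stalks = old.n_bamboo_stalks;
    memcpy(rec->bytes, &upgraded, sizeof upgraded);
    rec->size = sizeof upgraded;
    return 1;
}

/* One pass over every record. Returns how many were converted,
 * or -1 on the first unrecognized record. */
int march(struct stored_record *records, size_t count)
{
    int converted = 0;

    for (size_t i = 0; i < count; i++) {
        int rc = upgrade_record(&records[i]);
        if (rc < 0)
            return -1;
        converted += rc;
    }
    return converted;
}
```

Because converting is idempotent, the utility can be stopped and restarted safely, and a pass that returns 0 is one way to confirm that nothing is left in the old format.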

Once we’ve confirmed that the background utility has done its march and every record is version 1, then we can finally make the real mod we’re seeking:

    struct zoo_supplies_version_2 {
       int version_num;
       int n_bananas;
       // sorry, no more peanuts
       int n_bamboo_stalks;
    };

At this point, we can reliably use the version_num field, and cast as appropriate.

But there’s an implicit problem here with adding a version_num field at all. Many BDB databases have a small record size; the 12 bytes in our toy example are often not far off the mark. Adding 4 bytes to a small record can result in a proportionally large increase in the overall database size. If you’re memory-tight and cache-bound, your runtime performance may suffer in even greater proportion.
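To put a number on it for the toy example, here is a small sketch. The function size_increase_percent is my own, and it only looks at the record data; keys and per-page overhead are ignored, so the real growth of a database will differ.

```c
#include <assert.h>
#include <stddef.h>

struct zoo_supplies_version_0 {
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

struct zoo_supplies_version_1 {
    int version_num;
    int n_bananas;
    int n_peanuts;
    int n_bamboo_stalks;
};

/* Growth of the record data, as a whole percentage (rounded down). */
unsigned size_increase_percent(size_t old_size, size_t new_size)
{
    assert(old_size > 0 && new_size >= old_size);
    return (unsigned)(100 * (new_size - old_size) / old_size);
}
```

With 4-byte ints, size_increase_percent(sizeof(struct zoo_supplies_version_0), sizeof(struct zoo_supplies_version_1)) compares 12 bytes against 16 and comes out at 33, so a third more data just to carry the version.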

Time to open envelope #3? Not just yet. There are a couple of other solutions that just might have a better punchline. See you next post.

About ddanderson

Berkeley DB, Java, C, C++, C# consultant and jazz trumpeter