Cleaning up your zoo

Let’s look at something that’s über-practical this week. Some of you will define this version of practical as hackery, so you can hold your nose and slyly slip this into your bag of tricks. You’ll know when you need it.

Suppose your program stores values with the following structure in a database:

struct {
int n_bananas;
int n_peanuts;
int n_bamboo_stalks;
};

and due to downsizing at the zoo, you’d like to change your values to this:

struct {
int n_bananas;
int n_bamboo_stalks;
};

Did I say this was a contrived example? Regardless, if you’ve ever reorged your data structure, you’ll need to think about any data that you already have stored. How would one change millions of records like the above (or even just one)?

One option would be to change ‘int n_peanuts’ to ‘int reserved’, and forget about it. But that’s not so good – the old data will still be there for existing records, so you really can’t easily reuse that reserved field for anything.

You could write a program that reads from one format and produces another. And that’s the solution most of us would do (and no reason to hold your nose).

Or you could leverage the fact that db_dump and db_load use a scrutable text format. Perl has some modules that know about Berkeley DB, but here’s a Perl solution that uses the db_dump and db_load in your path to do the database part, and leaving really just the transformation part. Note that this is much the same as the option just described (a program that reads from one format and produces another), except that the program is here, written for you, and is trivial to modify. And it has the virtue of being in a neutral language – the Java folks won’t complain that it’s C, and the C/C++ folks won’t complain that it’s Java. I guess anyone could complain that it’s Perl…

 

#!/usr/bin/perl
#
# Copyright (c) 2011
#	Donald D. Anderson.  All rights reserved.
#
# Redistribution and use in source and binary forms are permitted.
# This software is provided 'as is' and any express or
# implied warranties, including, but not limited to, the implied
# warranties of merchantability, fitness for a particular purpose, or
# non-infringement, are disclaimed.
#
################################################################
#
# Usage:  bdb_convert_data fromfile tofile
#
# Converts data in a Berkeley DB database.
# Can be customized to remove or add bytes to each record,
# and even add new records.
#
# Warning: this should normally be used on a quiet system (or best, a
# system that has no live BDB processes.  There is nothing
# transactional about this script!
#
# Warning: should not be used if you've changed your btree compare function
# (since db_load will not do the right thing).
#
# See discussion on https://libdb.wordpress.com

die "Usage: $0 fromfile tofile" if ($#ARGV != 1);
$from = $ARGV[0];
$to = $ARGV[1];
die "$0: $from: does not exist" if (! -f $from);
die "$0: $to: exists, will not overwrite" if (-f $to);
open IN, "db_dump $from|" || die "$0: cannot run db_dump";
unlink "$to.TMP";
open OUT, "|db_load $to.TMP" || die "$0: cannot run db_load";

# convert_key and convert_value are called with $_ set to
# a data line from the db_dump output.  Each data line starts
# with a single space, then there are hex digits, a pair of hex
# digits for each 8 bit char.  E.g. ' 65696e7320d0bfd1' is 8 bytes.

# This convert_key passes through the key without modification
sub convert_key() {
    $line=$_;
    #print "key: $line";
    print OUT "$line";
}

# This convert_value 'removes' the second 4 bytes, for demonstration
# !! **** modify this as necessary **** !!
#
sub convert_value() {
    $line=$_;

    # !! **** here's the custom part **** !!
    if (length($_) > 17) {
      $line = substr($_,0,9) . substr($_,17);
    }

    #print "dat: $line";
    print OUT "$line";
}

$iskey = 1;

# The dbdump format contains some header info, that starts
# in the first column.  Those lines are copied directly.
# The data appears with a single space in the first column,
# followed by a bunch of hex numbers.  Lines of data alternate
# between keys and values.
while () {
    if (/^ /) {
        if ($iskey) {
            &convert_key;
        } else {
            &convert_value;
        }
        # alternate lines
        $iskey = ! $iskey;
    }
    else {
        print OUT $_;
    }
}
close (IN);
close (OUT);
rename "$to.TMP", "$to";

 

The same code is nicely formatted and colorized here.

Easy enough to customize.  If you needed to delete records, you could do it. If you needed to add records, you could do it.

Heed the warnings in the script.  First, it should be used on a quiet system. A minimum requirement is that the file being converted is closed by all. There is nothing transactional about this script!

Secondly, if you’ve changed your btree compare function, or duplicate sorting function, this script is not for you. Back to making a custom program.

Finally, if your data structure is not dead simple – if you can’t easily discern byte positions, etc. then by all means, write a proper conversion program in your choice of language.

And it might be nice to test it before trying it out on your production system. We all want to keep our zoo clean, but even hackers have standards.

Advertisements

About ddanderson

Berkeley DB, Java, C, C , C# consultant and jazz trumpeter
This entry was posted in Uncategorized. Bookmark the permalink.

5 Responses to Cleaning up your zoo

  1. K.S. Bhaskar says:

    A more interesting problem is how to effect the change on a live system. What if the zoo needs to keep operational even while eliminating peanuts? The script doesn’t have to be transactional because presumably the zoo has eliminated peanuts already, and now wants the ERP system to catch up to that fact. But it does need to operate on a live system where a record may be accessed and modified before it is changed as well as after it is changed.

    • ddanderson says:

      Thanks, Bhaskar. For non-critical systems, one may consider shutting down operations. New software with the new struct defines may need to be installed, which may give a non-transactional opportunity for such upgrades.

      For a continually operating zoos, there are a number of choices. Easiest might be to create an interim version of software that knows about both structures, and selects based on the record size returned by DB. Install that side by side with the current version, shut down the current version. Then have a transactional program that record by record, nibbles away those peanuts.
      I fully meant to blog a fuller response but ran out of time.

  2. Pingback: Schema evolution, or joke driven development? | libdb

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s