In a past column, I mentioned memp_trickle as a way to get beyond the double I/O problem. It often works well for this, but there are times when it doesn't.
Trickle is a sort of optimization that I would call speculative. These sorts of optimizations attempt to predict the future. We do work now, in a separate thread, because in the future the fruits of our work will be useful. In trickle’s case, we do writes from the cache now, in a separate thread, because in the future clean cache pages will eliminate one of our I/Os in the main thread, decreasing latency.
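To make the contract concrete, here's a toy model in Python of what a trickle pass promises: make sure at least a given percentage of cache pages are clean, writing dirty pages until that target is met. The page representation is invented for illustration; the real BDB cache is far more involved.

```python
import math

def trickle(pages, percent):
    """Flush dirty pages until at least `percent` of `pages` are clean.
    Returns the number of pages written (cf. memp_trickle's nwrotep)."""
    target_clean = math.ceil(len(pages) * percent / 100)
    clean = sum(1 for p in pages if not p["dirty"])
    nwrote = 0
    for page in pages:
        if clean >= target_clean:
            break
        if page["dirty"]:
            page["dirty"] = False   # stands in for a disk write
            clean += 1
            nwrote += 1
    return nwrote
```

Run against a fully dirty ten-page cache with a 20% target, this writes two pages and then stops; run again, it writes nothing, because the target is already met. That "do nothing when already clean enough" behavior is why a periodic trickle call is cheap when it isn't needed.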
But gazing into the crystal ball of the future can give a hazy picture. One obvious case is that we might not benefit from the clean cache page, ever. Our program may simply stop, or have no more database requests. Generally we’re not particularly worried about that — BDB systems typically run forever, we’ll eventually get more traffic, updates, orders, etc.
Our second hazy case is that we may not need more clean cache pages. If our entire working set of accessed pages fits into the BDB cache, then we'll be accessing the same pages over and over. No new pages needed. Trickle on this sort of system just creates extra I/O traffic. Consider a single leaf page in this scenario. It's updated, perhaps once a second, but never written to disk, at least not until a checkpoint. As far as I/O goes, every update is free. Another way to look at these free updates is that the update-per-write ratio is way up. Add in a trickle thread, and the page may be written much more often (update-per-write goes down). That's unneeded I/O.
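The arithmetic on that hot leaf page is worth spelling out. With invented but plausible numbers (one update per second, a checkpoint every five minutes, and a hypothetical trickle pass that catches the page dirty every thirty seconds):

```python
# Toy arithmetic for the update-per-write ratio on one hot page.
# All the numbers here are invented for illustration.

updates_per_second = 1
checkpoint_interval = 300   # seconds between checkpoints
trickle_interval = 30       # hypothetical trickle period that catches
                            # this page dirty every time

updates = updates_per_second * checkpoint_interval

# Without trickle: the page is written once, at the checkpoint.
writes_without_trickle = 1
ratio_without = updates / writes_without_trickle

# With trickle flushing the page every 30 seconds:
writes_with_trickle = checkpoint_interval // trickle_interval
ratio_with = updates / writes_with_trickle

extra_writes = writes_with_trickle - writes_without_trickle
```

The ratio drops from 300 updates per write to 30, and the nine extra writes per checkpoint interval are pure overhead on a system that was doing no read I/O to begin with.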
Unneeded I/O, yes, but this may not be a big problem. Remember, in this scenario our entire working set fits into the BDB cache, so our main thread is not doing any I/O anyway. Trickle adds more I/O, but nobody is waiting on those spinning disks. If we were paying attention to our BDB stats, we'd see that we didn't have a double I/O problem to begin with.
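The stat to watch is the cache hit rate. A sketch of the check, using the hit and miss counters that DB_ENV->memp_stat reports in DB_MPOOL_STAT (the sample numbers are invented):

```python
# Deciding from mpool stats whether double I/O is even a worry.
# st_cache_hit and st_cache_miss are the counter names from BDB's
# DB_MPOOL_STAT; the values below are made up for illustration.

def cache_hit_rate(st_cache_hit, st_cache_miss):
    total = st_cache_hit + st_cache_miss
    return st_cache_hit / total if total else 1.0

# A working set fully in cache shows a hit rate near 1.0 --
# little for trickle to buy you.
rate = cache_hit_rate(st_cache_hit=990_000, st_cache_miss=10_000)
```

A hit rate near 1.0 means almost no page reads are going to disk, so there are no double I/Os for trickle to save you from.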
There’s another hazy case that’s a little more subtle. Even when our data accesses are not entirely in cache, and we do see double I/Os, trickle can be counterproductive. This can happen if we’ve totally saturated our I/O. The extra burden of trickle adds to the I/O queue, and every I/O request takes longer. Trickle may still be helpful if our cache hit rate is low enough that we don’t get many free updates and we really do need a high proportion of the clean pages that trickle creates.
Trickle’s bread-and-butter scenario is when there is a mix of get and put traffic (gets benefit the most from trickle’s effects; puts are needed to create the dirty pages that give trickle something to do), when I/O is not overwhelmed, and when the working set is not entirely in cache. On the butter-side-down days, trickle underperforms because some of those conditions aren’t satisfied. There’s a lot of in-between where it’s not so clear; you just have to try it, fiddle with the frequency and percentage, and see.
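Those two knobs are the percentage you ask trickle to keep clean and how often you run it. A minimal driver loop, sketched in Python (the `flush` callback, parameter names, and `rounds` bound are my invention for illustration; a real application would call DB_ENV->memp_trickle(env, percent, &nwrote) from a dedicated thread that loops forever):

```python
import time

def run_trickle_loop(flush, percent=20, frequency=1.0, rounds=None):
    """Call flush(percent) every `frequency` seconds.

    `flush` stands in for a DB_ENV->memp_trickle call and should
    return the number of pages it wrote.  `percent` and `frequency`
    are the two knobs worth fiddling with; `rounds` bounds the loop
    so the sketch can be exercised in a test.
    """
    total_written = 0
    n = 0
    while rounds is None or n < rounds:
        total_written += flush(percent)
        time.sleep(frequency)
        n += 1
    return total_written
```

Raising `percent` or shortening `frequency` makes trickle more aggressive; whether that helps or hurts is exactly the judgment call this column is about.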
I’ll have more to say about other sorts of speculative optimizations in later posts. For now, let’s just say that like other forms of speculation, this one has no guarantees. There is never any substitute for testing on your own system.