Conversations about life & privacy in the digital age

Speeding up and running legacy test suites, part one

This is part one in a two part series on Test Driven Development at SpiderOak.
In part one, I discuss ways to decrease the time it takes to run a test suite.
In part two, I’ll discuss two ways to run a test suite that are painful if the
tests are slow, but greatly beneficial if performed often with fast tests.

As any experienced developer will likely say, the longer a test suite takes to
run, the less often it will be run. A test suite that is seldom run can be
worse than no test suite at all, as production code behavior diverges from that
of the tests, possibly leading to a test suite that lies to you about the
correctness of your code. A top priority, therefore, for any software
development team that believes testing is beneficial, should be to maintain
fast tests.

Over the years, SpiderOak has struggled with this. The reason, and I suspect
many test suites run slowly for similar reasons, is tests which claim to be
testing a “unit”, but actually end up running code from many parts of the
system. In the early days of SpiderOak we worked around some of the problem by
caching, saving/restoring state using test fixtures, etc. But a much better
approach, which we’re in the process of implementing, is to make unit tests
actually test small units rather than entire systems. During the
transition, we still have the existing heavy tests to fall back on, but for
day-to-day development, small unit tests profoundly increase productivity.

There are many techniques for keeping tests small and fast, and even more for
transitioning a legacy test suite. Each code base will ultimately require its
own tricks, but I will outline a few here that we’ve adopted at SpiderOak.

Mocks

Mock objects are “stand-in” objects that replace parts of your code that are
expensive to set up or perform, such as encryption, network or disk access,
etc. Using mocks can greatly improve the running time of your tests. At
SpiderOak, we use Michael Foord’s excellent
Mock library.

One area where mocking has been particularly helpful in speeding up the legacy
tests in SpiderOak is by reducing startup time. In some cases, even if
individual tests run quickly, running the test suite can still take a long time
due to unnecessary startup costs, such as importing modules unrelated to the
code under test. To work around this, I often inject a fake module into
Python’s import system to avoid loading huge amounts of code orthogonal to what
I’m trying to test. As an example, at the top of a test module, you might see
the following:

import sys
from test.util import Bucket

# don't waste time importing the real things, since we're isolating anyway
sys.modules['foo'] = Bucket()
sys.modules['foo.bar'] = sys.modules['foo'].bar

import baz

How it works

When you import a module in Python, the interpreter first looks for it in
sys.modules. This speeds up subsequent imports of a module that has already
been imported. We can also take advantage of this fact to prevent importing of
bloated modules altogether, by sticking a lightweight fake object in there,
which will get imported instead of the real code.

In the example above, foo is a bloated module that takes a long time to load,
and baz is the module under test. baz imports foo, so without this
workaround, the test would take a long time to load as it imports foo. Since
we’re writing isolated unit tests, using Mocks to replace things in foo, we
can skip importing foo for the tests altogether, saving time.

Bucket is a simple class that I use whenever I need an object on which I can
access an arbitrary path of attributes. This is perfect for fake package/module
structures, so I often use it for this purpose.

from collections import defaultdict

class Bucket(defaultdict):
    def __init__(self, *args, **kw):
        super(Bucket, self).__init__(Bucket, *args, **kw)
        self.__dict__ = self

This class allows you to access arbitrary attributes and get another Bucket
back. For example:

bucket = Bucket()
some_object = bucket.some.path.to.some_object
assert type(some_object) == Bucket

A caveat: since Python imports packages and modules recursively, you need to insert each
part of the dotted path into sys.modules for this to work. As you can see, I
have done this for foo.bar in the example from above.

sys.modules['foo'] = Bucket()
sys.modules['foo.bar'] = sys.modules['foo'].bar

Ideally, using an isolated approach to TDD with Mock objects, your project
would never evolve into a state where importing modules takes a long time, but
when working with a legacy codebase, the above approach can sometimes help your
tests run faster, which means they’ll be run more often, during the transition.

Next, part two will outline two ways to run your tests regularly. After all, a
test suite is only useful when it is actually used.