Conversations about life & privacy in the digital age

SpiderOak’s Analysis and Recommendations for the Crypto in Kim Dotcom’s Mega, Part One

Here at SpiderOak, we’re excited to see new peers in the privacy industry. Kim Dotcom’s Mega has brought attention to privacy in a way that we’ve sought for years. Mega has created a surge of awareness in the importance of online privacy, and it will undoubtedly result in a boom of privacy-aware cloud applications. This is a good day for the Internet.

Crypto is very hard to get right even under the best of circumstances. Producing a usable browser-side cryptosystem on a tight schedule is a minefield, and getting a product out the door is an accomplishment. And while we applaud them for their bravado, we must be honest to their users. Rushing to meet a symbolic launch date was not the best choice for protecting the privacy of end users. A private launch first inviting cryptographers to use and abuse the system and make recommendations would have been worth the wait.

Mega’s crypto has similar design goals to SpiderOak, so we have a long familiarity with these topics. Bear with us as we dive into some of the important parts of their code, after which we’ll make some recommendations for Mega and their users.

In our analysis of Mega’s crypto, we are so far just looking at design-level problems and recommending solutions. We have not done a full implementation-level security code audit, where we painstakingly inspect, test, verify, and occasionally shake our heads at every line of code to be sure that it’s doing exactly what it should be. Those are expensive and take time.

We’re also ignoring entire categories of things that make this arrangement fragile, such as the untrustworthiness of the Javascript runtime as a secure environment for handling secrets, the lack of true random number source in many browsers, and the way the code is packaged and verified. Those have been discussed and debated in depth for a long time. Clipperz has some recommendations for code delivery which would at least make things easier for code auditors.

As of yesterday, the method used for verifying Javascript from untrusted mirrors was vulnerable although it’s likely fixed or will be fixed soon since that’s an easy change.

The reason we’re entertaining the idea of Javascript runtime cryptography at all is specifically because of the particular threat model involved. This approach may be effective at protecting the service provider from the perils of plaintext access to user data. We have the same goals here at SpiderOak and are sympathetic to them. Browser based crypto where the service provider is motivated to stay ignorant of your data is at least preferable to the Dropbox approach where no attempt at meaningful cryptography is made and everything is stored in plain text equivalent.

How Mega’s Key Generation and User Authentication Code Works

With Mega, as with SpiderOak, your passphrase has the double burden of supporting account authentication (without disclosing that password to the server) and outer level data encryption. An outer level key is derived from the text of your password using a key derivation function. With that derived outer level key, a “master key” for symmetric AES is encrypted. Everything else on down the hierarchy is encrypted to that master key, or another key that is itself directly or indirectly encrypted to that master key.

Here’s the master key creation function and the key derivation function:


function api_create_u_k()
{
    u_k = Array(4); // static master key, will be stored at
                    // the server side encrypted with the master pw
    for (var i = 4; i--; ) u_k[i] = rand(0x100000000);
}

// convert user-supplied password array
function prepare_key(a)
{
    var i, j, r;
    var pkey = [0x93C467E3,0x7DB0C7A4,0xD1BE3F81,0x0152CB56];

    for (r = 65536; r--; )
    {
        for (j = 0; j < a.length; j += 4)
        {
            key = [0,0,0,0];

            for (i = 0; i < 4; i++) if (i+j < a.length) key[i] = a[i+j];

            aes = new sjcl.cipher.aes(key);
            pkey = aes.encrypt(pkey);
        }
    }

    return pkey;
}

User Authentication starts in user0001.js with the u_login() function. It works by sending the server a calculated authentication value value called "userhash."

Here's the code for calculating userhash.


function stringhash(s,aes)
{
    var s32 = str_to_a32(s);
    var h32 = [0,0,0,0];

    for (i = 0; i < s32.length; i++) h32[i&3] ^= s32[i];

    for (i = 16384; i--; ) h32 = aes.encrypt(h32);

    return a32_to_base64([h32[0],h32[2]]);
}

It's a cute little hash with similarities to CBC-MAC. A 128-bit "message" is created by splitting up the string into 128-bit sections and xor-ing all of those together with 128 bits of zeroes. This message is then run through 16384 rounds of AES using a supplied key. In the case of the userhash, the key is the output of your password run through the Key Derivation Function.

Your email address and userhash are sent to Mega's servers, which presumably authenticate using the userhash, and your encrypted master key is returned along with a session identifier (sid). Getting the master key is therefore based on a simple token that can be replayed instead of a more secure challenge/response mechanism.

On the server side, auditors should look for a timing vulnerability that can be exploited when the server compares your userhash, which would allow an attacker to gradually learn the right hash over many attempts. If the server side implementation is competent, they're treating this stringhash output as a password (it effectively is one for auth purposes) and storing it server side using bcrypt or scrypt.

Now that the client has your master key, it is decrypted using the password-derived key. It gets a little confusing with their unfortunate reuse of variable names, but aes winds up being your master key. It's also interesting to note that if you select "remember me" on login, your master key gets stored unencrypted in browser local storage, where it undoubtedly gets written to disk.


    if (typeof res == 'object')
    {
        var aes = new sjcl.cipher.aes(ctx.passwordkey);

        // decrypt master key
        if (typeof res[0].k == 'string')
        {
            k = base64_to_a32(res[0].k);

            if (k.length == 4)
            {
                k = decrypt_key(aes,k);

                aes = new sjcl.cipher.aes(k);

The sid is derived in one of two ways. Either you get a tsid with the encrypted master key, which seems to be for ephemeral and unconfirmed accounts, or you get a csid, which seems to be for confirmed accounts. If you get a csid, your RSA private key is unpacked from the response and decrypted with your master key. t will be used in a bit.


    var t = mpi2b(base64urldecode(res[0].csid));

    var privk = a32_to_str(decrypt_key(aes,base64_to_a32(res[0].privk)));

The RSA private key is then turned into more usable integer arrays and used to decrypt csid, which becomes your sid. You will have to squint to see what's going on here, it's the second element of r.


    r = [k,
         base64urlencode(
            b2s(
                RSAdecrypt(t, rsa_privk[2], rsa_privk[0],
                           rsa_privk[1], rsa_privk[3])
               ).substr(0,43)),
         rsa_privk];

The far more interesting path is when you get a tsid. The client checks it by encrypting the first 16 bytes with your master key and verifying it against the last 16 bytes.


    t = base64urldecode(res[0].tsid);
    if (a32_to_str(
            encrypt_key(aes,str_to_a32(t.substr(0,16)))
        ) == t.substr(-16)
       )
        r = [k,res[0].tsid];

It seems that in the case of a tsid, the server does have the unprotected master key. It's the only way the server could have calculated the last sixteen bytes that are verified here. Since this is probably only for ephemeral accounts, the master key won't be used again and this doesn't really matter. It does open the interesting possibility that data uploaded with ephemeral accounts could be less private than confirmed accounts.

Analysis and First Recommendations

Beyond our caveats above, Mega's first "hair on fire" problem is the implementation of their Key Derivation Function. This function's job is to take a (probably weak) passphrase and 1) turn it into a usable symmetric key that can encrypt secrets and 2) do so in a way that is not easily brute forced. The first part it does well; the second part it hardly does at all.

Mega's KDF is not one of the best practice recommendations such as PBKDF2 or Bcrypt, but rather a simple home grown implementation which runs the user's passphrase through 216 rounds of an AES cipher with a fixed key. That key is the same across all users — there is no added "salt".

A "salt" is a small random value used to vary KDF output with the same password input. No salt means that any two users with the same passphrase will have the same KDF output. Various research suggests that a large portion of end users will have passwords found on a list of the top 10,000 most common passwords. Calculating Mega keys from such a list would be simple. Having to account for a random salt in the KDF exponentially complicates this effort, offering some protection to a database of users who often make poor password choices.

The other problem with Mega's KDF is just that AES is too fast. It's implemented in hardware on commodity systems — you're probably reading this on a system with hardware AES. Fast AES makes each password attempt faster, making it easier to brute force the password. In contrast, Bcrypt intentionally uses Blowfish because it is expensive to compute.

With this easily computed KDF, huge databases of password keys can be built using the fixed key and every password combination out to some reasonable length. These tables will not only allow the recovery of data encrypted inside every account, but they will also recover all the passwords themselves. (A database with combinations of usernames and passwords is often considered more valuable than the accounts they protect, because they can be re-used at PayPal.com, etc. and many will succeed.) It's been revealed that href="http://arstechnica.com/security/2013/01/cracking-tool-milks-weakness-to-reveal-some-mega-passwords/">Mega confirmation emails contain the encrypted master key, making it that much easier to brute force the password offline. Oh, and someone already created a crack tool to help you.

I suspect that the implementers of the system were reasoning to the effect of: "So what? This derived key is only held in memory and is only used for encrypting another key, and only that encrypted output is saved to the server. That next key is acting like a salt." To be effective, the salt must go at the beginning of this process rather than the end, so that every attack attempt has to restart calculations from the beginning. Whatever those next level of keys decrypt will be plaintext and look like plaintext so checking for success only adds a couple of cheap steps to an attack, if they're necessary at all.

Recommendation: Use PBKDF2, Scrypt, or Bcrypt instead of AES. There are JavaScript implementations available (we use jsBCrypt for web signups). If you must stick with AES for some reason, make the input to the KDF not be the raw password, but a SHA256 HMAC of the password with the salt as the HMAC key. Save the salt (plaintext) along with the output. Use at least 32 bytes of true random data for the salt. Turn up the rounds as high as you can stand (and use web workers to avoid stalling the browser).

While you're working on that section of the code, you can also improve the authentication process. When creating an account, run the KDF twice with the same password, but with two different salts. One salt is for deriving a outer layer key like usual, and the other is for creating and storing a challenge key. Then you can use a real challenge/response approach to auth instead of a static token, challenging a user to decrypt something using their challenge key. This makes it cheap for the server to do 'Zero-Knowledge' password proof auth (no CPU intensive bcrypt comparisons), avoids an attacker authenticating through just replaying a previous login session, and keeps you out of the business of storing a database of password equivalents.

In the short term, those of you using Mega should set a very long random password that you never use elsewhere and avoid storing sensitive data. Then as soon as these flaws are resolved, close your old Mega account and create a new one with a different password (and thus a whole new set of keys). Expect everything from the old account to be compromised.

We will continue in additional parts to discuss metadata encryption, data encryption, MACs, convergent encryption / encrypted data de-duplication, and exchanging messages with others.