Conversations about life & privacy in the digital age

New GPG key for Ubuntu/Debian repositories

Hello to our Debian and Ubuntu users. Our repository signing key has expired, and new packages for these platforms are in place to rectify this. Users on these platforms will need to use one of the following two procedures for automatic package updates to continue.

The info you should see for our new signing key is the following:

pub 1024D/F1A41D5E 2010-09-21 [expires: 2013-09-20]
Key fingerprint = 1AE8 1DE0 67D3 968A 5494 B175 5D65 4504 F1A4 1D5E
uid SpiderOak Apt Repository

To import this new key into your package manager you can upgrade to our newest beta, or install our new key:
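For apt-based systems, the key import typically looks like the following (the key URL below is illustrative, not an official address; substitute the location given in the release announcement):

```shell
# Download and import the new repository signing key.
# NOTE: the URL is a placeholder -- use the one from the official announcement.
wget -qO - https://spideroak.com/dist/release.gpg | sudo apt-key add -

# Confirm the fingerprint matches the one published above:
# 1AE8 1DE0 67D3 968A 5494  B175 5D65 4504 F1A4 1D5E
apt-key finger

# Refresh the package index so future updates verify against the new key.
sudo apt-get update
```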

The SpiderOak Crypto Monkeys

Who would win in a struggle between all the Mac OS X cats and all the Linux animals?

I just hope they’ve backed up their data; this is about to get messy.

Here’s the layered tiff if you want to play around with it, and the 1920×699 jpeg.

EDIT: here is a 2.2MB .rar package of high-res desktop backgrounds featuring the image, for single- and dual-screen setups as requested – Download Link!

50 bonus gigs to the first TEN commenters to accurately identify each creature from left to right!

Update Here

So go ahead and download the SpiderOak client and join for free while you think. Drop your answer in the comment section, and if you are correct we will soon post a follow-up with the winning comments and instructions on how to get your bonus GBs!

Does privacy really matter? To you? To Google?

I feel as though every couple of months a friend forwards me a story about the importance of privacy in this digital age in which we live. And like clockwork, I received the following post the other day. If you have a brief moment, please give it a quick read:

GCreep: Google Engineer Stalked Teens, Spied on Chats (Updated)

In quick summary, a systems administrator at Google accessed several Google accounts to view Google Voice and Google Chat logs. Apparently he knew the people whose accounts he entered and was literally ‘spying’ on them.

Of course this one breach raises a whole host of issues and touches on the much larger problem of what else could potentially be going on behind the Google firewall that isn’t being caught or reported. In this case it was Mr. Barksdale’s arrogance and aggressiveness that led to his downfall, but one has to figure others could/would be much smarter in their approach. At the very least – it surfaces the question.

So why did we create our ‘zero-knowledge’ privacy environment? I suppose the above case proves the point so well that no explanation is really necessary. And does this privacy come at a price? Yes – it does indeed. It means that SpiderOak cannot provide services with the same speed or as ‘openly’ as some of our competitors (feel free to read this post for further explanation: Why and How SpiderOak architecture is different than other online storage services: The surprising consequences on database design from our Zero-Knowledge Approach to privacy). However, to create a world where neither our system administrators nor potential thieves nor any government agency across the globe could access plaintext data on our servers was far more important and necessary.

After all, the world we live in now is as much about having options as anything else and we present our ‘zero-knowledge’ privacy environment as one for the security conscious. Oh – and don’t worry – if you miss this post then there will surely be another opportunity.

Two-Way Mobile Clients & You

Hello everyone,

I’ve been taking a bunch of requests for two-way functionality from our mobile clients, and I would like to address that. Let me start by saying that I *fully* understand how much of a killer feature that would be for all of us. I use SpiderOak for my own personal file storage, sync, and share, as well as using it for work (Android beta testers know I use a ShareRoom to distribute the beta builds, for example). I use my SpiderOak iPhone app several times a day. For mobile users, your pain is my pain as well.

That said, traditional two-way interaction with SpiderOak would be nontrivial for a mobile phone. Our zero-knowledge system takes full advantage of the power available in modern computer systems to run the encryption/decryption and file de-dupe locally before uploading to our servers; see our engineering description page for more details on that. I think most of the latest generation of smartphones (1 GHz Android, A4-powered iOS, generally) are finally powerful enough to even think about doing this. That said, there are still a few obstacles to overcome:

  • Most of our application code is written in Python. Either this can be ported to both ObjC and Java, or I can figure out how to tie in some sort of framework to connect it across from Python. Either way, it won’t be pretty or quick.
  • Battery life is generally the biggest concern I have, once the above could be done. SpiderOak would be sucking down the device’s battery, as well as its data allowance (important for those of us on metered data plans). Running the CPU as hard as we would need to will run down the battery just as hard. CPU and memory usage of our desktop client is one of the most common complaints about our service, and moving that to a phone is only going to exacerbate this.

I have ideas we’re working on to incorporate our very cool DIY storage system into our mobile platforms to offer secure backup to our storage. This would take advantage of built-in cryptography on the phone, using a shared key between the desktop application and the phone to encrypt data before uploading it to our servers. That can then be tied into the desktop clients to offer zero-knowledge backup from mobile devices. I can’t guarantee this is going to happen overnight, but it’s where I want to take us until the ARM Cortex-A15 gives us the power to do this efficiently.

By using our HTTP-based DIY system, we can bypass the most CPU- and memory-intensive portions of SpiderOak interactions, and as I don’t anticipate typical mobile load being that heavy, it shouldn’t overwhelm the desktop client to ask it for just a little bit more help.
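As a rough illustration of the shared-key idea (not SpiderOak’s actual protocol; the cipher below is a stdlib-only stand-in for the phone’s built-in cryptography, and all names are hypothetical), the phone could encrypt with a key it shares with the desktop client before anything leaves the device:

```python
import hashlib
import hmac
import os

def _keystream_xor(shared_key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR the data against an HMAC-SHA256 counter keystream.
    # Illustrative only -- a real client would use a vetted cipher (e.g. AES).
    out = bytearray()
    for i in range((len(data) + 31) // 32):
        block = hmac.new(shared_key, nonce + i.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        chunk = data[i * 32:(i + 1) * 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)

def encrypt_blob(shared_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt on the phone; the server only ever sees this ciphertext."""
    nonce = os.urandom(16)
    return nonce + _keystream_xor(shared_key, nonce, plaintext)

def decrypt_blob(shared_key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt later on the desktop client, which holds the same shared key."""
    return _keystream_xor(shared_key, ciphertext[:16], ciphertext[16:])
```

The ciphertext could then be POSTed to the HTTP-based DIY storage API, keeping the heavy cryptographic lifting to a minimum on the phone’s CPU.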

SpiderOak: From where does it come?

In an ancient time there existed a tree which the locals referred to as the Spider Oak, due to the web-like tentacles that hung from its branches. The tree was known to catch and hold people’s hopes and dreams for safekeeping. Situated atop a hillside, the more the tree collected these visions, the bigger it grew. It was later thought that a special type of spider lived only in this kind of oak, spinning an intricate web between the leaves that created the illusion of the ‘web-like’ tentacles; these were so strong they were known to withstand the weight of a falling full-grown human, should he or she slip while climbing amongst the branches. It is believed that this tree still stands today, although no one has been able to pinpoint its exact location. The spider who spins these webs has likewise never been found, despite similar web patterns being spotted in other environments.

Whether folklore or myth or mystery or a bit of all three, we like the idea of holding hopes and dreams and keeping them safe for eternity.

Why SpiderOak doesn’t de-duplicate data across users (and why it should worry you if we did)

One of the features of SpiderOak is that if you backup the same file
twice, on the same computer or different computers within your account, the 2nd
copy doesn’t take up any additional space. This also applies if you have
several versions of a file as it evolves over time — we only need to save the
new data blocks.

Some storage companies take this de-duplication to a second level, and do a
similar form of de-duplication across all the data from all their customers.
It’s a great deal for the company. They can sell the bytes of storage to every
user at full price while incurring zero additional cost. In some ways it’s
helpful to the user too — uploads are certainly faster when you don’t have to
transfer the data!

How does cross user data de-duplication even work?

The entire process of a server de-duplicating files that haven’t even been
uploaded to the server yet is a bit magical, and works through the properties
of cryptographic hash functions. These allow us to make something like a
fingerprint of any file. Like people, no two files should have the same
fingerprints, right? The server can just keep a database of file fingerprints
and compare any new data to these fingerprints.

So it’s possible for the server to de-duplicate and store my files, knowing
only the fingerprints. So, how does this affect my privacy at all then?

With only the knowledge of a file’s fingerprint, there’s no clear way to
reconstruct the file the fingerprint was made from. We could even use a
technique for prepending deduplicated files with some random data when making
fingerprints, so they would not match outside databases of common files and
their fingerprints.
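A toy model of the scheme (illustrative Python, not any company’s actual implementation) shows why an upload can finish without transferring a single content byte:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """A cryptographic hash of the file's contents, standing in for the
    'fingerprint' described above."""
    return hashlib.sha256(data).hexdigest()

class DedupStore:
    """Toy server that de-duplicates across all users by fingerprint."""

    def __init__(self):
        self.blobs = {}  # fingerprint -> stored bytes

    def upload(self, data: bytes) -> str:
        fp = fingerprint(data)
        if fp in self.blobs:
            return "deduplicated"   # server already has it: no transfer needed
        self.blobs[fp] = data
        return "uploaded"
```

When a second user offers the same file, only the fingerprint ever crosses the wire.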

However, imagine a scenario like this. Alice has operated the best BBQ
restaurant in Kansas City for decades. No one can match Alice’s amazing sauce.
Suddenly Mallory opens a BBQ joint right across the street, with better prices
and sauce that’s just as good! Alice is pretty sure Mallory has stolen the
recipe right off her computer! Her attorney convinces a court to issue a
subpoena to SpiderOak: Does Mallory have a copy of her recipe? “How would we
know? We have no knowledge of his data beyond its billable size.” Exasperated,
the court rewrites their subpoena, “Does Mallory’s data include a file with
matching fingerprints from the provided recipe file here in exhibit A?” If
we have a de-duplication database, this is indeed a question we can answer, and
we will be required to answer. As much as we enjoyed Alice’s BBQ, we never
wanted to support her cause by answering a 3rd party’s questions about a customer’s data.

Imagine more everyday scenarios: a divorce case; a patent, trademark, or
copyright dispute; a political case where a prosecutor wants to establish that
the high level defendant “had knowledge of” the topic. Establishing that they
had a document about whatever it was in their personal online storage account
might be very interesting to the attorneys. Is it a good idea for us to be
even capable of betraying our customers like that?

Bonus: Deduping via Cryptographic Fingerprints Enables The Ultimate Sin

The ultimate sin from a storage company isn’t simply losing customer data.
That’s far too straightforward a blunder to deserve much credit, really.

The ultimate sin is when a storage company accidentally presents Bob’s data
to Alice as if it were her own. At once Bob is betrayed and Alice is
confused. This is what can happen if Bob and Alice each have different files
that happen to have the same fingerprints.

Actually cryptographic hashes are more like DNA evidence at a crime scene
than real fingerprints — people with identical DNA markers can and do exist.
Cryptographers have invented many smart ways to reduce the likelihood of this,
but those ways tend to make the calculations more expensive and the database
larger, so some non-zero level of acceptable risk of collisions must be
determined. In a large enough population of data, collisions happen.
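The scale of that risk can be estimated with the usual birthday-bound arithmetic (a back-of-the-envelope sketch, not a claim about any particular service’s hash choice):

```python
def collision_probability(n_blocks: int, hash_bits: int) -> float:
    """Approximate chance of at least one accidental fingerprint collision
    among n_blocks stored blocks, via the birthday bound: pairs / 2^bits.
    Capped at 1.0; a result of 1.0 means a collision is essentially certain."""
    pairs = n_blocks * (n_blocks - 1) / 2
    return min(1.0, pairs / 2 ** hash_bits)

# A trillion stored blocks under a 128-bit fingerprint: still vanishingly rare.
# The same population under a 64-bit fingerprint: effectively a certainty.
```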

This all makes for an entertaining conversation between Alice and Bob when
they meet each other this way. Hopefully they’ll tell the operators of the
storage service, which will otherwise have no way of even knowing this error
has happened. Of course, it’s still rather unlikely to happen to you…

There’s a Public Information Leak Anyone can Exploit

Any user of the system can check if a file is already contained within the
global storage set. They do this simply by adding the file to their own storage account,
and observing the network traffic that follows. If the upload completes
without transferring the content of the file, it must be in the backup
somewhere already.

For a small amount of additional work, they could arrange to shut down the
uploading program as soon as they observe enough network traffic to know the
file is not a duplicate. Then they could check again later. In this way, they
could check repeatedly over time, and know when a given file enters the global
storage set.

If you wanted to, you could check right now if a particular file was already
backed up by many online storage services.
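The oracle can be modeled in a few lines (a simulation, not a real client; a real attacker would watch packet counts and abort the upload early rather than let it complete):

```python
import hashlib

class MeteredService:
    """Toy dedup service that reports how many content bytes each upload sent."""

    def __init__(self):
        self.known = set()

    def backup(self, data: bytes) -> int:
        fp = hashlib.sha256(data).digest()
        if fp in self.known:
            return 0            # dedup hit: no content crossed the wire
        self.known.add(fp)
        return len(data)

def already_in_global_set(service: MeteredService, data: bytes) -> bool:
    # The attacker uploads the file to their OWN account and observes the
    # traffic: zero content bytes means someone, somewhere, already has it.
    return service.backup(data) == 0
```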

How might someone be able to maliciously use this property of a global
de-duplication system to their advantage?

  • You could send a new file to someone using the storage service and know
    for sure when it had arrived in their possession
  • You could combine this with the Canary Trap method to
    expose the specific person who is leaking government documents or corporate
    trade secrets to journalists
  • You could determine whether your copyrighted work exists on the backup
    service, and then sue the storage service for information on the users storing
    the file

There are also categories of documents that only a particular user is likely
to have.

How much space savings are we really talking about?

Surely more than a few users have the same Britney Spears mp3s and other
predictable duplicates. Across a large population, might 30%, 40%, or perhaps
even 50% of the data be redundant? (Of course there should be greater
likelihood of matches as the total population increases. This effect of
increasing de-duplication diminishes though: it is more significant as the data
set grows from 1 user to 10,000 users than from 10,000 users to 20,000 users,
and so on.)

In our early planning phase with SpiderOak, and during the first few months
while we operated privately before launch, we did a study with a population of
cooperative users who were willing to share fingerprints, anonymized as much as
was practical. Of course, our efforts suffered from obvious selection bias,
and probably numerous other drawbacks that make them unscientific. However,
even when we plotted the equations up to very large populations, we found that
the savings was unlikely to be as much as 20%. We chose to focus instead on
developing other cost advantages, such as building our own backend storage
clustering software.

What if SpiderOak suddenly decides to start doing this in the future?

We probably won’t… and if we did, it’s not possible to do so
retroactively with the data already stored. Suppose we were convinced someday;
here are some ways we might minimize the dangers:

  • We would certainly discuss it with the SpiderOak community first and incorporate the often-excellent suggestions we receive
  • It would be configurable according to each user’s preference
  • We would share some portion of the space savings with each customer
  • We would only de-duplicate on commonly shared and traded filetypes, like mp3s, where it’s most likely to be effective, and least likely to be harmful

A New Approach to Syncing Folder Deletions

One of the goals of SpiderOak sync is that it will never destroy data in a way
that cannot be retrieved, even if the sync happens wrongly.  So, as a design
goal, SpiderOak sync will never delete a file or folder that is not already
backed up.

Every time SpiderOak deletes a file, it checks that the file already exists
in the folder’s journal, and that the timestamp of the file currently on disk
matches that of the file in the journal (or that the cryptographic fingerprints
match, if the timestamps differ).  This allows only the narrowest of what
programmers call “race conditions” when deleting a file because of a sync.
Here’s how the race works:

  1. SpiderOak checks that the timestamp and cryptographic fingerprint match the journal (i.e. the file is backed up and could be retrieved from the backup set.)
  2. SpiderOak deletes the file

The trouble is that there is a very small time window between step 1 and 2. 
The user could potentially save new data into the file during this very small
time window.  If the user were to save new data into this file at this instant,
the two actions are racing to completion.

Since the time window is so very small (less than a millisecond), this is an
acceptable risk.
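In sketch form, the check-then-delete sequence looks like this (simplified; the real client’s journal format differs, and the names here are hypothetical):

```python
import hashlib
import os

def fingerprint_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def safe_delete(path: str, journal: dict) -> bool:
    """Delete `path` only if the journal shows it is already backed up.

    `journal` maps path -> (mtime_at_backup, fingerprint_at_backup).
    Returns True if the file was deleted, False if it was refused.
    """
    if path not in journal:
        return False                              # never backed up: refuse
    backed_mtime, backed_fp = journal[path]
    if os.path.getmtime(path) != backed_mtime:
        # Timestamps differ, so fall back to comparing fingerprints.
        if fingerprint_of(path) != backed_fp:
            return False                          # on-disk copy has new data
    os.remove(path)   # <-- the residual race window sits just before this call
    return True
```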

Now consider the same scenario for deleting a folder.  Again, SpiderOak
makes a pass through the folder and verifies that the complete contents are
available in the journals, then it removes the folder.  The trouble now though
is that the window is much larger between step 1 and 2.  A very large folder
could take minutes for SpiderOak to scan through and verify.  It may be modified
again between the time we start scanning and the time we finish scanning, and
before the deletion begins. 

Even though SpiderOak is plugged into the OS’s system for notification of
changes to the file system, such notifications are not guaranteed to be
immediate or to happen at all (such as on a network volume.)

So there is a larger “race condition,” or opportunity for data to be saved
to the folder between step 1 and 2 in the case of a large folder.

So, SpiderOak again tries to be conservative.  Instead of deleting the
folder, it tries to rename it out of the way.  Then, later, it can verify that
nothing has changed inside the folder after it has been renamed out of the way.

Syncing deletes of folders actually most commonly fails in this renaming
step.  Sometimes it just can’t rename it.  There are some differences in how
Windows and Unix platforms handle open files in these cases, and the rename
solution tends to work well on Unix and has greater opportunity for error on
Windows.  There are also some cases in which it categorically fails — such as
trying to rename across drive letters in Windows or (in Unix) across different
file systems.

We could fix those, but I think an entirely new approach is probably in order.

Starting in the next version, instead of approaching the “delete a folder”
action as the deletion of an entire folder, it will now approach it as the
deletion of each individual item contained within the folder and all of its
subfolders recursively. We will use the same sequence as described above for
individual file deletions for each file, from the lowest subfolders on up, and prune folders when they are free
of files.

This eliminates the need for the rename step, and reduces the race condition
down to milliseconds for each removed file. Most importantly, this
means that the problem files (i.e. the files that are in use, or changing
too fast to back up, and which SpiderOak therefore refuses to delete) will be obvious:
they will be the only files remaining.
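The bottom-up pass described above can be sketched with `os.walk` (a simplified model; `delete_file` stands in for the per-file check-then-delete routine):

```python
import os

def delete_folder_bottom_up(root: str, delete_file) -> None:
    """Delete a synced folder file-by-file, deepest subfolders first.

    `delete_file(path)` is expected to refuse (leave in place) any file that
    is not safely backed up. A folder is pruned only once it is empty, so
    whatever remains afterwards is exactly the set of problem files.
    """
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            delete_file(os.path.join(dirpath, name))
        try:
            os.rmdir(dirpath)   # succeeds only when the folder is now empty
        except OSError:
            pass                # a refused file keeps this folder (and its parents) alive
```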

We’ll have a beta available with this behavior soon, announced in the release notes (rss).

The ‘Forum’: A Nudist Colony waiting to happen…

When we first started SpiderOak I did not know anything about a ‘Forum’. In truth, I didn’t even know that Forums existed on companies’ sites as a way for users to interact with each other, share ideas, experiences, and lessons learned. I suppose it would be safe to say that I have never been that kind of user.

Roughly nine months into SpiderOak we started to receive more requests to setup a Forum (at that time I had been doing all of the customer service requests so I would see the suggestions firsthand). This of course prompted me to ask “What is a Forum?” and “How is a Forum different than other forms of communication on our website?” and “Do we NEED a Forum?”. My research returned some contradictory findings which I will share in the paragraphs to follow.

I was told by many (including members of our own team) that a ‘Forum’ was a way for SpiderOak users and staff to have open conversations about various topics, to discuss shared experiences with the product (both positive and negative), to serve as a live journal and index of information, an open line of communication, and the like. This all sounded like a good idea – a positive ‘forum’ for conversation where everyone (including SpiderOak staff members) could learn from each other and which we could use as tool for growth and understanding about how people use SpiderOak and how we can make it better.

Based on this information and the ideas above, we started to construct our Forum. In full disclosure and not as a typical ‘Forum’ user, I was a bit concerned and quite scared that this ‘Forum’ would be too naked and too exposing. After all, the purpose was full disclosure and openness which we are all in favor of in theory but in practice we all usually wear some form of clothing. After all, a nudist colony is a fun place to visit but I would not want to live there permanently let alone be the host as you never know who is going to show up.

The Forum launch was welcomed by our users and we received some very positive feedback. Of course I would monitor it almost daily to see what users were saying – the good and the bad – and worked hard (as I always try) to place every comment in the right context and use it as a learning experience. Over time, my job duties took me a bit further away from actively engaging the forum but I would still review its contents at least weekly.

Fast forward to a few days ago. I was reviewing our customer service emails and read this from one of our users:

“According to your Forum, your Sync product does not work so am closing my account. Thank you.”

As with most cancellations, I try to send an email to gain a more full and complete understanding of their reasons for leaving. In the response from this particular user I would learn that he never actually tried to use our Sync product. As you might imagine, I was very upset by this and immediately went back to our Forum to better understand how this form of expression and openness turned into a tool capable of driving users away. The answer was clear and I would invite you to have a look.

In full disclosure, our Sync product does not work perfectly in every situation. However, in the vast majority of implementations it does perform well and as intended. Given the very nature of the Forum as a collection of topics that users are having difficulty navigating, it is not surprising that these posts would be present. And if I were simply to review this one page, casually perusing the headers but never actually reading the specific post dialog, I suppose I might come to a similar conclusion about SpiderOak Sync. However, if you open and read any of these posts, you will see that there are pages and pages of dialog. And further, in several cases, there have been resolutions to the problems posted. But one would never see this on the first page displaying the titles, or on the subsequent pages, if one never bothered to look deeper into any one Forum thread.

This realization makes me mad for three reasons. First – one of the early lessons of childhood is that you should never judge a book by its cover and as adults we all too often forget this rather simple principle. Second – users like the one above diminish and dismiss all of the hard work and effort of our development team. They work tirelessly to make SpiderOak better and deserve at least the opportunity to provide a helpful solution as opposed to ‘according to your forums it doesn’t work’. And third – it taints the purpose and design of the whole ‘Forum’ concept as explained in the introduction.

I would also like to add that I took the time to visit the Forums of our competitors and it would appear that they suffer from a similar fate. And as I feel bad for our lost opportunities, I feel bad for theirs as well as we all work hard and want to provide a good product to our users.

In the end, it is clear to me that we need to restructure the nature and presentation of our Forum. What started as a place for open communication has turned into a concentration of negativity that does far more harm than good for all of us who have Forums (referring specifically to the competitors’ sites I reviewed) as well as for potential users who are being swayed prematurely. One of our most basic ideas is to better categorize our Forum posts so that a user who simply wants to review general information can access it on the homepage under ‘General Discussion’. The other posts are not going away, of course, but rather are just being properly placed under the appropriate sections. On a larger scale, I would like to find a way to display the final post of a Forum thread as opposed to the title or first line, as it is much more descriptive and true to the work and effort involved. The idea is not and never will be to minimize or curb exposure, as this would defeat the very nature and ‘Nudistness’ of the Forum, but surely there is a happy medium to be found.

The Monthly 100GB Premium account Twitter Contest – Redux!

Here at SpiderOak, we are all about providing a free, innovative, and secure
online backup, synchronization, and sharing solution, in addition to great
service to our customers. However, we also have a lighter side.

With the release of SpiderOak 3.7 we thought it would be a good idea to reboot our Twitter free premium account contest.

Starting August 2010, anyone who follows @SpiderOak on Twitter
(start now!) and re-tweets ‘I Just entered the SpiderOak Free Account Contest!
You can win by following @Spideroak and RT! #giveaway #contest’ will be in
the running for a 1-year 100GB SpiderOak Premium account.

The rules are as easy to follow as using SpiderOak! (Download our client for free here and get up to 7GB free for life!)

  • Follow @SpiderOak
  • Re-Tweet ‘I Just entered the SpiderOak Free Account Contest!
    You can win by following @Spideroak and RT! #giveaway #contest’

The winners will be announced via Twitter the 30th of every month.

Psst… If you can’t wait or are feeling unlucky, you can use the promotion
code ‘twitterspideroak’ and pick up a 20% discount on any yearly
purchased account – 100 GB, 200 GBs, and so on…

We also offer our popular refer-a-friend program, our affiliate program for webmasters and marketers, our white-label solution, and of course our new and exciting DIY Archival Data Storage API for developers big and small.

Rules: The contest is open to anyone, including previous free account
holders. SpiderOak employees, family members, and close friends of
employees are not eligible. The winners will be picked at random from the
re-tweets and announced via Twitter tweets and direct messages to the
winners. If a winner does not answer and claim their prize within 7 days,
another winner will be picked at random.

Netbooks you say? Well there’s a client for that..

When people discussed the future of ‘mobile’ back in 1999, everything was WAP and cell phone Internet and mobile website development and thin clients. Well at least most of us remember how that ‘mobile revolution’ changed our lives, or, at the very least, our 401K’s.

Fast forward about half a decade and a company named Asus launches the ‘Eee PC 700’. At first the news of this new smaller notebook was a non-event; after all, it was just a smaller laptop whose main features were low weight and long battery life.

Fast forward again and it’s 2010: netbook sales are dwarfing those of traditional laptops, entire operating systems are being rewritten to fit our smallest best friends, and a new generation of ‘Pads’ is rolling out in the wake of the immense popularity of the cheap, lightweight, and long-lasting mini-laptops.

Netbooks and SpiderOak

At SpiderOak we pride ourselves on offering our service and client for pretty much every operating system imaginable, and we have supported netbook usage from the very beginning (now please don’t take this as an invitation to start asking for Amiga OS support or to tell us how our client crashes on AS400 systems).

And with our new release we have resized, stripped, and customized our client for the sometimes quite low resolutions of 7″-10″ netbook screens. We have also improved memory handling and a few other things so that you will get the best possible experience on your little sub-notebook friend.

The client auto-senses the maximum resolution and customizes itself on installation, so no ‘special download’ is needed; just download the latest version of SpiderOak and you are good to go.

For those interested, our complete list of release notes for the new release is available here.