Conversations about life & privacy in the digital age

Dropping DropBox: A Relocation Guide

If you have recently switched from DropBox to SpiderOak, we’d like to take a few moments to welcome you to the secure side of backup. We know that switching backup providers can be a lot like moving to a new place.

It can be exciting and maybe a little intimidating too. We at SpiderOak would like to provide you with a relocation guide that will make the transition a little easier. There’s no map or moving boxes required, just the opportunity to start living your new and improved life on the cloud.

Know the Territory:

You may be accustomed to the DropBox landscape, but at SpiderOak, security is our foundation. We have a strict, zero-knowledge privacy policy and an extensive, layered encryption system. Your data will remain secure on our servers in your very own SpiderOak safe house and we don’t even have the keys! Your password is the only key that unlocks the encryption, and we have no way of ever knowing your password. At SpiderOak, we take security seriously.

Learn the Language:

With DropBox, you created a central folder on your hard drive for backup. You dragged and dropped your files into the main DropBox folder. SpiderOak has a completely different approach that allows you to keep your current file structure. Our client allows you to select data from your folder hierarchy using the ‘BackUp’ tab. SpiderOak uploads a mirror copy of your selection to our servers.

As you can see below, it’s easy to select files for backup.

No need to move folders around anymore. ‘Drag and Drop’ becomes ‘Select and Save’.

Practice the Customs:

Security is the backbone for all of our features. To ensure security, backup must occur before syncing and sharing. Anytime you modify a file, SpiderOak must first upload the changes and build encryption blocks before any other process can begin. This requires you to back up every device you would like to sync. It also requires a little bit of extra time for the upload process. The motto to remember: ‘Security is our number one priority’.

Get to Know the Locals:

Learn more about the SpiderOak community by visiting our FAQs and Forum. The FAQ is a great place to learn how SpiderOak can meet your individual needs. You can also learn about the variety of special features that SpiderOak has to offer. Our Forum provides an interactive community where our more experienced users can help you get acquainted with your new surroundings. If you have specific questions or requests, please contact our SpiderOak Customer Relations department.

Embrace Your New Home:

SpiderOak offers our free 2GB plan to back up your data for as long as you like. When you’re ready, you can expand your new space and upgrade to our paid plans in 100GB increments. We provide 100GB for $10 per month or $100 per year. SpiderOak doesn’t discriminate. You can back up as many devices as you like, even external drives. We support all major platforms and have no file size limit. The space is yours. The security is yours. Welcome to SpiderOak!

What does i_ m__n __ __v_r _____ ___ ____ ____ ___c_?

We have been getting a lot of questions lately about our block level
de-duplication, how it works, and how it is applied through the SpiderOak
process. As I consider myself to be a layman, please allow me to explain this in
simpler terms, such that even I will be able to understand.

For the sake of this example, let us say you have created a document
entitled ‘Why peanut butter and jelly sandwiches are better when you place
salt & vinegar chips in the middle’. The size of this document is 10k.
After saving the initial version, you go back and make 9 additional edits.
Each time you make an edit, you save the document as a new version thus giving
you 10 complete versions. And with each version being exactly 10k, the
complete document takes up a total of 100k on disk (or 10 versions multiplied
by 10k).

SpiderOak, on the other hand, works much more efficiently when storing data,
creating many wonderful benefits for the user. As you can imagine, from the
first version of ‘Why peanut butter and jelly sandwiches are better when you
place salt & vinegar chips in the middle’ to the last, only small pieces
of the document have changed. One simple example is replacing the word
‘excitable’ with the word ‘volatile’ in the third paragraph. Instead of
storing (and uploading) a whole new version of the document each time a small
change is made, SpiderOak breaks each document into blocks of data and then
only backs up (or uploads) the change or delta between the new version and the
old. Using this process, the same 10 versions of the aforementioned document
on SpiderOak amount to only 15k on disk (as opposed to 100k above).

Although the below visual example only uses two versions of a document, it
does further explain how the SpiderOak de-duplication process occurs.

This process saves our users a considerable amount of space as a user is
only billed for the de-duplicated amount. Furthermore, the upload can occur
with much greater speed because only the changed blocks of data are sent from
one version to the next. In the end, SpiderOak works extraordinarily hard to
never upload and/or store the same block of data twice – saving our users
money and time.
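
For the curious, the block-level idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpiderOak's actual implementation: the fixed 1024-byte block size, the use of SHA-256 to identify blocks, and the names BlockStore and make_versions are all made up for this example, and encryption is left out entirely.

```python
import hashlib

BLOCK_SIZE = 1024  # hypothetical block size, chosen only for this illustration


class BlockStore:
    """Toy content-addressed store: each distinct block is kept only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def add_version(self, data):
        """Store one version; return its manifest (ordered list of block digests)."""
        manifest = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only blocks not seen before take up new space (or need uploading).
            self.blocks.setdefault(digest, block)
            manifest.append(digest)
        return manifest

    def restore(self, manifest):
        """Rebuild a version from its manifest."""
        return b"".join(self.blocks[d] for d in manifest)

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


def make_versions(count=10):
    """Build `count` versions of a 10 KiB document, each with one more small edit."""
    doc = bytearray(b"".join(bytes([k]) * BLOCK_SIZE for k in range(10)))
    versions = [bytes(doc)]
    for v in range(1, count):
        doc[v * 1000] = 0xFF  # a one-byte edit that does not change the length
        versions.append(bytes(doc))
    return versions


store = BlockStore()
versions = make_versions()
manifests = [store.add_version(v) for v in versions]
naive_bytes = sum(len(v) for v in versions)  # cost of ten full copies
print("ten full copies:", naive_bytes, "bytes")
print("de-duplicated:  ", store.stored_bytes(), "bytes")
```

In this toy run each edit touches a single block, so every new version adds one block instead of a whole copy. Note that fixed-size blocks are the simplest possible choice: an edit that changes the length of the document (such as swapping ‘excitable’ for ‘volatile’) would shift every later block, so a real system needs a smarter way of finding block boundaries.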

Question: So perhaps now you may better understand the title and how it
relates to de-duplication?

Answer: What does it mean to never store the same data twice?

Online Privacy – Strange Bedfellows…

Normally, when people think of ‘online’, privacy is definitely not the first, second, or fiftieth thought that comes to mind. In fact, people generally exhibit quite the opposite response and conjure up images of complete nakedness. After all, the modern-day Internet has evolved mostly for the purpose of providing instant exposure, distribution, and presence to the world over. The question then becomes, can the value of the Internet extend beyond nakedness?

One of the driving purposes behind SpiderOak was to dispel the notion that just because data is online means it can no longer be private. The goal was simple – devise a plan where a user’s files, filenames, file types, folders, and/or any other personal information is never exposed to anyone for any reason (even under government subpoena). This of course includes the SpiderOak staff who – even with physical access to the servers upon which the data resides – should never be able to see or interact with a user’s plaintext data. Creating this environment, however, would prove more difficult than simply making these statements.

In the beginning, we grappled with how best to accomplish this feat – creating ‘Zero-Knowledge’ privacy as we call it. Most of our competitors and thousands of other companies make claims and statements about security and privacy but, at the end of the day, they would all fall short of achieving our aforementioned goals. To use the most general example – if a company can reset your password, it means someone in the company has access to your encryption keys (if they encrypt the data) which further means they can access your data if they ‘had’ to or, worse yet, someone else could with far worse intentions.

A more specific case is Mozy’s use of encryption. Mozy’s encryption is far better than most online storage providers and yet it contains serious oversights. The default options have you choosing between a stronger ‘Mozy’ key (which Mozy then knows and could use to decrypt your data) or a weaker key you choose on your own and keep private. Even if you choose the weaker private key, Mozy still stores your file and folder names in plain text – meaning they know a list of every file archived from your computer. We would suspect they know the size and timestamp of each file as well although this information has not been publicly disclosed. This seems to represent a great deal of information to reveal about the contents of your ‘private’ data, doesn’t it?

To overcome this threat and others, we at SpiderOak decided to never store a user’s password nor the plaintext of a user’s encryption keys. This ensures that there can never be a point – ever – where we could even unknowingly betray the trust or privacy of a user. Why? Because – to put it simply – we don’t ever come into contact with the keys needed to unlock the encryption surrounding the data. Even with physical access to the server or under subpoena, SpiderOak simply can never see or turn over a user’s plaintext files, filenames, file sizes, file types, etc… On the server, we only see sequentially numbered containers of encrypted data.
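
To make the idea above a little more concrete, here is a minimal sketch of deriving an encryption key from a password on the client, so that the password and key never have to be sent anywhere. It is an illustration only and not a description of SpiderOak's actual scheme: the choice of PBKDF2 with SHA-256, the iteration count, and the names derive_key and new_account are assumptions made for this example.

```python
import hashlib
import os

ITERATIONS = 100_000  # hypothetical work factor, chosen only for this sketch


def derive_key(password, salt):
    """Derive a 32-byte key from the password. This runs on the user's machine."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


def new_account(password):
    """Return (server_record, key). Only server_record would leave the machine."""
    salt = os.urandom(16)  # random and not secret
    key = derive_key(password, salt)
    server_record = {"salt": salt}  # no password and no key in here
    return server_record, key


record, key = new_account("correct horse battery staple")
# Later, the same password plus the stored salt yields the same key again.
same_key = derive_key("correct horse battery staple", record["salt"])
# Anyone without the password ends up with a different, useless key.
other_key = derive_key("a wrong guess", record["salt"])
```

The trade-off shows up immediately: in a design like this, nothing stored on the server is enough to rebuild the key, which is also why a provider working this way cannot reset a forgotten password.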

This necessarily meant a different approach to various processes throughout SpiderOak which you may or may not have noticed – including forced registration through the desktop application and never via the web. In the end, however, we did accomplish our goals and proved that, although strange bedfellows indeed, ‘online’ and ‘privacy’ can sleep next to each other every night, naked, and live happily ever after…

Stop Judging Resumes: Virtuously Virtual Hiring Practices

In my own experience there’s been very little relationship between the
quality of a resume and the eventual usefulness of a developer. I’ve seen guys
with great work history, references, advanced degrees, numerous publications,
and so on, and yet their presence proved less valuable than their absence.
Meanwhile some of the most rewarding engineers I’ve worked with introduced
themselves with nothing more than a simple letter.

At a previous company I worked with in the dot-com era, we created an epic
test for long distance interviews for a Perl programmer / Linux sysadmin role.
It consisted of questions where a veteran hacker would maybe know 80 or 90% of the
answers off the top of his head, and exactly which man pages to look up for another 10%.
Cute stuff like “How can you rm a file named -rf?” and “Name
3 things you can accomplish at a GRUB prompt.” We would arrange a designated
time and email the applicant the test. They had one hour (which we would pay
them for) to return it. The test was so long and specific there was no hope of
completion if you needed Google’s help for a large portion of the answers. The
feedback from many applicants was elaborately negative.

These days our process is more to the point. If we’re considering bringing
someone on staff, we start by giving them some work to do. We find detachable
development tasks that will further the SpiderOak cause, send them a minimal
set of instructions, and let them run with it. It’s usually something
smallish, 1 to 3 days at most. As an all-telecommute team, we’re already
accustomed to giving code feedback. When they’re done, they send us a bill and
we send them a review.

Sometimes we give several people the same task. The results often show an
obvious contrast of strengths and weaknesses across several applicants, and it
conserves the (sometimes scarce) resource of development tasks that don’t
require detailed knowledge of core SpiderOak source code. Sometimes we’re not
sure after the first task so we give more.

I’m sure there are big corporate HR departments who would be astonished to
learn that the best predictor of a developer’s usefulness might be an ability
to complete development tasks.

In the trenches…

I sit writing this blog post at 35,000 feet above the earth as about 100 other
travelers and I head from San Francisco to Chicago. The time is 4:30 am CDT
(or 2:30 am PDT) on Saturday morning, October 25th.

It has been an exciting although long week at SpiderOak. In fact, this week
brings the conclusion of a long string of weeks, ending successfully in the
launch of SpiderOak 2.0 – a faster, more responsive, more flexible iteration of
our initial version. In addition to these increased functions, the truly
exciting part about 2.0 is that it serves as a strong foundation for many
important and needed features to come including our Sync tool, Team Sync (or
read/write ShareRooms), development of our multi-user / multi computer
environment for small businesses, and several others…

Over the last couple of weeks we have received many emails asking if we had
stopped developing or growing because our blog had not been regularly updated.
Being somewhat of an outsider to the blogging world, I thought the question a
bit strange. After all, the last few months have seen some of our most
important advancements as a company: a redevelopment of our core architecture
to make SpiderOak faster, more efficient, and more responsive (responsiveness
has been a known issue since we launched). But because this development hadn’t
happened in ‘public’, and our progress hadn’t been constantly reported in the
formal setting of our blog, people thought this a sign of weakness.

Well, I write this post to assure all of you who inquire about our
resolve or longevity that we are here and here to stay. And if it is regular
blog posts that you require to confirm this, then you shall have them, as we
promise to rise up from the trenches more often and let you all know that we
are alive, well, and committed to bringing you the best and most secure online
backup, storage, access, sharing, and sync product available. And 2.0 brings us
a lot closer.

Python is Python is Python…. except when it isn’t Python.

One of the largest factors recommending dynamic interpreted
languages and runtimes is, of course, memory and object management.
However, when interfacing these to external libraries, the boundary
is crossed from a managed environment to a binary ABI environment,
with all the ‘fun’ that entails. This becomes especially interesting
when your interface is a ‘light’ wrapper that neither protects you
from shooting yourself in the foot nor insulates you
from the bugs of that binary ABI.

A while back, the excellent
valgrind tool was developed. It
is a dynamic memory and threading debugging tool for Linux
applications, and an excellent one for complicated C
and C++ programs. Because valgrind works at the OS/ABI level, however, it can
be adapted to any environment.

Here at SpiderOak we use valgrind when a debugging issue appears
to be involved with any C or C++ library we interface with; the
most frequent case is Qt. When writing an application handling
I/O in real-time from multiple sources, you end up with a sophisticated
flow of code, which makes the output of tools like valgrind difficult
to use. In Python, valgrind has been used as a tool to debug Python
itself, but not necessarily to debug Python applications, as valgrind
won’t tell you what Python code called the C or C++ or other library
code where the bug you’re hunting has appeared.

We have a patch to valgrind and a small wrapper library that lets
you recover this information. You can download this (GPLv3)
at our code page.

To use it, you will need to be able to recompile your own valgrind
executable. For us, the Ubuntu gutsy or hardy valgrind source
packages are excellent for this. In our Python support for valgrind,
we implement a ‘supplemental stack’ that a running program can use to
notify valgrind of where in your application it is, so you can
track which Python functions are involved with an issue as well as the
C/C++ library functions. In our example environment, this
information is helpful when your application involves Twisted or
pydispatch/louie-powered indirect calls (i.e. via Twisted deferreds or
pydispatch/louie signals). We distribute this supplemental stack
patch along with a CPython wrapper library which valgrind will use to
wrap Python stack frames to retrieve the information needed.

After downloading our valgrind-python support patches, and
building libpywrap.so, you can run your Python application with
LD_PRELOAD=libpywrap.so valgrind /path/to/python/interp/using/app.
Valgrind will then give you output corresponding to the python stack
frames and source locations alongside the usual Valgrind stack
output.

This is not a turnkey or very stable solution. We absolutely do
not suggest running it in an untrusted environment. Make sure you’re not
running this with anything involving the opportunity to leak data,
or, a particularly nasty user might crack your box.
That said, valgrind often allows you to shave hours off your
debugging time for tracking down some problems. Now you can shave
hours off your debugging for those problems when they’re in Python,
too.

For those new to valgrind, here’s a short example of how to use this in
Ubuntu, having a download of our valgrind-python-1.0.1.tar.bz2. You should
also have Misc/valgrind-python.supp from your Python source distribution
(or use the copy from the Python SVN at
http://svn.python.org/view/python/branches/release25-maint/Misc/valgrind-python.supp?rev=51333&view=markup).

% sudo apt-get build-dep valgrind
% sudo aptitude install fakeroot python2.5-dev
% apt-get source valgrind
% tar xjf valgrind-python-1.0.1.tar.bz2
% # this is where we add our supplemental stack patch for valgrind
% cd valgrind-3.3.0/debian/patches
% cp ../../../valgrind-python-1.0.1/50_sup-stack.dpatch .
% # go ahead and edit this line in the middle of patches if you care
% echo 50_sup-stack >> patches
% cd ../..
% fakeroot ./debian/rules binary
% sudo dpkg -i ../the_valgrind_deb_you_made.deb
% cd ../valgrind-python-1.0.1
% make
% # after make finishes, you should have libpywrap.so in the
% # valgrind-python dir. This is what you run with
% # LD_PRELOAD=libpywrap.so valgrind python2.5

And so…

% LD_PRELOAD=$(pwd)/libpywrap.so valgrind --suppressions=valgrind-python.supp ipython
[various valgrind boilerplate here]
>>> from ctypes import *
>>> class crasher(Union):
...     _fields_=[("x",c_int),("y",c_char_p)]

>>> badptr=crasher()
>>> badptr.x=2
>>> badptr.y[0] # BOOM!

==29497== Python Stack:
==29497== <stdin>:1 <module>
==29497== Invalid read of size 1
==29497== at 0x40239D8: strlen (mc_replace_strmem.c:242)
==29497== by 0x80945A9: PyString_FromString (stringobject.c:112)
==29497== by 0x47F1474: z_get (cfield.c:1341)
==29497== by 0x47ECD0D: CData_get (_ctypes.c:2315)
==29497== by 0x47F0BE9: CField_get (cfield.c:221)
==29497== by 0x808968C: PyObject_GenericGetAttr (object.c:1351)
==29497== by 0x80C7608: PyEval_EvalFrameEx (ceval.c:1990)
==29497== by 0x402773B: PyEval_EvalFrameEx (pywrap.c:62)
==29497== by 0x80CB0D6: PyEval_EvalCodeEx (ceval.c:2836)
==29497== by 0x80CB226: PyEval_EvalCode (ceval.c:494)
==29497== by 0x80EADAF: PyRun_InteractiveOneFlags (pythonrun.c:1273)
==29497== by 0x80EAFD5: PyRun_InteractiveLoopFlags (pythonrun.c:723)
==29497== Address 0x2 is not stack'd, malloc'd or (recently) free'd

With some more configuration work, you will get valgrind
output with useful data for whichever libraries you use, and can tell
what python usage may be tweaking bugs in your non-python libraries.
Good luck!



SpiderOak command line options — much faster, much less memory

The newest released version of SpiderOak supports --batchmode for
scheduled operation, which may be useful to command line users and “GUI only” people
alike. The command line version is considerably faster for most tasks (3-4x by
my estimation), and uses drastically less memory (for me on OS X, an average VM
size of 32 MB, peaking at 64 MB).

This is supported in versions 1.0.3753 and newer (released today). On
Windows and OS X, an existing SpiderOak install should automatically upgrade
the next time it connects to the server. On Ubuntu or Debian, the apt upgrade
process should get the newest version.

Here’s what you can do (so far) from the command line:


Alan@Alan ~ $ /Applications/SpiderOak.app/Contents/MacOS/SpiderOak --help

Usage: SpiderOak basic command line usage:

Options:
  -h, --help            show this help message and exit
  --print-selection     Print a list of selected and excluded backup items
  --reset-selection     Reset selection (but preserve excluded files)
  --exclude-file=EXCLUDE_FILE
                        Exclude the given file from the selection
  --exclude-dir=EXCLUDE_DIR
                        Exclude the given directory from the selection
  --include-dir=INCLUDE_DIR
                        Include the given directory in the selection
  --force               Do in/exclusion even if the path doesn't exist
  --headless            Never start the GUI
  --batchmode           set the config option exit_when_nothing_to_do to true

Most of these are self-explanatory. --headless and
--batchmode are the ones I use most often. We’ll be adding support
for much more command line control in the future. Send mail to cmdline at
spideroak.com if you want to suggest other options.

--headless just runs SpiderOak with no GUI at all. It just runs,
without printing anything to the console, so there’s no interactivity and there are no
activity indicators (except what’s written to spideroak.log). This
is suitable for use on servers or other environments where you want something
to run continuously, using as few resources as possible, without any user
input.

By the way, one of the benefits of a fault-tolerant application design is
that you don’t have to be nice to it. Feel free to force quit or kill (even
-9) at any time, and SpiderOak will roll back any uncommitted transactions and
resume uploading or building where it left off, without corruption, the next
time you start. If you need all the available bandwidth for your first person
shooter or Skype, or you’re just trying to make your battery last as long as
possible, just killall SpiderOak and restart it when you want backups
to resume.

The next option is --batchmode (which implies --headless).
This means that SpiderOak will do all available work (i.e. scan the filesystem,
then build and upload everything in the queue, download and replay transactions
from other devices), and then exit. This is a good option for scheduled use.
You can add this to a cron job, or just run it yourself periodically whenever
you want to update your backup set.
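
If you would rather drive a scheduled run from a script than straight from cron, a small wrapper along these lines may be a starting point. Treat it as a sketch: the executable path is the OS X one from the --help example above and will differ on other platforms, and batch_command and run_batch are just names made up for this example.

```python
import subprocess

# Path taken from the OS X example above; adjust it for your own install.
SPIDEROAK = "/Applications/SpiderOak.app/Contents/MacOS/SpiderOak"


def batch_command(executable=SPIDEROAK):
    """Build the argument list for one batch run (do all pending work, then exit)."""
    return [executable, "--batchmode"]


def run_batch(executable=SPIDEROAK, runner=subprocess.run):
    """Run SpiderOak once in batch mode and return its exit status.

    `runner` is a parameter only so the function can be exercised without
    SpiderOak installed; by default it is subprocess.run.
    """
    result = runner(batch_command(executable))
    return result.returncode
```

Calling run_batch() from your own scheduled script blocks until SpiderOak has finished its queue and exited, which makes it easy to log or chain other steps afterwards.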

SpiderOak is also careful not to start more than one instance of itself at a
time. For example, if you schedule SpiderOak to run in --batchmode
each night, and for the first few days, SpiderOak has so much to upload that it
does not finish before the next scheduled startup time, you don’t need to worry
about coming back to find several instances running.

In the next major release of SpiderOak, we’re restructuring the user
interface to be at least as efficient as the command line version is now.
So, we expect the 1.5.0 series GUI versions to be several times faster than the
1.0.0 series GUI versions are today.

Notes from the Dungeon

Hi, I’m Chip, SpiderOak’s semi-resident sysadmin. I’m the guy in the picture
below who looks like he’s up to no good. }:-> Mostly, I’m in charge of
keeping the beasts in the server room well-fed and happy, which, like most
admin work, involves generous helpings of my sanity. It’s a job that requires
me to be available around the clock — especially during inopportune times
like weekends and holidays. Nearly every day I’m asked to do something I
haven’t done before. Sometimes it’s fun, but many times it’s not.

But you know what? I love it.

And it’s not just because most days I can wake up at noon and work in the
wee hours of the morning, or that if I need a break I can pop into the other
room and play some Portal (Ah, GLaDOS, an admin after my own heart). I work
with a great bunch of people, and despite the fact that we’re many states and
time zones away from each other, the whole group meshes into a team that I’m
proud to be a part of. (My absence in the group picture below
notwithstanding…)

What exactly goes on behind the scenes at SpiderOak? It’s a really sim…

Gruuuuuuuuuuu…

Whoops, looks like I’ll have to explain later. Duty calls. :)