Encryption White Paper
An important consideration for organizations that implement
cryptography and store data in the cloud is key management. Do you know who has your keys?
Key management is the most
important consideration for how a backup provider implements
cryptography and is often the least understood.
Many companies boast
about using the same encryption algorithms that militaries or banks use, and
about best practices such as SSL to encrypt data during transport. Some providers
also encrypt data at rest on their servers.
Those considerations are important, but their value depends on who holds the
encryption keys. Many companies (including Dropbox and Box, for example)
encrypt data both during transport and at rest, which provides a false
sense of security. The encryption is not meaningful because the provider
also holds the encryption keys, so staffers at those companies also have
access to the data.
If that’s the case, companies should ask who has access to their data.
They should review the policies in place regarding this data and ask
about auditing procedures. Most companies have a policy but lack the
underlying systems to manage and enforce this policy.
With SpiderOak, the customer exclusively holds the encryption keys.
Thus, we are not capable of betraying our customers through data
Another important topic is what information is actually encrypted.
SpiderOak encrypts all information: metadata and data. The storage
server is not aware of customer folder or file names. The server sees
only a sequentially numbered series of data containers. This is what we
call a “Zero Knowledge” privacy environment.
Regarding transport security, SpiderOak is an all-SSL site, implementing
HTTP Strict Transport Security and Forward Secrecy. SpiderOak clients
use pinned certificates when connecting to the server, mitigating
Man-in-the-Middle attacks even when an attacker can forge certificates.
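As an illustration of what certificate pinning means in practice, here is a minimal sketch. It assumes the client ships with a set of SHA-256 fingerprints of the server certificates it trusts. The function names `fingerprint` and `matches_pin` are hypothetical and are not taken from the SpiderOak client.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_fingerprints: Iterable[str]) -> bool:
    """Accept the certificate only if its fingerprint is one of the pinned values.

    A certificate forged through a compromised certificate authority has
    different bytes, so its fingerprint does not match and it is rejected.
    """
    presented = fingerprint(der_cert)
    # compare_digest compares in constant time.
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)
```

In a real client the certificate bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)` after the handshake, with the connection closed if `matches_pin` returns False.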
For data encryption at rest,
SpiderOak uses AES-256-CFB and HMAC-SHA256. This data is encrypted by
the SpiderOak program running on the end user’s device (with a key
known only to the customer) before being transferred. The transfer itself
happens over SSL with pinned certificates.
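The role of HMAC-SHA256 alongside the cipher can be sketched as follows. This is only an illustration under stated assumptions: the tag is computed over the already-encrypted block and appended to it (encrypt-then-MAC), which this paper does not specify, and the AES-256-CFB step is left out because the Python standard library has no AES. The names `seal` and `open_sealed` are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to an already-encrypted block."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, sealed: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise if it was altered."""
    if len(sealed) < TAG_LEN:
        raise ValueError("block too short to contain a tag")
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: block was modified")
    return ciphertext
```

Because the MAC key never leaves the customer's devices, anyone without it can neither read a block nor alter one without the change being detected.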
Also, in the SpiderOak world, no central database of your files exists.
Rather, you keep your own database. If you have several computers all
connected to your SpiderOak account, each of them maintains a local
database that gives it a full view into your account-wide storage.
This client-side database is updated continuously as uploads from all
computers in your account progress. Each upload is a transaction. We add
changes into a transaction until it reaches 10 MB or 500 files. The
contents of the transaction are sequentially numbered data blocks (the
data) and entries in sequentially numbered journals (the metadata). For
each transaction, the server stores everything and passes only the
metadata along to all the other devices in your account. This metadata is
fully encrypted when moving between devices, as each device holds a set
of your keys.
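The batching rule described above (a transaction is closed once it reaches 10 MB or 500 files) can be sketched like this. It is a simplified illustration, not the client's actual code: the `Transaction` class and `batch_changes` function are hypothetical, and it assumes that "10 MB" means 10 * 1024 * 1024 bytes and that the change that crosses a limit stays in the transaction it closes.

```python
from dataclasses import dataclass, field

MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB counted in binary megabytes
MAX_FILES = 500


@dataclass
class Transaction:
    files: list = field(default_factory=list)
    size: int = 0


def batch_changes(changes):
    """Group (file_name, size_in_bytes) pairs into transactions."""
    transactions = []
    current = Transaction()
    for name, size in changes:
        current.files.append(name)
        current.size += size
        # Close the transaction as soon as either limit is reached.
        if current.size >= MAX_BYTES or len(current.files) >= MAX_FILES:
            transactions.append(current)
            current = Transaction()
    if current.files:  # flush the final, partially filled transaction
        transactions.append(current)
    return transactions
```

For example, 1,200 tiny files would be split into transactions of 500, 500 and 200 files, while a handful of large files would be split by size instead.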
In this sense, SpiderOak is really more of a peer-to-peer application
than a client-server application. The traffic all goes through central
servers but that’s just a conveniently reliable medium for data-passing
and storage. The servers can’t read any of it.
Another important privacy consideration is how a backup and sync provider
handles data deduplication. Deduplication over encrypted data is a
careful process. SpiderOak does data compression and both file-level and
block-level deduplication. The space savings from those operations
benefit the customer through reduced bandwidth use.
When a storage provider uses cross-customer data deduplication, it
compromises the privacy of all users through several forms of
information leakage. The technique that makes cross-customer
deduplication of encrypted data possible is sometimes called “convergent encryption.”
SpiderOak does deduplication only across the data of a single customer.
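One way to confine deduplication to a single customer is to derive each block's identifier from a secret that only that customer holds. The sketch below assumes this approach for illustration. It is not a description of SpiderOak's actual scheme, and the names `block_id` and `BlockIndex` are hypothetical.

```python
import hashlib
import hmac


def block_id(customer_key: bytes, block: bytes) -> str:
    """Identifier that depends on both the block contents and the customer's key."""
    return hmac.new(customer_key, block, hashlib.sha256).hexdigest()


class BlockIndex:
    """Tracks which block identifiers have already been uploaded."""

    def __init__(self):
        self.seen = set()

    def needs_upload(self, customer_key: bytes, block: bytes) -> bool:
        bid = block_id(customer_key, block)
        if bid in self.seen:
            return False  # this customer already stored the block
        self.seen.add(bid)
        return True
```

Two customers who store the same file produce unrelated identifiers, so the provider cannot tell that they hold the same data. With a plain content hash as the identifier, that match would be visible.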
Historical Versions And Deleted Files
SpiderOak retains historical
versions of files for as long as you like, so that you can always
recover a file from a previous point in time. The default setting is to
retain them forever, and this is configurable by organizations or
individuals. This is in sharp contrast to other backup providers that
will remove deleted items after 30 days.
Historical versions mean that even when a virus holds your data hostage,
and maybe you don’t notice immediately, you can always go back to the
Similar options cover retention of deleted items.
Fault Tolerant Design
SpiderOak uses a fault tolerant application design with ACID based data
transactions. This allows SpiderOak to recover automatically from a
variety of failure modes including network, memory and disk corruption
on end user devices.
It’s a common misconception that memory and hard drives are 100% accurate,
and that all data they store is faithfully the same as the original data
written. However, all magnetic media have a nonzero data failure rate,
even when the devices report no errors. This is sometimes called
“bitrot,” and it means that data read back from a disk may not be
exactly the same as the data that was originally written. This can
occur at any time after the data was
written. As hard drives have become larger over time and user data has
become bulkier, the likelihood of encountering these errors increases.
In our experience, based on statistics reported across a large population
of SpiderOak end user devices, the bitrot rate of magnetic media is
about one single-bit error per 4.2 TB of data.
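To put that rate in perspective, the short calculation below estimates how many bit errors to expect for a given volume of data. The expected count follows directly from the figure above. The probability estimate additionally assumes that errors occur independently (a Poisson model), which is our simplifying assumption for illustration and not a measured result.

```python
import math

TB_PER_BIT_ERROR = 4.2  # reported rate: one single-bit error per 4.2 TB


def expected_bit_errors(terabytes: float) -> float:
    """Average number of single-bit errors expected in this much data."""
    return terabytes / TB_PER_BIT_ERROR


def chance_of_any_error(terabytes: float) -> float:
    """Probability of at least one bit error, assuming independent errors."""
    return 1.0 - math.exp(-expected_bit_errors(terabytes))
```

Under this model, 4.2 TB of data carries one expected error and roughly a 63% chance of at least one corrupted bit.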
Another source of corruption is memory errors, where data in memory is
occasionally corrupted in a similar way to the bitrot experienced by
local disks. Some computers have error correcting memory to reduce the
frequency of this occurrence but for most devices it still happens
SpiderOak’s upload and storage process involves continuous cryptographic
digest checks on the data throughout each phase: reading a new
file from local disk, encrypting that result into data blocks to be
uploaded, the transfer of each block, and the long term cloud storage of
those items. When data corruption is detected, SpiderOak discards the
specific transaction that caused the corruption and retries. If a
specific file consistently causes problems, SpiderOak will continue with
the rest of the backup, and occasionally retry the troubled item.
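The check-and-retry behavior can be sketched for a single phase, the transfer of one block. This is a minimal illustration under stated assumptions: SHA-256 stands in for whichever digest the client uses, `send` is a hypothetical callable that returns the bytes as the receiving side saw them, and the limit of three attempts is arbitrary.

```python
import hashlib


class CorruptionError(Exception):
    """Raised when a block still fails its digest check after all retries."""


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def upload_with_retry(block: bytes, send, max_attempts: int = 3) -> int:
    """Send a block, verify its digest, and retry on mismatch.

    Returns the number of attempts that were needed.
    """
    expected = digest(block)
    for attempt in range(1, max_attempts + 1):
        received = send(block)
        if digest(received) == expected:
            return attempt
        # Mismatch: discard this attempt and try again.
    raise CorruptionError("block failed verification %d times" % max_attempts)
```

A caller that catches `CorruptionError` can set the troubled item aside, continue with the rest of the backup, and come back to it later.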
Data stored in the SpiderOak cloud is stored redundantly across many
computers and is continuously audited, with errors automatically
Considerations When Choosing Backup Software
The first consideration is what we like to call the subpoena test. If the government legitimately subpoenas a vendor, what is their notification process and what is contained in the data? Because of the key management process, many vendors have access to plain text data that can be turned over. Even the strictest encryption policies will still yield metadata such as filenames, file sizes and access times. This information can often reveal as much as the data itself.
Are there restrictions on what can be backed up? External drives? Network volumes? In the case of SpiderOak products, any connected volume can be backed up.
Does the software preserve MacOS specific file information such as resource forks? Some MacOS programs, particularly graphic design programs, store critical information in a file's "resource fork." The resource fork is a MacOS specific feature that is not present in files on other operating systems. If the resource fork is not backed up along with the rest of the file, critical data may not be restorable. SpiderOak products retain MacOS specific file information.
Is the software scriptable? Can it run in batch mode or be used from the command line? SpiderOak is highly scriptable, can be executed from the command line, and is one of the more popular backup solutions for Linux.
Can the system integrate with my user account management? SpiderOak Enterprise can integrate with Active Directory (LDAP) for centralized enterprise user management.
What options are available for choosing how we retain historical and deleted items?
Does the software support the many computers in my environment such as Mac, Windows, Linux, Android, and iOS devices? SpiderOak products support all the major operating systems.
Who has access to the plaintext data? How is that access enforced? (How do they keep system administrators accountable? Most companies have a policy but no underlying system to manage or enforce it, and so probably aren’t enforcing it.) SpiderOak never saves plaintext data about your backups.
Contact our enterprise sales team at firstname.lastname@example.org to get your team set up with our No Knowledge Enterprise Backup.