Encryption White Paper

The most important consideration for organizations that implement cryptography and store data in the cloud is key management. Do you know who has your keys?

Key management is the most important consideration for how a backup provider implements cryptography and is often the least understood.

Many companies boast about using the same encryption algorithms that militaries or banks use, and about best practices such as using SSL to encrypt data during transport. Some providers also encrypt data at rest on their servers.

Those considerations are important but are contingent on who holds the encryption keys. Many companies (including Dropbox and Box, for example) encrypt data both during transport and at rest, which provides a false sense of security. The encryption is not meaningful because the provider also holds the encryption keys. Staffers at those companies also have access to the data.

If that’s the case, companies should ask who has access to their data. They should review the policies in place regarding this data and ask about auditing procedures. Most companies have a policy but lack the underlying systems to manage and enforce this policy.

With SpiderOak, the customer exclusively holds the encryption keys. Thus, we are not capable of betraying our customers through data disclosure.

Another important topic is what information is actually encrypted. SpiderOak encrypts all information: metadata and data. The storage server is not aware of customer folder or file names. The server sees only a sequentially numbered series of data containers. This is what we call a “Zero Knowledge” privacy environment.

Regarding transport security, SpiderOak is an all-SSL site, implementing HTTP Strict Transport Security and Forward Secrecy. SpiderOak clients use pinned certificates when connecting to the server, mitigating man-in-the-middle attacks even when an attacker can forge certificates.
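To illustrate the idea behind certificate pinning, here is a minimal Python sketch. It is not SpiderOak's client code: the function names and the choice of a SHA-256 fingerprint over the whole DER-encoded certificate are assumptions made for the example. The point is that the client accepts a certificate only if it matches a value shipped with the client, regardless of which certificate authority signed it.

```python
import hashlib
import hmac
from typing import Iterable


def cert_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned_fingerprints: Iterable[str]) -> bool:
    """Return True only if the presented certificate matches one of the pins.

    A certificate signed by a trusted CA but absent from the pin list is
    still rejected, which is what defeats a forged certificate.
    """
    presented = cert_fingerprint(der_cert)
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)
```

In a real client the certificate bytes would typically come from the TLS connection itself, for example Python's `ssl.SSLSocket.getpeercert(binary_form=True)`, and the connection would be dropped when `is_pinned` returns False.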

For data encryption at rest, SpiderOak uses AES-256-CFB and HMAC-SHA256. This data is encrypted by the SpiderOak program running on the end user’s device (with a key known only to the customer) before being transferred. The transfer itself happens over SSL with pinned certificates.
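The authentication half of that scheme can be sketched with Python's standard library. This is an illustration under assumptions, not SpiderOak's implementation: it assumes the HMAC is computed over the ciphertext (encrypt-then-MAC), which the text does not state, and it treats the ciphertext as opaque bytes because the standard library has no AES. The names `seal_block` and `open_block` are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def seal_block(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to an already encrypted block.

    `ciphertext` stands in for AES-256-CFB output produced on the client.
    """
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_block(mac_key: bytes, sealed: bytes) -> bytes:
    """Verify the tag and return the ciphertext; raise if it was altered."""
    if len(sealed) < TAG_LEN:
        raise ValueError("sealed block is too short to contain a tag")
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: block was modified")
    return ciphertext
```

Because the key never leaves the client, a server that stores the sealed block can neither read it nor modify it without the change being detected on the next read.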

Also, in the SpiderOak world, no central database of your files exists. Rather, you keep your own database. If you have several computers all connected to your SpiderOak account, each of them maintains a local database giving it a full view into your account-wide storage.

This client-side database is updated continuously as uploads from all computers in your account progress. Each upload is a transaction. We add changes to a transaction until it reaches 10 MB or 500 files. The contents of the transaction are sequentially numbered data blocks (the data) and entries in sequentially numbered journals (the metadata). For each transaction, the server stores everything and passes only the metadata along to all the other devices in your account. This metadata is fully encrypted when moving between devices, as each device holds a set of your keys.
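The batching rule described above (close a transaction once it reaches 10 MB or 500 files) can be sketched as follows. This is a simplified illustration, not SpiderOak's code: the function name, the `(name, size)` input shape, and the reading of 10 MB as 10 × 1024 × 1024 bytes are all assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_BYTES = 10 * 1024 * 1024  # 10 MB, assuming binary megabytes
MAX_FILES = 500

FileEntry = Tuple[str, int]  # (file name, size in bytes)


def batch_transactions(files: Iterable[FileEntry]) -> Iterator[List[FileEntry]]:
    """Group files into transactions, closing each one once it reaches
    MAX_BYTES of data or MAX_FILES entries, whichever comes first."""
    current: List[FileEntry] = []
    current_bytes = 0
    for name, size in files:
        current.append((name, size))
        current_bytes += size
        if current_bytes >= MAX_BYTES or len(current) >= MAX_FILES:
            yield current
            current, current_bytes = [], 0
    if current:
        # Flush whatever is left as a final, smaller transaction.
        yield current
```

For example, 1,200 one-byte files would be grouped into transactions of 500, 500, and 200 files, while two 6 MB files would fill a single transaction on size alone.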

In this sense, SpiderOak is really more of a peer-to-peer application than a client-server application. The traffic all goes through central servers but that’s just a conveniently reliable medium for data-passing and storage. The servers can’t read any of it.

Data Deduplication

Another important privacy consideration is how a backup and sync provider handles data deduplication. Deduplication over encrypted data is a careful process. SpiderOak does data compression and both file level and block level deduplication. The space savings from those operations benefit the customer in reduced bandwidth.

When a storage provider uses cross-customer data deduplication, sometimes implemented through a technique called “convergent encryption,” it compromises the privacy of all users through several forms of information leakage. For example, the provider can tell which customers hold identical files. SpiderOak deduplicates data only within the data of a single customer.
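A small Python sketch shows what scoping deduplication to a single customer means in practice. It is an illustration only: the class name, the fixed 4-byte block size, and the use of a plain SHA-256 digest as the block identifier are assumptions, and the real system operates on encrypted data.

```python
import hashlib
from typing import Dict, List


class AccountBlockStore:
    """Block store for ONE account. Digests are never compared across
    accounts, so nothing is learned about what other customers hold."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        """Store a block once per account and return its reference."""
        digest = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(digest, block)  # a repeated block costs nothing
        return digest

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blocks.values())


def split_blocks(data: bytes, block_size: int = 4) -> List[bytes]:
    """Cut data into fixed-size blocks (tiny here to keep the demo readable)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Storing `b"AAAABBBBAAAA"` in one account keeps only 8 bytes because the repeated block is stored once. A second account that uploads the same block gets its own copy, so the space saving is traded for the absence of any cross-customer signal.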

Historical Versions And Deleted Files

SpiderOak retains historical versions of files for as long as you like, so that you can always recover a file from a previous point in time. The default setting is to retain them forever, and this is configurable by organizations or individuals. This is in sharp contrast to other backup providers that will remove deleted items after 30 days.

Historical versions mean that even when a virus holds your data hostage, and maybe you don’t notice immediately, you can always go back to the pre-infection state.

Similar options cover retention of deleted items.

Fault Tolerant Design

SpiderOak uses a fault-tolerant application design with ACID-based data transactions. This allows SpiderOak to recover automatically from a variety of failure modes, including network, memory, and disk corruption on end user devices.

It’s a common misconception that memory and hard drives are 100% accurate and that all data they store is faithfully the same as the data originally written. However, all magnetic media have a nonzero data failure rate, even when the devices report no errors. This is sometimes called “bitrot”: data read back from a disk may not be exactly the same as the data that was written, and the corruption can occur at any time after the write. As hard drives have become larger and user data has become bulkier, the likelihood of encountering these errors has increased. In our experience, based on statistics reported across a large population of SpiderOak end user devices, the bitrot rate of magnetic media is about one bit error per 4.2 TB of data.
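To put the quoted rate in perspective, the following Python sketch turns it into an expected error count and a rough probability. The rate of one bit error per 4.2 TB comes from the text; treating a terabyte as 10^12 bytes and modeling errors as independent (a Poisson approximation) are assumptions made for the arithmetic.

```python
import math

BYTES_PER_ERROR = 4.2e12  # one bit error per 4.2 TB (decimal terabytes assumed)


def expected_bit_errors(bytes_read: float) -> float:
    """Expected number of flipped bits when reading this many bytes."""
    return bytes_read / BYTES_PER_ERROR


def prob_at_least_one_error(bytes_read: float) -> float:
    """Probability of one or more bit errors, assuming independent errors."""
    return 1.0 - math.exp(-expected_bit_errors(bytes_read))
```

Under these assumptions, reading a full 1 TB drive gives an expectation of about 0.24 bit errors and roughly a 21% chance of at least one, which is why silent corruption is treated as a routine event rather than an edge case.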

Another source of corruption is memory errors, where data in memory is occasionally corrupted in a similar way to the bitrot experienced by local disks. Some computers have error correcting memory to reduce the frequency of this occurrence but for most devices it still happens occasionally.

SpiderOak’s upload and storage process involves continuous cryptographic digest checks on the data throughout each phase: reading a new file from local disk, encrypting the result into data blocks to be uploaded, transferring each block, and storing those items long term in the cloud. When data corruption is detected, SpiderOak discards the specific transaction in which the corruption occurred and retries. If a specific file consistently causes problems, SpiderOak continues with the rest of the backup and occasionally retries the troubled item.
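The detect, discard, and retry pattern can be sketched for a single phase, the transfer of one block. This is a hedged illustration rather than SpiderOak's implementation: the function name, the `send` callback, the choice of SHA-256 as the digest, and the limit of three attempts are all assumptions.

```python
import hashlib
import hmac
from typing import Callable


class CorruptionDetected(Exception):
    """Raised when a block still fails its digest check after all retries."""


def transfer_with_retry(
    block: bytes,
    send: Callable[[bytes], bytes],
    max_attempts: int = 3,
) -> bytes:
    """Push a block through `send` and verify what arrived against the
    digest computed before sending; discard and retry on any mismatch."""
    expected = hashlib.sha256(block).digest()
    for _ in range(max_attempts):
        received = send(block)
        if hmac.compare_digest(hashlib.sha256(received).digest(), expected):
            return received
        # Digest mismatch: throw this attempt away and try again.
    raise CorruptionDetected("block failed verification on every attempt")
```

A caller that catches `CorruptionDetected` can set the troubled item aside, continue with the rest of the backup, and come back to it later, matching the behavior described above.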

Data stored in the SpiderOak cloud is stored redundantly across many computers and is continuously audited, with errors automatically repaired.

Considerations When Choosing Backup Software

The first consideration is what we like to call the subpoena test. If the government legitimately subpoenas a vendor, what is their notification process and what is contained in the data? Because of the key management process, many vendors have access to plain text data that can be turned over. Even the strictest encryption policies will still yield metadata such as filenames, file sizes and access times. This information can often reveal as much as the data itself.

Are there restrictions on what can be backed up? External drives? Network volumes? In the case of SpiderOak products, any connected volume can be backed up.

Does the software preserve MacOS specific file information such as resource forks? Some MacOS programs, particularly graphic design programs, store critical information in a file's "resource fork." The resource fork is a MacOS specific feature that is not present in files on other operating systems. If the resource fork is not backed up along with the rest of the file, critical data may be impossible to restore. SpiderOak products retain MacOS specific file information.

Is the software scriptable? Can it run in batch mode or be used from the command line? SpiderOak is highly scriptable, can be executed from the command line, and is one of the more popular backup solutions for Linux.

Can the system integrate with my user account management? SpiderOak Enterprise can integrate with Active Directory (LDAP) for centralized enterprise user management.

What options are available for choosing how we retain historical and deleted items?

Does the software support the many computers in my environment such as Mac, Windows, Linux, Android, and iOS devices? SpiderOak products support all the major operating systems.

Who has access to the plaintext data? How is that access enforced? (How do they keep system administrators accountable? Most companies have a policy but no underlying system to manage or enforce it, so the policy probably goes unenforced.) SpiderOak never saves plaintext data about your backups.

Contact our enterprise sales team at sales@spideroak-inc.com to get your team set up with our No Knowledge Enterprise Backup.
