As 37signals famously described, in the software business we almost always create valuable byproducts. To build a privacy-respecting backup and sync service that was affordable, we also had to build a world-class long-term archival storage system.
We had to. Most companies in the online backup space (including BackBlaze, Carbonite, Mozy, and SpiderOak, to name a few) have made substantial investments in internal systems to cost-effectively store data at massive scale. Those who haven't, such as Dropbox and JungleDisk, are not price-competitive per GB and put their efforts into competing on other factors.
Long-term archival data is different from everyday data. It's created in bulk, generally ignored for weeks or months with only small additions and accesses, and restored in bulk (and then often in a hurried panic!).
This access pattern means that a storage system for backup data ought to be designed differently than a storage system for general data. Designed for this purpose, reliable long-term archival storage can be delivered at dramatically lower prices.
Unfortunately, the storage hardware industry does not offer great off-the-shelf solutions for reliable long-term archival data storage. For example, if you consider the NAS, SAN, and RAID offerings across the spectrum of storage vendors, they are inappropriate for one or both of these reasons:
- Unreliable: They do not protect against whole machine failure. If you have enough data on enough RAID volumes, over time you will lose a few of them. RAID failures happen every day.
- Expensive: Pricey hardware and high power consumption. This is because you are paying for low-latency performance that does not matter in the archival data world.
Of course, the first problem (unreliability) is solvable by making the second (expense) worse. This is the approach of existing general-purpose redundant distributed storage systems, which offer excellent reliability and performance but require overpaying for hardware. Examples include GlusterFS, Linux DRBD, MogileFS, and more recently Riak+Luwak. All of these systems replicate data to multiple whole machines, making the combined cluster tolerant of machine failure at the cost of 3x or 4x storage overhead. Nimbus.IO takes a different approach, using parity striping instead of replication, for only 1.25x overhead.
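To see where a figure like 1.25x can come from, here is a minimal sketch of parity striping with k data stripes plus one XOR parity stripe, giving a storage overhead of (k + 1) / k, which is 1.25x when k = 4. The stripe count and the single-parity XOR scheme are illustrative assumptions on our part, not a description of Nimbus.IO's actual encoding; a production system would use a more robust erasure code that tolerates multiple simultaneous failures.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int = 4):
    """Split data into k equal stripes plus one XOR parity stripe.

    Storage overhead is (k + 1) / k -- 1.25x for k = 4.
    """
    stripe_len = -(-len(data) // k)  # ceiling division; pad to fit evenly
    padded = data.ljust(stripe_len * k, b"\0")
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(k)]
    parity = reduce(xor_bytes, stripes)
    return stripes, parity

def reconstruct(stripes, parity, lost_index):
    """Rebuild one lost stripe by XOR-ing the surviving stripes with parity."""
    survivors = [s for i, s in enumerate(stripes) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)

# Simulate losing one of five storage nodes and rebuilding its stripe.
data = b"an arbitrary example archival payload"
stripes, parity = encode(data, k=4)
rebuilt = reconstruct(stripes, parity, lost_index=2)
assert rebuilt == stripes[2]
```

Because the XOR of all k stripes equals the parity stripe, XOR-ing any k surviving pieces (data or parity) recovers the missing one, so one whole machine can fail without data loss while storing only 25% extra.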
Customers purchasing long-term storage don't typically notice or care whether a transfer starts in 0.006 seconds or 0.6 seconds, a difference of two orders of magnitude in latency. Customers care greatly about throughput (megabytes per second of transfer speed), but latency (how long until the first byte begins moving) is not relevant the way it is if you're serving images on a website.
Meanwhile, the added cost of supporting those two orders of magnitude of latency performance is huge. It impacts all three of the major cost components: bandwidth, hardware, and power consumption.
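The arithmetic behind this claim is easy to check. For a bulk restore, total time is first-byte latency plus size divided by throughput, so a 100x latency gap vanishes into the throughput term. The sizes and speeds below are illustrative assumptions, not measured Nimbus.IO figures:

```python
def transfer_seconds(size_gb: float, throughput_mb_s: float,
                     first_byte_latency_s: float) -> float:
    """Total wall-clock time for a bulk transfer: latency + size / throughput."""
    return first_byte_latency_s + (size_gb * 1000) / throughput_mb_s

# Restoring a 100 GB backup at 50 MB/s of sustained throughput:
fast = transfer_seconds(100, 50, 0.006)  # low-latency, SAN-class storage
slow = transfer_seconds(100, 50, 0.6)    # high-latency archival storage
# Both take roughly 2000 seconds; the 100x latency difference changes
# the total by about 0.03%.
```

The customer waits about 33 minutes either way, which is why paying a premium for low-latency hardware buys nothing in the archival use case.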
A service designed specifically for bulk, long-term, high-throughput storage can easily be provided at less than half the cost.
Since launching SpiderOak in 2007, we’ve rewritten the storage backend software four times and gone through five different major hardware revisions for the nodes in our storage clusters. Nimbus.IO is a new software architecture leveraging everything we’ve learned so far.
The Nimbus.IO online service is noteworthy in that the backend hardware designs and software are also open source, making it possible either to purchase storage from Nimbus.IO, much as with S3, or to run storage clusters locally on site.
If you are currently using or planning to adopt cloud storage, we hope you will give Nimbus.IO some consideration. Chances are we can eliminate 2/3 of your monthly bill.