What is deduplication?

Deduplication is a process that records and stores only the differences between an initial version of a file and each subsequent version. This makes it possible to keep several versions of a single file without having to store each version as a completely new file every time a change is made.

If you save multiple copies or versions of the same file, only the original takes up its full size; each of the others takes up far less space because SpiderOak saves only the data that differs from the original. For example, if you add more text or a graphic to a document, SpiderOak saves only the new data instead of saving the entire file again. Likewise, if you back up a file on one computer that has already been backed up on another computer, the file occupies no additional space in your account. SpiderOak uses deduplication to save our users space and, therefore, money.
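
To make the idea concrete, here is a minimal Python sketch of block-level deduplication. It is an illustration only and not SpiderOak's actual implementation: the BlockStore class, the tiny 4-byte block size, and the use of SHA-256 to identify blocks are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny blocks so the example is easy to follow


class BlockStore:
    """Hypothetical toy store that keeps each unique block only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def put_file(self, data: bytes):
        """Store a file; return (list of block digests, bytes newly stored)."""
        digests, new_bytes = [], 0
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # only unseen data costs space
                self.blocks[digest] = block
                new_bytes += len(block)
            digests.append(digest)
        return digests, new_bytes

    def get_file(self, digests):
        """Rebuild a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in digests)


store = BlockStore()
original = b"AAAABBBBCCCC"
edited = b"AAAABBBBCCCCDDDD"  # the original with new data appended

recipe_original, stored_original = store.put_file(original)
recipe_edited, stored_edited = store.put_file(edited)
recipe_copy, stored_copy = store.put_file(original)  # same file, second computer

print(stored_original, stored_edited, stored_copy)
```

In this sketch the original costs its full size, the edited version costs only the bytes that were added, and the second copy of the original costs nothing, while every version can still be rebuilt in full from its list of block digests.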

When you upload a copy of a file that is already saved to our servers, SpiderOak performs deduplication before the upload begins, comparing the file to the data you have already stored. It then uploads only the information that differs between the two files, such as their locations, in the form of journal entries. Although it may appear that SpiderOak is uploading the entire file again, you’ll see that the upload goes much faster and takes up very little space, because in fact only these journal entries are being uploaded.
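
The pre-upload check described above can be sketched roughly as follows. This is a hedged illustration, not SpiderOak's real client or journal format: the plan_upload function, the dictionary used as a journal entry, the example paths, and the SHA-256 block digests are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny blocks for readability


def plan_upload(path, data, known_digests):
    """Decide what must be sent before any upload starts.

    Returns a journal entry (path plus block digests) and a dict of the
    blocks that are not yet stored and therefore still need uploading.
    """
    digests, missing = [], {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        digests.append(digest)
        if digest not in known_digests and digest not in missing:
            missing[digest] = block
    journal_entry = {"path": path, "blocks": digests}
    return journal_entry, missing


known = set()  # digests of blocks already stored in the account

# First upload: nothing is stored yet, so every block must be sent.
entry_first, missing_first = plan_upload("/home/a/report.txt", b"AAAABBBBCCCC", known)
known.update(missing_first)

# Second upload: same content in a different location.
entry_second, missing_second = plan_upload("/home/b/report-copy.txt", b"AAAABBBBCCCC", known)

print(len(missing_first), len(missing_second))
```

For the second file, no blocks need to be sent at all; only the small journal entry recording the new location and the existing block digests would be uploaded, which is why such an upload finishes quickly.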

SpiderOak performs deduplication only on files stored in your own account, never across users. We explain this in more detail in our blog post "Why SpiderOak doesn't deduplicate data across users and why it should worry you if we did."