Semaphor In Depth
The following white paper outlines the security and encryption used in SpiderOak's Semaphor product. For more information, please visit spideroak.com
The Semaphor app provides team collaboration functionality, such as message and file transfer, built with a Zero Knowledge design, where the vendor or hosting provider is unable to access content sent by the users of the software.
Semaphor employs Daniel J. Bernstein's NaCl (in the form of libsodium) for most cryptographic operations. The data structures and sync protocol (the “Flow” technology) were created by SpiderOak and independently reviewed by outside engineers and cryptographers. Flow is implemented from a single Go code base shared by all client platforms.
This document gives an outline of what occurs cryptographically as Semaphor is used, and the interrelationship between objects in the Semaphor object schema.
Features and Threat Models
Semaphor is a real-time team collaboration application created by SpiderOak intended to provide an experience comparable to products like HipChat, Slack, or IRC. Semaphor focuses on fast and beautiful UX and employs end-to-end encryption for authentication of identities and confidentiality of content.
Semaphor's design is a rethink of common UX solutions to the challenges presented by end-to-end encrypted systems. These include: password-less sign up, secure device-to-device provisioning, contact cards with visually appealing peer key verification ("the Semaphoric pattern"), and an evolving implementation of Nervous Design. Semaphor aspires to further address decades-old issues with identity and the hostile relationship between user interfaces and secure systems.
Semaphor addresses the following business requirements, some of which have significant influence on the data structures:
Ease of Use
- First class mobile experience and fast (e.g. keyboard shortcut oriented) desktop experience.
- Fast and robust attachment handling
- Offline Mode
- Search (without revealing content or keywords to the server)
- New users see historical content: when a new user is included in an existing conversation, they have the same access to conversation history as other members (including search).
- Account recovery (ability to re-access account in the event that all devices are lost, through a previously stored recovery key.)
- Account compromise recovery: when an end user device is discovered to be compromised, that end user may recover by authenticating using a higher level secret and rotating their account keyring. All conversations the user is part of then also automatically rotate keys.
- Individual message deletion (i.e. wiping a specific message in a conversation from the server and all devices, while keeping the hash chain of the conversation verifiable.)
Secure Protocol and Hosting
- Forward Secrecy to the extent possible while meeting content retention policy requirements
- Cryptographically compartmentalized conversations: only the participants in any given conversation have access to that data or the encryption keys; however members who join the conversation later are able to see content created before their entry into the conversation. Compromising a conversation therefore requires compromising one of the participants in that specific conversation. There is not a (likely vulnerable) central server with "keys to everything."
- Minimal trust of the server, and comprehensive end-user verifiability of all events (accounts, team creation and membership, conversation creation and membership, messages, attachments, etc.)
- Confidential channel names and channel content
- Basic protection against message length analysis
Scalability and Openness
- First class support for customer operated (on premises) server infrastructure
- Efficient support for large channels (thousands of members)
- Ease of scripting and bot creation. The initial Semaphor release includes a local REST interface, a Python API to that interface, and a local IRC gateway (i.e. use a local IRC client as the UI to a local Semaphor application).
- Published source code, toolchain, and build instructions (e.g. end users do not need to trust opaque binaries)
- Flexible and enforceable Content Retention Policies
- Extensible such that an organization requiring it (such as certain regulated industries) can escrow all content for a specific team to highly protected air gapped keys
- Support for user configurable inbound and outbound "integrations" of 3rd party content/services, with end-to-end encryption where possible and opportunistic encryption otherwise.
- Extensible such that an organization can integrate account provisioning and/or single sign on (e.g. LDAP, Active Directory, etc.)
The client-server relationship in Semaphor can generally be defined as follows:
- Remote API - the protocol which ensures all transactions are valid and recorded
- Object Storage: Unencrypted - Storage of Member-to-Team associations (i.e. for billing). Encrypted - Storage of all user-encrypted content (i.e. Messages, etc.)
- Flowapp - component in which local plaintext objects are stored and maintained
- Local API - protocol by which FlowUI and Flowapp communicate
- FlowUI - component in which all actions are taken via the Local and Remote APIs
Encryption, Key Generation, Cryptographic Hashing
All Public/Secret Key Pairs, Signature Keys, Symmetric Keys, and Cryptographic Hashes outlined below are created with the default libsodium values, ciphers, and protocols.
Key Derivation Function (KDF)
The KDF used by Semaphor is scrypt, with the Work Factors using default values (N = 16384, r = 8, p = 1).
All randomly generated values utilize the operating system's pseudorandom number generator.
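As a rough illustration of the work factors above, the sketch below calls scrypt from Python's standard library with N = 16384, r = 8, p = 1 and a 32-byte output. The helper name `derive_key` and the sample inputs are hypothetical. Semaphor itself uses libsodium, and the way it encodes the KDF input and salt is not described here.

```python
import hashlib
import secrets

# Work factors quoted in the text above.
SCRYPT_N = 16384
SCRYPT_R = 8
SCRYPT_P = 1


def derive_key(secret: bytes, salt: bytes, length: int = 32) -> bytes:
    """Stretch `secret` into `length` bytes with scrypt (hypothetical helper)."""
    return hashlib.scrypt(
        secret,
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        maxmem=64 * 1024 * 1024,  # headroom above the ~16 MiB scrypt needs here
        dklen=length,
    )


# Random values come from the operating system's generator via `secrets`.
salt = secrets.token_bytes(32)
key = derive_key(b"example input", salt)
print(len(key))
```

With these parameters scrypt needs roughly 128 * N * r bytes (about 16 MiB) of memory per derivation, which is what makes large-scale guessing of the input expensive.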
Identity Hash Tree
Semaphor is built on the concept that all entities within the application should be able to be verified. The core enabling feature for this is the "Identity Hash Tree" made up of the cryptographic Identity Hashes of all Servers, Accounts, Keyrings, Messages, etc. These hashes work because the Identity Hash of a Parent object is included in the Identity Hash of a given Child object. For example:
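A minimal sketch of this parent-in-child hashing, assuming BLAKE2b (libsodium's default generic hash) with a 32-byte digest and a simple length-prefixed field encoding. The function `identity_hash`, the field values, and the encoding are illustrative assumptions, not Semaphor's actual record format.

```python
import hashlib


def identity_hash(parent: bytes, *fields: bytes) -> bytes:
    """Hash an object's fields together with its parent's Identity Hash."""
    h = hashlib.blake2b(digest_size=32)
    for part in (parent, *fields):
        # Length-prefix each part so field boundaries are unambiguous.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


# The Server is the root of the tree, so its parent is empty.
server = identity_hash(b"", b"example-server", b"semaphor.example", b"443")
account = identity_hash(server, b"alice@example.com", b"2016-01-01T00:00:00Z")
keyring = identity_hash(account, b"encryption-public-key", b"signing-public-key")

# Changing anything in the Server changes every descendant hash.
other_server = identity_hash(b"", b"other-server", b"semaphor.example", b"443")
other_account = identity_hash(other_server, b"alice@example.com", b"2016-01-01T00:00:00Z")
print(account != other_account)
```

Because each child commits to its parent's hash, verifying a Keyring's Identity Hash implicitly verifies the Account and Server it descends from.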
Binding Hashes (Event Hash Chains)
In addition to the (vertical) Identity Hash Tree, Semaphor utilizes a (horizontal) chain of Identity Hashes known as Binding Hashes. Objects that are used as Binding Hashes create a "chain of record" which all newly created objects reference. These hash chains can then be evaluated and verified against the entire history of a given Channel or Organization. For example:
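The chain idea can be sketched as follows, assuming each record stores the Identity Hash of the record before it. The `Record` type, the BLAKE2b hash, and the payload strings are hypothetical stand-ins for the real record attributes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    previous: bytes  # Identity Hash of the prior record (empty for the first)
    payload: bytes   # stand-in for the record's other attributes

    def identity_hash(self) -> bytes:
        h = hashlib.blake2b(digest_size=32)
        h.update(len(self.previous).to_bytes(4, "big") + self.previous)
        h.update(len(self.payload).to_bytes(4, "big") + self.payload)
        return h.digest()


def append(chain: list, payload: bytes) -> None:
    """Add a record that references the current end of the chain."""
    previous = chain[-1].identity_hash() if chain else b""
    chain.append(Record(previous, payload))


def verify(chain: list) -> bool:
    """Check that every record points at the hash of the one before it."""
    expected = b""
    for record in chain:
        if record.previous != expected:
            return False
        expected = record.identity_hash()
    return True


chain = []
for event in (b"alice joins", b"bob joins", b"bob becomes admin"):
    append(chain, event)
print(verify(chain))

# Rewriting an earlier record breaks every later link.
tampered = [chain[0], Record(chain[1].previous, b"mallory joins"), chain[2]]
print(verify(tampered))
```

A client that remembers only the most recent hash in a chain can therefore detect any later attempt to alter or reorder earlier events.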
Object Relationship Overview Image
See the final page of this document for an image visualizing the hierarchy of Data Structure relationships.
Server Data Structure
A Semaphor Server possesses the following attributes, which are used for authentication with end users:
- Server Name - common name for service
- Hostname - name for server
- Service Port - destination port of service
- API Root - supported API version
- Signature Key Pair - keys used for verification of Server messages
- Server Identity Hash - Hash generated by combining the above Server attributes
Account Data Structure
Account Attributes & Keys
The initial creation of an Account performs a number of cryptographic actions, which create the following attributes:
- Recovery Key - created using seven randomly chosen words, out of a collection of about 80,000 words.
- A 32-byte output is generated via the KDF with the following parameters: Input - the Recovery Key; Salt - combination of the Server's Identity
- The 32-byte output is then used in an HMAC-based key derivation function to generate two items: Seed Bytes - used for the Account Authentication Key Pair (see below); Level 1 Secret Key - symmetric key used in the event of account compromise.
- The Seed Bytes are then used to create the Account Authentication Key Pair, Account Authentication Public Key - which will be stored on the Server; Account Authentication Secret Key - never stored on any device.
- Level 2 Secret Key - 32 bytes of random data used for encryption of an account’s communication keyring. A symmetric key; stored in plaintext on the User's device; stored on the Server after being encrypted by the Level 1 Secret Key.
- Account Identity Hash - hash generated by combining the following Account attributes: Account Parent - the associated Server Identity Hash; Identity Alias - email address or username selected by the end user; Account Creation Time - timestamp of Account creation; Client Token - 32 bytes of random data.
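To give a feel for the numbers, seven words drawn uniformly from a list of about 80,000 give roughly 7 × log2(80,000) ≈ 114 bits of entropy. The sketch below shows that calculation, a word-based key generator, and one way a 32-byte KDF output could be split into two independent secrets. The tiny word list, the HMAC-SHA256 construction, and the label strings are assumptions made for illustration; the source does not specify them.

```python
import hashlib
import hmac
import math
import secrets

WORDS_IN_KEY = 7


def entropy_bits(wordlist_size: int, words: int = WORDS_IN_KEY) -> float:
    """Entropy of a key made of uniformly chosen words."""
    return words * math.log2(wordlist_size)


def new_recovery_key(wordlist: list) -> str:
    """Pick seven words using the OS random number generator."""
    return " ".join(secrets.choice(wordlist) for _ in range(WORDS_IN_KEY))


def split_kdf_output(kdf_output: bytes) -> tuple:
    """Derive two independent 32-byte values from the 32-byte KDF output.

    The labels and the use of HMAC-SHA256 are assumptions for illustration.
    """
    seed_bytes = hmac.new(kdf_output, b"seed-bytes", hashlib.sha256).digest()
    level1_key = hmac.new(kdf_output, b"level-1-secret-key", hashlib.sha256).digest()
    return seed_bytes, level1_key


print(round(entropy_bits(80_000)))  # roughly 114 bits for an 80,000-word list

# A real word list would hold about 80,000 entries; this one is a placeholder.
demo_words = ["amber", "birch", "cedar", "delta", "ember", "fjord", "grove", "heron"]
recovery_key = new_recovery_key(demo_words)
seed_bytes, level1_key = split_kdf_output(bytes(32))  # placeholder KDF output
```

Deriving both values from one KDF output under different labels means the Recovery Key alone is enough to regenerate the Account Authentication Key Pair and the Level 1 Secret Key.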
Using the Account Keys (see above) an Account Keyring is now created which stores all root communication keys, and has the following attributes:
- Keyring Parent - account which created the Keyring
- Previous Keyring - if applicable, the Identity Hash of the prior, now rotated, Keyring
- Keyring Creation Time
- Encryption Key Pair - used for asymmetric encryption to this account by peer accounts (primarily used to encrypt Channel Session Keys to their recipients)
- Signing Key Pair - used for signatures over all events authored by the account so peers can validate the origin of the record (example records that are signed but not encrypted include: Organization Member Event records, Channel Member Event Records, etc.)
- Keyring Signature - self-signature created by hashing the above values
- Previous Keyring Signature - if applicable, a signature of this keyring entry by the prior, now rotated, Keyring entry
- Keyring Identity Hash - hash generated by combining the above Keyring attributes
Organization (Team) Data Structure
A team is the root data structure for a collaborative group of people, from which Membership and Channels (and their Membership) descend.
An Organization (also known as a "Team" in the user interface) is a signed record of creation by the Organization's "founding" Member, and has the following attributes:
- Author Keyring - Identity Hash of the founding Member's keyring
- Parent Organization - if applicable, this is a reference to the Identity Hash of the parent Organization
- Organization Creation Time
- Organization Name - A unique and plaintext, human readable description
- Organization Signature - Signature, made over the above
- Organization Identity Hash - Identity Hash, made over the above
Organization Member Event Record
As part of the verifiable hash tree that makes up the Semaphor data structures, there are verifiable timelines to which all actions relate. The objects that declare the ordering of events are referred to as "binding hashes", as every action that occurs must reference the last well-known hash in a number of chains in order to add itself.
Every time a Member's (user's) Organization state changes (addition/removal/policy/etc.) an Organization Member Event Record is created, containing:
- Parent - Identity Hash of the Organization to which the Member belongs
- Previous Organization Member Event Record - Identity Hash of the Organization Member Event Record created prior to this one (or empty, if the first record)
- Author Keyring - The Keyring Identity Hash of the user who is creating the Record
- Account Keyring - The Keyring Identity Hash of the user who is being modified (this can be the same as the Author Keyring)
- Organization Member Event Record Creation Time
- Account State - The intended state of the user being modified, some options include, a. Member - Standard Organization user, b. Administrator - User with full access/modification rights, c. Banned - User with revoked access to the Organization, d. Owner - Founding user of Organization, access cannot be revoked
- Organization Member Event Record Signature - Hash, made over the above
- Organization Member Event Record Identity Hash - Identity Hash, made over the above
Organization Join Request
When a user requests entry into an Organization, a request is generated which contains the following attributes. These requests are public to the administrative users of the Organization which the user is requesting to join.
- Parent - Organization the request is targeted at
- Author Keyring - Keyring Identity Hash of the user who is creating the Request
- Request Creation Time
- Organization Join Request Signature - Hash, made over the above
- Organization Join Request Identity Hash - Identity Hash, made over the above
Channel Data Structure
Channels act as the Parent (of CSK which…) entities for all Messages sent using Semaphor, in either a multi-user or Member-to-Member context. They also act as the common object between the numerous "binding" hash chains used by the Members of the Channel. Channels are completely transparent, in that every action which occurs within one creates a permanent link in the cryptographic chain which makes up the collaboration between users. The Channel itself has the following attributes:
- Parent Organization - Identity Hash of the Organization to which the Channel belongs
- Author Keyring - Keyring Identity Hash of the user who is creating the Channel
- Parent Channel - if applicable, the Identity Hash of the Channel of which this Channel is a Sub-Channel
- Channel Creation Time
- Channel Signature - Hash, made over the above
- Channel Identity Hash - Identity Hash, made over the above
Channel Member Event Record
Similar to the Organization Member Event Record, Channels also have a binding hash chain of Membership records, made up of all the modifications which occur to Members of a given Channel. The attributes for these records are:
- Parent Channel - Identity Hash of the Channel to which the Members belong
- Previous Channel Member Event Record - Identity Hash of the Event Record created prior to this one
- Author Keyring - Keyring Identity Hash of the Member creating the Event Record
- Account Keyring - Keyring Identity Hash of the Member being targeted for modifications
- Last Organization Member Event Record - A linkage to another binding hash chain, this attribute points to the Identity Hash of last known change in the Organization Membership, to which the Parent Channel belongs
- Last Channel Session Key - A linkage to another binding hash chain, this attribute points to the last known valid symmetric encryption key used by this Channel and its Members
- Record Creation Time
- Account State - The intended state of the user being modified, some options include, a. Member - Standard Channel user, b. Administrator - User with full access/modification rights, c. Banned - User with revoked access to the Channel
- Channel Member Event Record Signature - Hash, made over the above
- Channel Member Event Record Identity Hash - Identity Hash, made over the above
Channel Session Key Record
All Messages which occur in a Semaphor Channel are encrypted with a symmetric key, called a Channel Session Key, which is shared between the Members of that Channel. There are a number of conditions in which a Channel Session Key is rotated, including the modification of a Member's role, a Member departing the Channel, and interaction with organizational or channel content retention policy. In the event a Member leaves a Channel, the Session Key is rotated by the first Member to send a Message following that event. The Channel Session Key Record does not contain the symmetric key which was created by the user; that key is instead contained in the Channel Session Key Share data structure. The Channel Session Key Record has the following attributes:
- Parent Channel - Identity Hash of the Channel to which the Members, and keys, belong
- Previous Channel Session Key Record - Identity Hash of the Key Record created prior to this one
- Author Keyring - Keyring Identity Hash of the Member performing the key rotation
- Last Organization Member Event Record - Identity Hash of the last known change in the Organization Membership, to which the Parent Channel belongs
- Last Channel Member Event Record - Identity Hash of the last known change in the Channel Membership, to which the Member belongs
- Record Creation Time
- Hash of Channel Session Key - The Hash of the newly created Channel Session Key
- Integration Public Key - Value known by the Semaphor Server which allows authorized "Integrations" (aka Plugins) to submit content into the Channel
- Integration Secret Key - Value known to Channel Members allowing Members to read integration content submitted into the Channel
- Channel Session Key Record Signature - Hash, made over the above
- Channel Session Key Record Hash - Identity Hash, made over the above
Channel Session Key Share
The Channel Session Key Share is the (direct) Member-to-Member action of sharing a key with the other Members of the Channel, and it has the following attributes:
- Parent Channel Session Key Share - The Identity Hash of the binding hash record for the new key being shared
- Author Keyring - Keyring Identity Hash of the Member performing the key rotation
- Recipient Keyring - Keyring Identity Hash of the Member being sent the key
- Channel Session Key Share Creation Time
- Key Content - the Channel Session Key encrypted from the author’s Encryption Key Pair to the recipient’s Encryption Key Pair
- Channel Session Key Share Signature - Hash, made over the above
- Channel Session Key Share Hash - Identity Hash, made over the above
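One plausible use of the Hash of Channel Session Key attribute is for a recipient to check a decrypted key share against the signed Channel Session Key Record. The sketch below assumes BLAKE2b with a 32-byte digest and a 32-byte symmetric key, and it omits the public-key encryption of Key Content, which Semaphor delegates to libsodium. All function names are hypothetical, and the source does not state that clients perform this exact check.

```python
import hashlib
import hmac
import secrets


def key_hash(session_key: bytes) -> bytes:
    """Hash that would be published in the Channel Session Key Record."""
    return hashlib.blake2b(session_key, digest_size=32).digest()


def share_matches_record(decrypted_key: bytes, recorded_hash: bytes) -> bool:
    """Constant-time comparison of the key's hash with the recorded hash."""
    return hmac.compare_digest(key_hash(decrypted_key), recorded_hash)


# Author side: create a new Channel Session Key and publish only its hash.
session_key = secrets.token_bytes(32)
recorded_hash = key_hash(session_key)

# Recipient side: after decrypting Key Content, confirm it is the right key.
print(share_matches_record(session_key, recorded_hash))
print(share_matches_record(secrets.token_bytes(32), recorded_hash))
```

Publishing only the hash lets every Member agree on which key is current without the record itself revealing the key.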
Message Data Structure
Like other structures, Messages form a hash chain that allows all Messages in a Channel to be verified. Semaphor supports user-controlled message deletion while maintaining verifiability of the hash chain (including the deleted links); this is performed using the combination of the following attributes:
- Parent Channel Session Key - Identity Hash of the current Channel Session Key
- Previous Message - Identity Hash of most recent Message in binding hash chain
- Author Keyring - Keyring Identity Hash of the Member sending the Message to the Channel
- Message Creation Time
- Message Category - The type of Message being sent, some options include, a. Standard - basic text Message, which may include attachment references, b. Channel Name - request to change the name of the current Channel
- Content Hash - Hash of the Message Cipher text and Deletion Token
- Message Signature - Hash, made over the above and some additional attributes, a. Last Organization Member Event Record - Identity Hash of the last known change in the Organization Membership, to which the Parent Channel belongs, b. Last Channel Member Event Record - Identity Hash of the last known change in the Channel Membership, to which the Member belongs
- Message Identity Hash - Identity Hash, made over the above and some additional attributes, a. Last Organization Member Event Record, b. Last Channel Member Event Record
- Deletion Token - 32 Bytes of random data included with each new message. If the message is deleted, this value is also deleted, but the Content Hash remains. The additional 32 bytes of entropy provided by the deletion token prevents brute force reconstruction of deleted message content.
- Message Cipher Text - Cipher text of actual Message content
- Attachment IDs - The identities of any Attachments referenced in the Message
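The deletion mechanism can be sketched as follows. The Content Hash commits to the cipher text and the Deletion Token, and the chain link depends on the Content Hash rather than on the content itself, so removing the content leaves the chain verifiable. This is a simplified model: the real Message Identity Hash covers more attributes than shown, and the `Message` type, the helper `h`, and the use of BLAKE2b are assumptions.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


def h(*parts: bytes) -> bytes:
    """Length-prefixed BLAKE2b hash over several byte strings."""
    digest = hashlib.blake2b(digest_size=32)
    for part in parts:
        digest.update(len(part).to_bytes(4, "big") + part)
    return digest.digest()


@dataclass
class Message:
    previous: bytes                  # Identity Hash of the previous Message
    content_hash: bytes              # hash of cipher text and Deletion Token
    cipher_text: Optional[bytes]     # removed on deletion
    deletion_token: Optional[bytes]  # removed on deletion

    def identity_hash(self) -> bytes:
        # Depends only on fields that survive deletion.
        return h(self.previous, self.content_hash)

    def content_is_valid(self) -> bool:
        if self.cipher_text is None or self.deletion_token is None:
            return False  # deleted: nothing left to check
        return h(self.cipher_text, self.deletion_token) == self.content_hash

    def delete(self) -> None:
        self.cipher_text = None
        self.deletion_token = None


def new_message(previous: bytes, cipher_text: bytes) -> Message:
    token = secrets.token_bytes(32)  # fresh Deletion Token per Message
    return Message(previous, h(cipher_text, token), cipher_text, token)


first = new_message(b"", b"encrypted hello")
second = new_message(first.identity_hash(), b"encrypted reply")

link_before = first.identity_hash()
first.delete()
# The chain link is unchanged, so `second` still verifies against `first`.
print(first.identity_hash() == link_before == second.previous)
```

Without the random token, anyone holding the surviving Content Hash could test guesses of a short deleted message; the 32 random bytes make such guessing infeasible.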
Attachment Data Structure
Attachments do not need to be completely uploaded in order to be referenced by a Message, and any user with visibility to an Attachment can reference the same Attachment without the need to upload any additional data to the Semaphor Server. Attachments have the following attributes:
- Attachment Creation Time
- Account - Keyring Identity Hash of the Member uploading the Attachment to Channel
- Encrypted Blob - Cipher text version of the Attachment content
- Blob Size - Length of Encrypted Blob
- Digest - Checksum of Encrypted Blob
All network traffic used in Semaphor is SSL/TLS with ephemeral keys and certificate pinning (this is in addition to the application-layer signing and encryption).
In order to provide end users with Push Notifications of "direct mentions" in various conversation threads while still maintaining Zero Knowledge, Push Notifications are sent via the following method:
- A Message is composed
- The client parses any known mentions
- The mentions are associated with known users
- The mentioned users' unique identifiers are added as part of the Message plaintext-metadata when being sent to the Semaphor Server
- The Server parses the metadata and pushes the Notification for each user
- End user receives "unread message" notification
Note that although traffic analysis can reveal which users are mentioning others, no Message content of those Members is leaked.
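The client-side steps above might look like the sketch below. The `@name` mention syntax, the user directory, the identifier format, and the envelope layout are all hypothetical; only the idea that mention identifiers travel as plaintext metadata while the body stays encrypted comes from the text.

```python
import re

MENTION = re.compile(r"@(\w+)")


def mentioned_user_ids(plaintext: str, directory: dict) -> list:
    """Map @name mentions to user identifiers, ignoring unknown names."""
    ids = []
    for name in MENTION.findall(plaintext):
        user_id = directory.get(name.lower())
        if user_id is not None and user_id not in ids:
            ids.append(user_id)
    return ids


def build_envelope(plaintext: str, cipher_text: bytes, directory: dict) -> dict:
    """Only the mention list travels in the clear; the body stays encrypted."""
    return {
        "metadata": {"mentions": mentioned_user_ids(plaintext, directory)},
        "cipher_text": cipher_text,
    }


directory = {"alice": "user-001", "bob": "user-002"}
envelope = build_envelope(
    "@Bob can you review this? cc @alice @carol",
    b"<encrypted bytes>",
    directory,
)
print(envelope["metadata"]["mentions"])
```

The Server sees only the identifiers in `metadata`, which is enough to route an "unread message" notification to each mentioned user.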
Hash Chain Ordering and Conflict Resolution
Given the nature of the binding hash chains and hash tree used in the Semaphor data structures, it is critical that all users have a consistent representation of all Semaphor objects and the order in which they are created. In order to maintain the order of events, and to avoid conflicts when users attempt to simultaneously create new items, Semaphor implements two services:
- Semaphor Synchronization Layer - Logic which exists on the Semaphor Client and maintains an accurate representation of the Server's definitive copy of all binding hash chains and hash trees.
- Semaphor Conflict Resolution Layer - Logic which exists on the Semaphor Server and ensures that conflicting actions are not created in multi-user scenarios. For example, in the event two users attempt to add a new item to the end of a binding hash chain, the Semaphor Server sends both users different random retry intervals after which they should attempt to recreate the addition.
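A toy model of this retry behaviour, under the assumption that the Server tracks a single head hash per chain and rejects simultaneous appends to the same head. The `ChainServer` class, the batch interface, and the 100 to 999 ms delay range are hypothetical choices made for the sketch.

```python
import random


class ChainServer:
    """Toy server holding the definitive head of one binding hash chain."""

    def __init__(self) -> None:
        self.head = b""

    def submit_batch(self, submissions: dict) -> dict:
        """Process appends that arrive at the same time.

        `submissions` maps client name -> (previous_hash, record_hash).
        Returns client name -> retry delay in seconds; 0.0 means accepted.
        """
        # Distinct random delays (100-999 ms), one per submitting client.
        millis = random.sample(range(100, 1000), len(submissions))
        result = {client: ms / 1000 for client, ms in zip(submissions, millis)}

        valid = [c for c, (prev, _) in submissions.items() if prev == self.head]
        if len(valid) == 1:  # exactly one append extends the head: accept it
            winner = valid[0]
            self.head = submissions[winner][1]
            result[winner] = 0.0
        return result


server = ChainServer()
head = server.head  # both clients are synced to the same head

first_try = server.submit_batch({
    "alice": (head, b"record-from-alice"),
    "bob": (head, b"record-from-bob"),
})

# Both were rejected with different delays; each retries alone after
# resyncing, the shorter delay first.
retry_order = sorted(first_try, key=first_try.get)
for client in retry_order:
    new_record = b"rebuilt-record-from-" + client.encode()
    outcome = server.submit_batch({client: (server.head, new_record)})
    print(client, outcome[client])
```

Because the delays differ, the two clients no longer collide on the second attempt, and the later one builds on the record appended by the earlier one.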
Semaphor allows groups to create scalable end-to-end encrypted collaboration channels that are cryptographically verifiable to all Members within the channel. No user-created or uploaded content is visible to SpiderOak; it can only be decrypted by the intended recipients. SpiderOak has no access to any secret keys created or utilized by Semaphor users, and users' identities can be verified by others to build verified-trust-user connections.