An initiative to design and implement a privacy-preserving communication system for decentralized applications

Status, Web3 Foundation, and Validity Labs recently held a workshop on designing communication protocols for decentralized applications that protect user privacy, with particular emphasis on protecting communication metadata. For a recap of the discussions prior to the workshop, please read the previous blog post. To date, most communication protocols for such applications have failed to adequately protect user privacy. We came together to review existing work, discuss its shortcomings, and design a new privacy-preserving communication service that addresses these shortcomings. This blog post summarizes the workshop outcomes.

Motivation

Decentralized applications have existed since the 1990s, when they were known as peer-to-peer (P2P) apps. The best known are file-sharing services like BitTorrent and some privacy networks like I2P. We have recently witnessed an explosion of decentralized applications in the form of “dApps” on blockchain networks such as Ethereum. Some of these dApps have several thousand users. Examples include browsers like Status, exchanges like IDEX, games like Funfair, social networks like Peepeth, prediction services like Gnosis, and storage services like Filecoin. In this blog post, we focus on dApps in the context of blockchains.

A key advantage of decentralized applications is that they avoid the pressures to censor faced by centralized applications. At the same time, decentralized applications inherently require stronger privacy measures, and thus more complexity, because they lack any universally trusted party. We do not consider this a disadvantage, and an increasing number of users dislike trusting centralized parties too.

Aside from protecting message content, we should protect message metadata as well, because metadata reveals inordinate amounts of private information. Almost any performant scheme for protecting network-level metadata requires “onion encryption,” meaning packets pass through several relays, each of which transforms the outgoing packet so that it is “unlinkable” to the incoming one. Figure 1 depicts an abstract example of a communication system that protects users’ metadata, with two senders, two receivers, and four relays that forward incoming messages until they reach the receivers. An adversary should not be able to track a message through the system.

Figure 1: A simple privacy-preserving communication tool that intends to disguise which sender is sending a message to the receivers.
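
To make the idea of layered, unlinkable packets concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the design discussed at the workshop: the relay keys, the function names, and the SHA-256-based XOR keystream are all hypothetical, and the cipher is not secure. A real system would use an authenticated packet format and per-hop key agreement.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. NOT secure, illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, relay_keys: list) -> bytes:
    """Sender side: apply one layer per relay on the chosen path.

    Layers are applied in reverse path order, as in real onion schemes,
    although these toy XOR layers would also commute.
    """
    packet = message
    for key in reversed(relay_keys):
        packet = xor_layer(key, packet)
    return packet


def route(packet: bytes, relay_keys: list) -> list:
    """Each relay peels its own layer; returns the packet as it looks after each hop."""
    seen = []
    for key in relay_keys:
        packet = xor_layer(key, packet)
        seen.append(packet)
    return seen
```

Calling `wrap(b"hello receiver", keys)` and then `route(...)` with the same hypothetical keys yields a different byte string on every link, and only the output of the last hop equals the original message. That change of appearance at each relay is what keeps an observer from matching incoming and outgoing packets by content; hiding timing is a separate problem.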

Don't we have existing tools already?

Over the last three decades, a range of communication systems has been proposed to protect user privacy, including network-level metadata. Yet only a handful were properly designed and built, and even fewer have seen much adoption. Next, we briefly review two classic privacy-preserving tools, Tor and mixnets. Neither is churn-friendly, however. Tolerating churn plays an important role in improving scalability and is a key feature we are taking into account. An example of a churn-friendly privacy-preserving communication tool is Whisper, which we review thereafter.

Tor is the most commonly used privacy-preserving communication tool, with about 2 million daily users. Examples of dApps that use Tor for privacy-preserving communication are Briar, Ricochet, and Ricochet's group communication extension Cwtch.

In Tor, communication initiators choose how their traffic is routed through the Tor network. As a result, all Tor relays must be known to all clients, which imposes a significant cost on the network. Tor tunnels arbitrary TCP traffic, but TCP streams tolerate added latency poorly. Tor therefore adds no delays to the communication, which is why it is usually categorized as low-latency, and it adds no cover traffic, because cover traffic has no clear benefit without additional latency. As a result, Tor is only secure against an adversary with local reach who does not apply statistical methods to analyze network traffic.

Mixnets are the pioneering design for protecting network-level metadata. These schemes “mix” packets by delaying and/or batching them, which disrupts an adversary’s ability to analyze traffic, especially when used in conjunction with cover traffic. Hence, mixnets are usually categorized as high-latency systems. In the past, latency was several hours; recent designs, however, have more reasonable latencies. Almost all mixnet designs, especially deployed ones like Mixminion and Loopix, share Tor's issue that communication initiators need complete knowledge of all relays in the system. This harms scalability.
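
The batching idea can be sketched in a few lines of Python. This is a toy threshold mix, one classic mixing strategy, shown only for illustration; the class name, the threshold parameter, and the use of Python's non-cryptographic `random` module are assumptions of the sketch, not part of any design mentioned in this post.

```python
import random


class ThresholdMix:
    """Toy threshold mix: hold packets until `threshold` arrive, then flush them shuffled."""

    def __init__(self, threshold: int, seed=None):
        self.threshold = threshold
        self.pool = []
        # Illustration only: a real mix would need a cryptographically secure RNG.
        self.rng = random.Random(seed)

    def receive(self, packet):
        """Accept one packet; return the flushed batch, or [] while still collecting."""
        self.pool.append(packet)
        if len(self.pool) < self.threshold:
            return []
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)  # output order no longer reveals arrival order
        return batch
```

Because nothing leaves the relay until the batch is full, and the batch leaves in random order, an observer cannot pair an outgoing packet with an incoming one by timing alone. The cost is latency: a packet waits until enough other traffic has arrived, which is one reason cover traffic helps.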

Whisper is a protocol for sending broadcast messages over the Ethereum network. Its messages contain no sender identifier per se, but the protocol does nothing to protect senders either. It similarly does not protect receivers who respond to, or acknowledge, messages. The public and persistent nature of messages in Whisper in itself causes privacy threats and also hinders scalability.

The Workshop

We organized a workshop to plan the design and implementation of a new privacy-preserving communication service for decentralized applications. Representatives from Status, Web3 Foundation, Validity Labs, Parity Technologies, Brainbot, Matrix, Cogarius, and Caelum Labs gathered to discuss the details of the project.

We agreed on the properties and features we want for the system, discussed existing techniques and solutions for achieving them, and agreed on next steps and milestones.

The features we are interested in are summarized as follows:

  • Anonymity for both senders and receivers - We aim to provide strong privacy guarantees, although we have not yet decided on the metrics. Receiver anonymity was sacrificed in some recent efforts to use mixnet designs to support payment channels, which then limits scalability by preventing lite clients from recognizing their transactions’ progress. We aim to address receiver anonymity too.
  • Strong adversary model - We aim to be secure against an adversary with global reach who also controls some relays in the system, because such adversaries have become increasingly commonplace.
  • Adversarial contacts - We aim at supporting modes where senders and receivers are adversarial. Academic proposals including Loopix frequently omit this assumption, making them weaker than Tor’s hidden services.
  • Mid-latency - We want to allow a low-latency mode for users who are interested in privacy against a weaker adversary and a mid-latency mode for users who are interested in privacy against a stronger adversary. Explicitly, we aim for ~5 seconds for applications that target weaker adversaries and ~10 seconds for applications that wish to be private against a stronger adversary.
  • Strong usability - We aim to address usability for both users and developers. For example, for developers we want to provide proper documentation and make it easy to maintain and upgrade the system. For users we want to provide an easy interface and allow configurability. We have not yet decided whether to allow users to keep a history of their communication.
  • Asynchronicity - We want to support limited store-and-forward capability because “It's an asynchronous world” in which users may not be online simultaneously.
  • Scalable - We aim to design a highly scalable system. In particular, we want to avoid protocols with artificial scaling limits below 100k relays and 1 billion users.
  • Adjustable privacy-performance tradeoffs - Adjustable privacy often compromises everyone, but we should provide applications with any safe adjustment options that exist. For example, we could allow for two modes, one with strong privacy and some delay and one with lower privacy but no added delay.
  • Accommodate relay churn - Permit relays to join or leave the network easily. If users need to act as relays in the system, this applies to them as well.
  • Incentivize relaying - Our system should recognize correct relay operation with payments in a transactable currency.
  • Zero barriers for users - New users should not have to pay for basic services.
  • Low overhead for users - We aim for low overhead for users, so that the system does not kill bandwidth or battery on mobile devices.
  • Group messaging - We aim at supporting small scale group messaging. However, it won't be part of the main protocol but rather an extension.
  • Ratchet compatibility - We should be compatible with forward secure encryption for messages, which may require improving MLS/TreeKEM or similar.
  • Easily upgradeable - Our system should enable upgradeability for all parts of the system.

While not all details of the design have been decided yet, we discussed existing concepts we want to incorporate into the system.

As a minor example, we follow Loopix in allowing users to determine the delays of their communication. Another example is that we support adversarial contacts, that is, we provide sender anonymity against the receiver and receiver anonymity against the sender. If we did not consider adversarial contacts, we’d be categorically weaker than Tor for many users, including in some common financial use cases. These techniques do increase complexity and latency, so we should also support applications that bypass them for improved performance or earlier deployment.
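
As a rough illustration of sender-chosen delays, the Python sketch below draws one exponentially distributed delay per hop, in the spirit of Loopix. It makes several assumptions that were not decided at the workshop: the ~5 second and ~10 second figures from the feature list are treated as budgets for the expected total added delay, the budget is split evenly across hops, and the mode names and the `sample_hop_delays` function are hypothetical.

```python
import random

# Hypothetical budgets for the expected end-to-end added delay, in seconds.
# The numbers echo the ~5 s and ~10 s targets in the feature list; treating
# them as mixing-delay budgets is an assumption of this sketch.
DELAY_BUDGET_SECONDS = {"low-latency": 5.0, "mid-latency": 10.0}


def sample_hop_delays(mode: str, hops: int, rng: random.Random) -> list:
    """Draw one exponential delay per hop whose expected sum equals the mode's budget."""
    mean_per_hop = DELAY_BUDGET_SECONDS[mode] / hops
    # random.expovariate takes the rate parameter, which is 1 / mean.
    return [rng.expovariate(1.0 / mean_per_hop) for _ in range(hops)]
```

A sender would embed each sampled delay in the layer addressed to the corresponding relay, so that every relay holds the packet for its own delay before forwarding it. An application could pick the mode per message, which is one way to expose the privacy-performance tradeoff without changing the packet format.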

As another example, we cannot foresee scaling beyond roughly 10k relays using a Tor-like global consensus protocol. We expect to modify Brahms, or perhaps devise some distributed-hash-table (DHT) scheme, to provide secure random relay selection both for greater scalability and to improve performance for clients.
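
To give a feel for what secure random relay selection can look like, the sketch below shows a single min-wise sampler of the kind Brahms builds on: it keeps whichever relay ID has the smallest keyed hash among all IDs it has seen. This is a simplified illustration of that one component, not Brahms itself and not the modified scheme we expect to design; the class and method names are hypothetical, and using keyed BLAKE2b as the random ordering is an assumption of the sketch.

```python
import hashlib
import os


class MinWiseSampler:
    """One Brahms-style sampler: remember the ID with the smallest keyed hash."""

    def __init__(self, key=None):
        # A fresh random key selects a fresh pseudo-random ordering of all IDs.
        self.key = key if key is not None else os.urandom(16)
        self.best_id = None
        self.best_rank = None

    def _rank(self, relay_id: bytes) -> bytes:
        return hashlib.blake2b(relay_id, key=self.key, digest_size=16).digest()

    def observe(self, relay_id: bytes) -> None:
        """Feed one relay ID learned from gossip."""
        rank = self._rank(relay_id)
        if self.best_rank is None or rank < self.best_rank:
            self.best_id, self.best_rank = relay_id, rank

    def sample(self):
        """Return the currently selected relay ID, or None if nothing was observed."""
        return self.best_id
```

The useful property is that the result depends only on the set of IDs observed, not on how often each one appears. An adversary who floods the gossip stream with repeated copies of its own relay IDs therefore does not raise the chance that one of them is selected. A client would run several samplers with independent keys to obtain several relays, without needing a global list of all relays.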

For more details about our workshop please see the workshop minutes.

Next Steps

  • Complete the related work analysis, particularly of related solutions
  • Describe our incentive schemes and include security analysis
  • Clarify how data syncing is going to work
  • Early designs for routing model and sampling method
  • Virtual meetup in mid-March to discuss some design decisions
  • Panel on privacy-preserving messaging at EthCC 2019 in Paris

This post was written based on collective notes from all attendees at the workshop. Thanks to everyone involved.