Introduction

Otter is a drop-in, cloud-native framework for peer-to-peer video communication within web applications. It can be deployed to AWS with a single command and suits any web application that needs real-time video communication.

For an application developer, Otter abstracts away the complexity of establishing a resilient, scalable infrastructure, and provides a simple way to integrate peer-to-peer WebRTC video calling into an application where privacy is of utmost importance.

In this case study, we explore how we built Otter, along with the design decisions we made and the technical challenges we faced along the way. We will begin with an overview of peer-to-peer (P2P) networking and WebRTC.

Why Video Calling?

It’s hard to imagine our modern world without video calling, so it’s no surprise that software developers are increasingly interested in incorporating it into their applications. In the last few years especially, the growth of telehealth has made video calling more commonplace in healthcare, though this is just one of many industries embracing video communication.

However, implementing reliable and scalable video communication comes with nuanced considerations, and developers need to consider several approaches to ensure that their solution is optimized for their end users. For example, selecting a network topology (how the various nodes of a network are arranged and connected) can have implications regarding the latency, resilience, and even privacy of a network. Within this case study, we will discuss many of the tradeoffs a developer may need to consider, starting with the tradeoffs between two major network topologies: a peer-to-peer model versus a client-server model.

P2P vs Client-Server Network Topologies

P2P is a network topology in which devices communicate directly with each other without the need for a central server. In a P2P network, each device can act as both a client and a server, meaning that it can both send and receive data directly from other devices in the network.

By contrast, in a client-server network topology, devices connect to a centralized server, which manages communication between them.

Benefits of P2P for Video Calling
  • Increased privacy for peers: P2P networks eliminate the need for a central server that can log or monitor data passing through.

  • Decreased latency between peers: Without a central server, data makes one fewer hop on the network before reaching its destination, which can reduce the distance and time required to transmit information.

  • Increased network resilience: A central server represents a single point of failure; if it goes down then peers cannot communicate. A P2P network distributes the burden of transporting data across all peers in the network, rather than relying on dedicated servers.

  • Cost-effective: A developer does not have to pay to provision and maintain dedicated servers or the bandwidth they would consume (since bandwidth consumed between peers is each peer’s individual responsibility).

P2P networks have their limitations, and in the context of a video call, the most notable limitation is call size. Let’s see what happens when more peers are added to the network.

In its simplest form, a P2P network requires that each peer connects to every other peer in the network [1]. Thus, in a P2P video call with 20 peers, each peer must send its video data in 19 different directions, and must process 19 separate incoming video streams from other peers! Larger call sizes quickly become untenable in terms of the bandwidth requirement placed on individual peers; a P2P video call may be able to support around 6 participants before performance is degraded.
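The arithmetic behind this limit can be sketched in a few lines of TypeScript. This is only an illustration of the fully connected case described above; the function name `meshLoad` and the 1.5 Mbps per-stream bitrate are assumptions chosen for the example, not measurements from Otter.

```typescript
// Load placed on each peer, and on the call as a whole, when every peer
// connects directly to every other peer (a full mesh).
interface MeshLoad {
  streamsSentPerPeer: number;     // one outgoing copy per remote peer
  streamsReceivedPerPeer: number; // one incoming stream per remote peer
  totalConnections: number;       // distinct peer pairs in the call
  uploadMbpsPerPeer: number;      // outgoing streams * assumed bitrate
}

function meshLoad(peers: number, mbpsPerStream: number): MeshLoad {
  if (peers < 1 || Math.floor(peers) !== peers) {
    throw new RangeError("peers must be a positive integer");
  }
  const remotePeers = peers - 1;
  return {
    streamsSentPerPeer: remotePeers,
    streamsReceivedPerPeer: remotePeers,
    totalConnections: (peers * remotePeers) / 2,
    uploadMbpsPerPeer: remotePeers * mbpsPerStream,
  };
}

// Hypothetical bitrate of 1.5 Mbps per video stream (an assumption).
const sixPeerCall = meshLoad(6, 1.5);     // 5 streams out, 15 connections, 7.5 Mbps up
const twentyPeerCall = meshLoad(20, 1.5); // 19 streams out, 190 connections, 28.5 Mbps up
```

Under these assumptions, upload bandwidth per peer grows linearly with call size while the number of connections grows quadratically, which is why small calls are comfortable and large ones are not.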

Benefits of Client-Server for Video Calling
  • Greater call size: Connections between peers are handled by a central server, which is within a developer’s control. This means the developer can scale the hardware of the server, or potentially scale to multiple servers to handle increased call sizes.

  • Real-time processing: Because media passes through the server, it can be processed in transit, for example to record a call, add closed captions, run facial recognition, or even power an AI note taker.

A client-server model would be a more effective choice for a developer who needs to support call sizes of more than a few people or needs additional features like the ability to record calls. This topology could make sense for a developer who wants to implement video conferencing for online classes, support large business meetings, or record video calls that can be reviewed later for training purposes.
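To see where the load goes in this model, here is a rough sketch. It assumes a server that simply forwards each participant's stream, unmodified, to every other participant; real servers may instead mix or transcode streams, and the name `forwardingServerLoad` is hypothetical.

```typescript
// Stream counts for a call routed through one forwarding server.
// Assumption: the server relays each incoming stream to every other peer.
interface ServerLoad {
  streamsIntoServer: number;  // one upload per participant
  streamsOutOfServer: number; // each upload relayed to all other participants
  uploadsPerPeer: number;     // each peer sends its media only once
}

function forwardingServerLoad(peers: number): ServerLoad {
  if (peers < 1 || Math.floor(peers) !== peers) {
    throw new RangeError("peers must be a positive integer");
  }
  return {
    streamsIntoServer: peers,
    streamsOutOfServer: peers * (peers - 1),
    uploadsPerPeer: 1,
  };
}

// A 20-person call: 20 streams in, 380 streams out, 1 upload per peer.
const classOfTwenty = forwardingServerLoad(20);
```

Each peer's upload stays constant as the call grows, and the growing share of the work lands on the server, which is the part the developer can scale.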

However, for a developer working on an application where privacy is of the utmost importance, a P2P topology is a better fit. Telehealth calls and virtual legal consultations are situations where calls will often contain sensitive information, and the privacy of these calls may be protected by law. Thus, the privacy gained by removing a central server (which can process and monitor these communications) is notable. Additionally, the private nature of these calls means that call sizes will seldom be larger than a few people.

Transport Protocols in Context of Video Calling

For the developer who wants to add P2P video calling to their web application, what else should be considered that may affect the user experience? Low latency is key to making video calls feel responsive, and the method used to transport media packets over the network affects it. Typically, media packets are transported through TCP (Transmission Control Protocol) or UDP (User Datagram Protocol).

TCP is a protocol designed to make data transmission across the network reliable. TCP is considered a connection-oriented protocol because every TCP connection has three well-defined phases (connection establishment, data transfer, and connection termination), beginning with a handshake that establishes the parameters for the data transmission to come. Once the handshake is completed, packets can be sequenced and delivered in order, and if a packet is lost, it is retransmitted, thus providing a guarantee of delivery and a guarantee of in-order delivery. TCP also has built-in congestion control, which slows transmission when network congestion is detected. Both the delivery guarantees and congestion control can increase latency in the transmission of real-time video.

On the other hand, UDP is a connectionless protocol and can send data without first establishing a connection. UDP does not offer reliability; it instead offers speed and flexibility. For real-time video this is an acceptable trade: a few lost packets can be tolerated as long as the video stream remains uninterrupted, because having the latest data is more important than having all the data.

UDP is therefore ideal for rich audio and video data transmission, while TCP’s reliability mechanisms can introduce delays that are not conducive to a responsive audio-video user experience.
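A toy model can make the latency difference concrete. The sketch below is not an implementation of either protocol: it only models in-order delivery with retransmission versus deliver-on-arrival, ignores congestion control entirely, and uses made-up timings. The names `Packet`, `inOrderDeliveryTimes`, and `bestEffortDeliveryTimes` are hypothetical.

```typescript
// One media packet as seen by the receiver.
interface Packet {
  seq: number;       // position in the stream; the array must be sorted by seq
  arrivalMs: number; // when the first copy would arrive
  lost: boolean;     // true if the first copy is dropped in transit
}

// TCP-like model: a lost packet arrives only after a retransmission delay,
// and no packet reaches the application before all earlier packets have.
function inOrderDeliveryTimes(packets: Packet[], retransmitMs: number): number[] {
  let readyAt = 0;
  return packets.map((p) => {
    const arrived = p.lost ? p.arrivalMs + retransmitMs : p.arrivalMs;
    readyAt = Math.max(readyAt, arrived);
    return readyAt;
  });
}

// UDP-like model: packets reach the application as they arrive;
// a lost packet is never delivered (null).
function bestEffortDeliveryTimes(packets: Packet[]): (number | null)[] {
  return packets.map((p) => (p.lost ? null : p.arrivalMs));
}

// Four packets sent 20 ms apart; the second one is lost (illustrative values).
const frames: Packet[] = [
  { seq: 0, arrivalMs: 20, lost: false },
  { seq: 1, arrivalMs: 40, lost: true },
  { seq: 2, arrivalMs: 60, lost: false },
  { seq: 3, arrivalMs: 80, lost: false },
];

const reliable = inOrderDeliveryTimes(frames, 200); // [20, 240, 240, 240]
const bestEffort = bestEffortDeliveryTimes(frames); // [20, null, 60, 80]
```

In the in-order model, one lost packet holds back every packet behind it until the retransmission arrives. In the best-effort model, the stream skips the missing packet and keeps delivering the latest data on time.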

UDP-based, P2P Video Calling Solutions

Now that we have established the need for a UDP-based, P2P video calling solution, what are some of the questions that will arise during the development of the video calling application?

  • How can reliable communication channels be established over UDP?
  • How should audio and video media streams be processed?
  • How can application data be encrypted end-to-end?
  • How can restrictive network environments be bypassed?

Many protocols offer different solutions to the above challenges. We will explore some of them in depth.

It is possible for developers to build P2P video calling functionality by manually stitching together multiple protocols, like RTMFP (Real Time Media Flow Protocol) with SIP (Session Initiation Protocol). This could provide a high degree of customization, but at the expense of valuable time, as a developer would have to gain a deep understanding of multiple protocols and how they interact.

Fortunately, there is already a solution that orchestrates protocols to implement UDP-based, P2P real-time communication: WebRTC. WebRTC is a free communication standard that works natively within browsers. It has been endorsed by the W3C, an organization focused solely on the development of standards for the web, which has recommended “… the wide deployment of this specification [WebRTC] as a standard for the Web”. WebRTC has been widely adopted in many products and services, including Google Meet, Facebook Messenger, and Discord.

WebRTC orchestrates well-established protocols to abstract away much of the complexity associated with implementing a video calling application. This makes WebRTC a strong candidate for a developer looking to integrate a UDP-based, P2P video calling solution into a web application.

While implementing real-time P2P functionality will be much more straightforward with WebRTC than without, there is still a depth of knowledge required for working with it. We will touch on this essential knowledge next.

Notes


  1. There are additional P2P topologies, but these are beyond the scope of this case study.
