Teamflow 2.0 now supports thousands of people in the same space. Our engineering team did several weeks of hard work to accomplish this feat. We built a distributed tester running hundreds of bots to find bottlenecks. We’ve eliminated them one by one and are ecstatic to showcase what we’ve done!
In a Teamflow space, each person is connected to a WebRTC-based video call; we use Daily to create and connect to calls. Clients upload streams to an SFU server, which then forwards them to all subscribed clients. Since an SFU server can only process so many audio and video streams, there is an upper limit to how many clients can be in a single call.
Spatial clustering
In super-populated spaces, people generally spread out and divide themselves into groups. They all don’t need to be in a single, shared call. We exploit this pattern to optimally assign each person to a call.
In Teamflow, there are two spatial parameters: the current “space” and the “position” within that space. Each space is like a parallel universe - you can jump between spaces, but exist in one at a given time. Within your space, a set of 2D coordinates defines your position.
People cannot interact with each other across spaces. Furthermore, two people must be in “proximity” to each other to hear or see each other. In Teamflow, proximity is roughly defined with two conditions:
- (left) When you’re in an “audio zone”, everyone within that zone is in your proximity.
- (right) Otherwise, everyone near you within the audible radius is in proximity.
The trick now is to calculate the smallest “clusters” of people, such that any two people in proximity are always in the same cluster. We can then create one call per cluster; each cluster is expected to be a lot smaller than the whole workspace!
3-step pipeline
Spatial clustering is implemented using a pipeline of state-machines that reacts to changes in participant and audio zone positions. Since each space is independent, each one holds an instance of this pipeline. The clustering algorithm goes through three steps:
Spatial hashing
A spatial hash is a 2D lookup for finding entities near a position. It divides the whole space into ~1000px buckets, and maintains which bucket(s) each person or audio zone are in.
Since buckets are larger than the audible radius, finding people in proximity is limited to searching in neighbouring buckets. This setup allows the next step in the pipeline to do that in amortized O(1) time!
Density-based clustering
This state-machine assigns people to a cluster. The formal definition of a cluster can be formulated as:
- Make an undirected graph with each person as a vertex, where two vertices are connected by an edge if they are in proximity. This is called the proximity graph.
- Each cluster is a component of this graph.
The actual proximity graph is not kept in memory. Rather, the state machine only holds the set of people in each cluster. The important events that alter this state are:
- A person joins the call - The state machine will find all people in proximity and aggregate the current clusters they are in. This is done in O(1) time due to spatial hashing. These clusters are then merged together.
- A person leaves the call - The state machine will re-evaluate the cluster of the leaving person. This is because, by leaving, the person could have caused the cluster to split. The new “child” clustering is calculated by using a spatial-hash accelerated breadth-first search in the proximity graph. Note that this search is limited to people still in the “old cluster” because they have not gotten into proximity with anyone new. Otherwise, the algorithm is similar to finding the components of a graph. It has an O(n) running time.
- A person moves - This is implemented like leaving and then returning back! Simple as it gets.
Call provisioning
This last step holds a pool of calls. As people move around and regroup, the number of clusters fluctuate. The call provisioner assigns each cluster a call, and frees the call when the cluster dissolves.
Path independent state machine
The clustering pipeline so far takes in “events” like a person joining or leaving. The algorithms, however, are “path independent”. They don’t need to take in the events “as they occur”. See, there are infinitely many ways of going from State A to State B. For example, a person can move from (0, 0) to (100, 100) in one step; or they can move in small steps (0, 0), (20, 20), (60, 60), (100, 100)! It doesn’t matter if we tell the pipeline left from (0, 0) and rejoined at (100, 100) once or they did that 3 times in smaller steps. However, the latter path takes several times longer to execute!
This “path independence” allows us to batch up changes in state across a few hundred milliseconds; to users, however, the pipeline is still perceived as responding in real time!
Teamflow is the best place for remote and hybrid teams to collaborate and work together. Start your 30-day free trial today.