Scale Testing Part 1: Why Scale Test?

Cian Johnston on May 16th, 2024
8 min read

At Coder, we embarked on a major rewrite of our flagship product, culminating in a v2.0 release in late 2023. Prior to this, we started an initiative to perform comprehensive load tests to proactively identify and fix issues that would block rollouts at large scale. Spoiler: it turns out that scaling is hard, so we kept doing it.

What do we mean by "scale test"?

If you look on the internet, the overall consensus is that a "scale test" is where you attempt to determine the effects of increasing user load on a given system, while a "stress test" is where you throw as much load as you can at a system to see how much it can handle. Our "scale tests" fall somewhere in between the two – given a Coder deployment with a certain amount of resources, we want to determine its ability to handle a given amount of load.

When we perform a scale test, we do the following:

  • Spin up a real Kubernetes cluster,
  • Install a given version of Coder via Helm with a set amount of resources available,
  • Create a given number of users and workspaces in parallel,
  • Send traffic to and from the workspaces,
  • Clean up created resources.

There's a bunch of other supporting work, but that's the gist of it. Scale testing is testing at scale.

Why did we do this?

You can unpack this question a number of ways:

Why did we do this at all?

We're a small company, and our internal dogfood deployment has at most 19 active users. Some deployments have thousands of active users! We're obviously not going to run into the same kinds of problems as those deployments in day-to-day usage, so it's important for us to proactively validate that Coder can perform well at that scale.

Why not just perform benchmarks?

Benchmarks only test individual system components, and don't tell you what sort of behaviours you'll see at scale. Think of it as an analogue to unit tests versus integration tests – you don't just want one part of the system to perform well, you also want the whole system to perform well.

Why Kubernetes?

There are a number of reasons, but the main ones are:

  1. Our largest customers deploy Coder on Kubernetes, so this allows us to directly validate that deployment architecture.
  2. Kubernetes also makes it simple to scale our test deployment up and down as we test different amounts of load versus resources.

We'll go into more detail about our Kubernetes scale testing environment in a later post.

What sort of problems did we run into?

Because scaling is hard, it also follows that testing at scale is hard:

  • Tooling: We ended up needing to write our own tooling to load test Coder. Just running ApacheBench against our JSON API wouldn't have cut it:
    • A single JSON payload won't trigger the complex series of behaviours required to stand up a running workspace,
    • Some aspects of our testing involve setting up a persistent Tailscale connection,
    • Finally, we have a Go SDK. Why not use it?
  • Monitoring: Running a scale test produces tons of data (you are monitoring as you test, aren't you?), so we quickly settled on the industry-standard Prometheus time series database to collect the metrics we needed, and Grafana to make sense of it all (see the sketch after this list).
  • Cost: Even when we scale down resources to a bare minimum, performing scale tests is still relatively expensive. We very quickly invested in automation to bring up and tear down the cluster as needed.
    • We also added the option to use preemptible nodes to save money while developing the automation. However, we would recommend against performing scale tests using preemptible nodes, as one or more nodes may be pre-empted without warning and cause the test to fail.
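
As a rough sketch of what the monitoring side involves: the Coder server can expose Prometheus metrics when its CODER_PROMETHEUS_ENABLE and CODER_PROMETHEUS_ADDRESS settings are enabled, and with the Helm chart this can be done through the coder.env values. The snippet below assumes the same Helm release and namespace used in the walkthrough later in this post, and that a Prometheus server and Grafana instance already exist to scrape and visualize port 2112; check the Coder documentation for the exact settings in your version.
helm upgrade coder coder-v2/coder \
                --namespace coder \
                --reuse-values \
                --set "coder.env[0].name=CODER_PROMETHEUS_ENABLE" \
                --set-string "coder.env[0].value=true" \
                --set "coder.env[1].name=CODER_PROMETHEUS_ADDRESS" \
                --set-string "coder.env[1].value=0.0.0.0:2112"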

Was it worth it all, in the end?

Most definitely! We found and fixed many issues, both large and small.

How do I run one myself?

We have fairly comprehensive steps documented in our GitHub repository, and we also have more detailed documentation about our scale testing method. But here's a quick version using KinD. Note that you will be constrained by the CPU and memory resources available on your host machine.

  • Ensure you have KinD installed, if you have not done so already; you can download it from https://github.com/kubernetes-sigs/kind/releases. You will also need Docker installed and working on your host machine.
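# One way to install kind, assuming you have a recent Go toolchain available
# (other installation methods are described in the kind documentation):
go install sigs.k8s.io/kind@latest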
  • Create a local Kubernetes cluster:
kind create cluster --name coder
  • Ensure your kubectl is configured to speak to this cluster. The following command should report that the Kubernetes control plane is running at https://127.0.0.1:12345 (Note: the port may differ for you):
kubectl cluster-info
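# Optionally, confirm the cluster node is up; this should show a single
# control-plane node in the Ready state:
kubectl get nodes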
  • Install Coder using the Helm chart, limiting Coder to 1 CPU core and 1 GB memory:
helm repo add coder-v2 https://helm.coder.com/v2
helm repo update
helm install coder coder-v2/coder \
                --namespace coder \
                --create-namespace \
                --set coder.resources.limits.cpu=1 \
                --set coder.resources.limits.memory=1Gi
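# Optionally, wait for the Coder deployment to finish rolling out before
# continuing (this can take a minute or two while images are pulled):
kubectl --namespace coder rollout status deployment/coder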
  • Initialize the Coder deployment and create the first user:
kubectl --namespace coder exec deployment/coder -- \
                coder login \
                --first-user-username=admin \
                --first-user-email=admin@example.com \
                --first-user-password=SomeSecurePassw0rd \
                --first-user-trial=false
kubectl --namespace coder port-forward service/coder 8080:80 &
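# Optionally, check that the API is reachable through the port-forward; this
# should return a small JSON document that includes the Coder version:
curl -fsSL http://localhost:8080/api/v2/buildinfo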
  • Import the default Kubernetes template. You can run this command or manually import it in the UI:
kubectl --namespace coder exec deployment/coder -- \
                coder templates init \
                --id kubernetes /tmp/kubernetes
kubectl --namespace coder exec deployment/coder -- \
                coder templates push kubernetes \
                -d /tmp/kubernetes \
                --variable namespace=coder \
                --yes
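# Optionally, confirm that the template was imported:
kubectl --namespace coder exec deployment/coder -- \
                coder templates list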
  • Create 3 scale test users and workspaces using the above template. The --no-cleanup flag prevents the workspaces from being deleted automatically on finish:
kubectl --namespace coder exec deployment/coder -- \
                coder exp scaletest create-workspaces \
                --template kubernetes \
                --count=3 \
                --parameter cpu=2 \
                --parameter memory=2 \
                --parameter home_disk_size=1 \
                --no-cleanup
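# Optionally, list the scale test workspaces (they are owned by autogenerated
# scaletest users, so --all is needed) and watch their pods start:
kubectl --namespace coder exec deployment/coder -- \
                coder list --all
kubectl --namespace coder get pods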
  • Send SSH traffic to all workspaces in 128-byte chunks every 100ms for 60 seconds:
kubectl --namespace coder exec deployment/coder -- \
                coder exp scaletest workspace-traffic \
                --concurrency=0 \
                --bytes-per-tick=128 \
                --tick-interval=100ms \
                --ssh \
                --timeout=60s
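# While the traffic test runs, you can watch the Coder and workspace pods from
# another terminal; kubectl top additionally requires metrics-server, which
# kind does not install by default:
kubectl --namespace coder get pods
kubectl top pods --namespace coder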
  • Clean up all workspaces:
kubectl --namespace coder exec deployment/coder -- \
                coder exp scaletest cleanup
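# Optionally, confirm that the scale test workspaces are gone:
kubectl --namespace coder exec deployment/coder -- \
                coder list --all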
  • When you are finished, you can delete the KinD cluster:
kind delete cluster --name coder
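# If you want to double-check, kind should no longer report the cluster:
kind get clusters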

For more information on scale testing, you can see our online documentation or run coder exp scaletest --help. Note that the exp scaletest command is not included in the agent (a.k.a. "slim") binary to save space, so make sure you are running the full Coder binary by checking the output of coder --version.
