Validated Architectures

Many customers operate Coder in complex organizational environments, consisting of multiple business units, agencies, and/or subsidiaries. This can lead to numerous Coder deployments, due to discrepancies in regulatory compliance, data sovereignty, and level of funding across groups. The Coder Validated Architecture (CVA) prescribes a Kubernetes-based deployment approach, enabling your organization to deploy a stable Coder instance that is easier to maintain and troubleshoot.

The following sections will detail the components of the Coder Validated Architecture, provide guidance on how to configure and deploy these components, and offer insights into how to maintain and troubleshoot your Coder environment.

General concepts
Kubernetes Infrastructure
PostgreSQL Database
Operational readiness

Who is this document for?

This guide targets the following personas. It assumes a basic understanding of cloud/on-premise computing, containerization, and the Coder platform.

Role	Description
Platform Engineers	Responsible for deploying and operating the Coder deployment and its infrastructure
Enterprise Architects	Responsible for architecting Coder deployments to meet enterprise requirements
Managed Service Providers	Entities that deploy and run Coder software as a service for customers

CVA Guidance

CVA provides:	CVA does not provide:
Single and multi-region K8s deployment options	Prescriptions for OS, or for cloud vs. on-premise
Reference architectures for up to 3,000 users	An approval of your architecture; the CVA solely provides recommendations and guidelines
Best practices for building a Coder deployment	Recommendations for every possible deployment scenario

For higher level design principles and architectural best practices, see Coder's Well-Architected Framework.

General concepts

This section outlines core concepts and terminology essential for understanding Coder's architecture and deployment strategies.

Administrator

An administrator is a user role within the Coder platform with elevated privileges. Admins have access to administrative functions such as user management, template definitions, insights, and deployment configuration.

Coder control plane

Coder's control plane, also known as coderd, is the main service. We recommend deploying it with multiple replicas to ensure high availability. It provides an API for managing workspaces and templates, and serves the dashboard UI. In addition, each coderd replica hosts 3 Terraform provisioners by default.

User

A user is an individual who utilizes the Coder platform to develop, test, and deploy applications using workspaces. Users can select available templates to provision workspaces. They interact with Coder through the web interface, the CLI tool, or direct API calls.

Workspace

A workspace refers to an isolated development environment where users can write, build, and run code. Workspaces are fully configurable and can be tailored to specific project requirements, providing developers with a consistent and efficient development environment. Workspaces can be autostarted and autostopped, enabling efficient resource management.

Users can connect to workspaces using SSH or via workspace applications like code-server, facilitating collaboration and remote access. Additionally, workspaces can be parameterized, allowing users to customize settings and configurations based on their unique needs. Workspaces are instantiated using Coder templates and deployed on resources created by provisioners.

Template

A template in Coder is a predefined configuration for creating workspaces. Templates streamline the process of workspace creation by providing pre-configured settings, tooling, and dependencies. They are built by template administrators on top of Terraform, allowing for efficient management of infrastructure resources. Additionally, templates can utilize Coder modules to leverage existing features shared with other templates, enhancing flexibility and consistency across deployments. Templates describe provisioning rules for infrastructure resources offered by Terraform providers.

Workspace Proxy

A workspace proxy serves as a relay connection option for developers connecting to their workspace over SSH, a workspace app, or through port forwarding. It helps reduce network latency for geo-distributed teams by minimizing the distance network traffic needs to travel. Notably, workspace proxies do not handle dashboard connections or API calls.

Provisioner

Provisioners in Coder execute Terraform during workspace and template builds. While the platform includes built-in provisioner daemons by default, there are advantages to employing external provisioners. These external daemons provide secure build environments and reduce server load, improving performance and scalability. Each provisioner can handle a single concurrent workspace build, allowing for efficient resource allocation and workload management.
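The relationship between replicas, provisioners, and concurrent builds can be sketched with a little arithmetic. This is a minimal illustration only: the function names are hypothetical, and it assumes the default of 3 built-in provisioners per coderd replica and one concurrent build per provisioner, as described above.

```python
import math

# Default number of built-in provisioners per coderd replica (from the text above).
BUILTIN_PROVISIONERS_PER_REPLICA = 3


def concurrent_build_capacity(
    coderd_replicas: int,
    external_provisioners: int = 0,
    builtin_per_replica: int = BUILTIN_PROVISIONERS_PER_REPLICA,
) -> int:
    """Return how many workspace builds can run at the same time.

    Each provisioner handles a single concurrent build, so capacity is
    simply the total number of provisioners.
    """
    if min(coderd_replicas, external_provisioners, builtin_per_replica) < 0:
        raise ValueError("counts must be non-negative")
    return coderd_replicas * builtin_per_replica + external_provisioners


def build_waves(pending_builds: int, capacity: int) -> int:
    """Return how many sequential rounds are needed to drain a build queue."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return math.ceil(pending_builds / capacity)
```

For example, two coderd replicas with default settings can run 6 builds at once, so a queue of 50 pending builds would drain in roughly 9 rounds. Setting `builtin_per_replica=0` models a deployment that relies on external provisioners only.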

Registry

The Coder Registry is a platform where you can find starter templates and Modules for various cloud services and platforms.

Templates help create self-service development environments using Terraform-defined infrastructure, while Modules simplify template creation by providing common features like workspace applications, third-party integrations, or helper scripts.

Please note that the Registry is a hosted service and isn't available for offline use.

Kubernetes Infrastructure

Kubernetes is the recommended and supported platform for deploying Coder in the enterprise. It is the hosting platform of choice for a large majority of Coder's Fortune 500 customers, and it is the platform we build and test against here at Coder.

General recommendations

In general, we recommend deploying Coder into its own cluster, separate from production applications. Keep in mind that Coder runs development workloads, so the cluster should be deployed as such, without production-level configurations.

Compute

Deploy your Kubernetes cluster with two node groups: one for Coder's control plane, and another for user workspaces (if you intend to use K8s for end-user compute).

Control plane nodes

The Coder control plane node group must be static, to prevent scale down events from dropping pods, and thus dropping user connections to the dashboard UI and their workspaces.

Coder's Helm Chart supports defining nodeSelectors, affinities, and tolerations to schedule the control plane pods on the appropriate node group.

Workspace nodes

Coder workspaces can be deployed either as Pods or Deployments in Kubernetes. See our example Kubernetes workspace template. Configure the workspace node group to be auto-scaling, to dynamically allocate compute as users start/stop workspaces at the beginning and end of their day. Set nodeSelectors, affinities, and tolerations in Coder templates to assign workspaces to the given node group:

resource "kubernetes_deployment" "coder" {
  spec {
    template {
      metadata {
        labels = {
          app = "coder-workspace"
        }
      }

      spec {
        affinity {
          pod_anti_affinity {
            preferred_during_scheduling_ignored_during_execution {
              weight = 1
              pod_affinity_term {
                label_selector {
                  match_expressions {
                    key      = "app.kubernetes.io/instance"
                    operator = "In"
                    values   = ["coder-workspace"]
                  }
                }
                topology_key = "REPLACE_WITH_NODE_GROUP_LABEL_KEY" # add your node group label here
              }
            }
          }
        }

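        # In the Terraform Kubernetes provider, the block is named "toleration"
        toleration {
          # Add your tolerations here
        }

        # node_selector is a map argument, not a block
        node_selector = {
          # Add your node selectors here
        }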

        container {
          image = "coder-workspace:latest"
          name  = "dev"
        }
      }
    }
  }
}

Node sizing

For sizing recommendations, see the below reference architectures:

AWS Instance Types

For production AWS deployments, we recommend using non-burstable instance types, such as m5 or c5, instead of burstable instances, such as t3. Burstable instances can experience significant performance degradation once CPU credits are exhausted, leading to poor user experience under sustained load.

Component	Recommended Instance Type	Reason
coderd nodes	`m5`	Balanced compute and memory for API and UI serving.
Provisioner nodes	`c5`	Compute-optimized performance for faster builds.
Workspace nodes	`m5`	Balanced performance for general development workloads.
Database nodes	`db.m5`	Consistent database performance for reliable operations.

Networking

It is likely your enterprise deploys Kubernetes clusters with various networking restrictions. With this in mind, Coder requires the following connectivity:

Egress from workspace compute to the Coder control plane pods
Egress from control plane pods to Coder's PostgreSQL database
Egress from control plane pods to git and package repositories
Ingress from user devices to the control plane Load Balancer or Ingress controller

We recommend configuring your network policies in accordance with the above. Note that Coder workspaces do not require any ports to be open.

Storage

If you run Coder workspaces as Kubernetes Pods or Deployments, you will need to assign persistent storage. We recommend leveraging a supported Container Storage Interface (CSI) driver in your cluster, with Dynamic Provisioning and read/write access, to provide on-demand storage to end-user workspaces.

The following Kubernetes volume types have been validated by Coder internally, and/or by our customers:

Our example Kubernetes workspace template provisions a PersistentVolumeClaim block storage device, attached to the Deployment.

It is not recommended to mount volumes from the host node(s) into workspaces, for security and reliability purposes. The below volume types are not recommended for use with Coder:

Note that Coder's control plane filesystem is ephemeral, so no persistent storage is required.

PostgreSQL database

Coder requires access to an external PostgreSQL database to store user data, workspace state, template files, and more. Depending on the scale of the user-base, workspace activity, and High Availability requirements, the amount of CPU and memory resources required by Coder's database may differ.

Disaster recovery

Prepare internal scripts for dumping and restoring your database. We recommend scheduling regular database backups, especially before upgrading Coder to a new release. Coder does not support downgrades without first restoring the database to the prior version.
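A dump-and-restore script might start from something like the sketch below. It is not an official Coder tool: the function names, file naming scheme, and connection URL are hypothetical, and it assumes the standard PostgreSQL `pg_dump` and `pg_restore` client binaries are on the `PATH`.

```python
from __future__ import annotations

import datetime
import subprocess


def backup_filename(prefix: str, now: datetime.datetime) -> str:
    """Build a timestamped file name such as 'coder-20240131T020000.dump'."""
    return f"{prefix}-{now.strftime('%Y%m%dT%H%M%S')}.dump"


def build_pg_dump_command(database_url: str, output_path: str) -> list[str]:
    """Return a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",            # compressed archive readable by pg_restore
        f"--file={output_path}",
        f"--dbname={database_url}",   # --dbname accepts a connection string
    ]


def build_pg_restore_command(database_url: str, archive_path: str) -> list[str]:
    """Return a pg_restore invocation that replaces existing objects."""
    return [
        "pg_restore",
        "--clean",                    # drop objects before recreating them
        "--if-exists",                # avoid errors for objects that are absent
        f"--dbname={database_url}",
        archive_path,
    ]


def run(command: list[str]) -> None:
    """Execute a command and raise if it exits with a non-zero status."""
    subprocess.run(command, check=True)
```

A scheduled job could call `run(build_pg_dump_command(url, path))` before each upgrade, keeping the resulting archive until the new release has been verified.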

Performance efficiency

We highly recommend deploying the PostgreSQL instance in the same region (and if possible, same availability zone) as the Coder server to optimize for low latency connections. We recommend keeping latency under 10ms between the Coder server and database.
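One rough way to check the latency target is to time TCP connections from a machine next to the Coder server to the database port. The sketch below is only an approximation: a TCP connect measures roughly one network round trip, not query latency, and the function names and sample count are hypothetical choices.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(
    host: str, port: int, samples: int = 5, timeout: float = 2.0
) -> float:
    """Return the median time, in milliseconds, to open a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    return statistics.median(timings)


def within_latency_budget(latency_ms: float, budget_ms: float = 10.0) -> bool:
    """Return True if the measured latency is under the budget (10 ms above)."""
    return latency_ms < budget_ms
```

Running `within_latency_budget(tcp_connect_latency_ms(db_host, 5432))` from a control plane node gives a quick signal; 5432 is PostgreSQL's conventional port and may differ in your environment.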

When determining scaling requirements, take into account the following considerations:

2 vCPU x 8 GB RAM x 512 GB storage: A baseline for database requirements for a Coder deployment with fewer than 1000 users and a low activity level (30% active users). This capacity should be sufficient to support 100 external provisioners.
Storage size depends on user activity, workspace builds, log verbosity, overhead on database encryption, etc.
Allocate two additional CPU cores to the database instance for every 1000 active users.
Enable High Availability mode for the database engine for large scale deployments.
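The CPU guidance above can be turned into a rough starting estimate. The sketch below reflects one possible reading of it: start from the 2 vCPU baseline and add two cores for each full block of 1000 active users. The function name is hypothetical, and the result should be validated against observed database load rather than treated as a fixed rule.

```python
BASELINE_VCPUS = 2            # baseline from the guidance above
EXTRA_VCPUS_PER_BLOCK = 2     # two additional cores ...
ACTIVE_USERS_PER_BLOCK = 1000  # ... for every 1000 active users


def estimate_db_vcpus(active_users: int) -> int:
    """Estimate database vCPUs from the number of active users.

    Assumption: only complete blocks of 1000 active users add cores, so
    anything below 1000 active users stays at the baseline.
    """
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    full_blocks = active_users // ACTIVE_USERS_PER_BLOCK
    return BASELINE_VCPUS + EXTRA_VCPUS_PER_BLOCK * full_blocks
```

Under this reading, 300 active users stay at the 2 vCPU baseline, while 2500 active users come out at 6 vCPUs.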

Recommended instance types by cloud provider

For production deployments, we recommend using dedicated compute instances rather than burstable instances (like AWS t-family) which provide inconsistent CPU performance. Below are recommended instance types for each major cloud provider:

AWS (RDS/Aurora PostgreSQL)

Small deployments (<1000 users): db.m6i.large (2 vCPU, 8 GB RAM) or db.r6i.large (2 vCPU, 16 GB RAM)
Medium deployments (1000-2000 users): db.m6i.xlarge (4 vCPU, 16 GB RAM) or db.r6i.xlarge (4 vCPU, 32 GB RAM)
Large deployments (2000+ users): db.m6i.2xlarge (8 vCPU, 32 GB RAM) or db.r6i.2xlarge (8 vCPU, 64 GB RAM)

Azure (Azure Database for PostgreSQL)

Small deployments (<1000 users): Standard_D2s_v5 (2 vCPU, 8 GB RAM) or Standard_E2s_v5 (2 vCPU, 16 GB RAM)
Medium deployments (1000-2000 users): Standard_D4s_v5 (4 vCPU, 16 GB RAM) or Standard_E4s_v5 (4 vCPU, 32 GB RAM)
Large deployments (2000+ users): Standard_D8s_v5 (8 vCPU, 32 GB RAM) or Standard_E8s_v5 (8 vCPU, 64 GB RAM)

Google Cloud (Cloud SQL for PostgreSQL)

Small deployments (<1000 users): db-perf-optimized-N-2 (2 vCPU, 16 GB RAM)
Medium deployments (1000-2000 users): db-perf-optimized-N-4 (4 vCPU, 32 GB RAM)
Large deployments (2000+ users): db-perf-optimized-N-8 (8 vCPU, 64 GB RAM)
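For automation such as infrastructure pipelines, the tiers listed above can be captured in a small lookup table. This is an illustrative sketch: the provider keys and function names are hypothetical, and treating exactly 2000 users as a medium deployment is an assumption, since the listed ranges overlap at that boundary.

```python
from __future__ import annotations

# Instance types copied from the recommendations above, keyed by provider and tier.
RECOMMENDED_INSTANCES = {
    "aws": {
        "small": ["db.m6i.large", "db.r6i.large"],
        "medium": ["db.m6i.xlarge", "db.r6i.xlarge"],
        "large": ["db.m6i.2xlarge", "db.r6i.2xlarge"],
    },
    "azure": {
        "small": ["Standard_D2s_v5", "Standard_E2s_v5"],
        "medium": ["Standard_D4s_v5", "Standard_E4s_v5"],
        "large": ["Standard_D8s_v5", "Standard_E8s_v5"],
    },
    "gcp": {
        "small": ["db-perf-optimized-N-2"],
        "medium": ["db-perf-optimized-N-4"],
        "large": ["db-perf-optimized-N-8"],
    },
}


def deployment_tier(users: int) -> str:
    """Map a user count to a deployment tier."""
    if users < 1000:
        return "small"
    if users <= 2000:  # assumption: exactly 2000 users counts as medium
        return "medium"
    return "large"


def recommended_instances(provider: str, users: int) -> list[str]:
    """Return the candidate instance types for a provider and user count."""
    return RECOMMENDED_INSTANCES[provider][deployment_tier(users)]
```

For instance, `recommended_instances("aws", 500)` returns the two small-tier RDS options listed above.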

Storage recommendations

For optimal database performance, use the following storage types:

AWS RDS/Aurora: Use gp3 (General Purpose SSD) volumes with at least 3,000 IOPS for production workloads. For high-performance requirements, consider io1 or io2 volumes with provisioned IOPS.
Azure Database for PostgreSQL: Use Premium SSD (P-series) with appropriate IOPS and throughput provisioning. Standard SSD can be used for development/test environments.
Google Cloud SQL: Use SSD persistent disks for production workloads. Standard (HDD) persistent disks are suitable only for development or low-performance requirements.

If you enable database encryption in Coder, consider allocating an additional CPU core to every coderd replica.

Resource utilization guidelines

Below are general recommendations for sizing your PostgreSQL instance:

Increase the number of vCPUs if CPU utilization or database latency is high.
Allocate extra memory if database performance is poor, CPU utilization is low, and memory utilization is high.
Use faster disk options (higher IOPS), such as SSDs or NVMe drives, to improve performance and possibly reduce database load.

Operational readiness

Operational readiness in Coder is about ensuring that everything is set up correctly before launching the platform into production. It involves making sure that the service is reliable, secure, and scales easily according to user-base needs. Operational readiness is crucial because it helps prevent issues that could affect the workspace user experience once the platform is live.

Helm Chart Configuration

Reference our Helm chart values file and identify the required values for deployment.
Create a values.yaml and add it to your version control system.
Determine the necessary environment variables. Here is the full list of supported server environment variables.
Follow our documented steps for installing Coder via Helm.

Template configuration

Establish dedicated accounts for users with the Template Administrator role.
Maintain Coder templates using version control.
Consider implementing a GitOps workflow to automatically push new template versions into Coder from git. For example, on GitHub, you can use the Setup Coder action.
Evaluate enabling automatic template updates upon workspace startup.

Observability

Enable the Prometheus endpoint (environment variable: CODER_PROMETHEUS_ENABLE).
Deploy the Coder Observability bundle to leverage pre-configured dashboards, alerts, and runbooks for monitoring Coder. This includes integrations between Prometheus, Grafana, Loki, and Alertmanager.
Review the Prometheus response and set up alarms on selected metrics.
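When reviewing the Prometheus response, it can help to inspect the raw metrics text before wiring up alerts. The sketch below parses the Prometheus text exposition format and sums all series of one metric. It is a simplified parser for exploration, not a replacement for a Prometheus client, and the function names and the metric names used in examples are hypothetical.

```python
from __future__ import annotations

import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the metrics page from the given endpoint URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_prometheus_text(text: str) -> list[tuple[str, float]]:
    """Parse exposition-format text into (series, value) pairs.

    Comment lines (# HELP, # TYPE) are skipped and optional trailing
    timestamps are ignored.
    """
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Split after the closing brace of the label set.
            series, _, rest = line.rpartition("}")
            series += "}"
            fields = rest.split()
        else:
            parts = line.split()
            series, fields = parts[0], parts[1:]
        samples.append((series, float(fields[0])))
    return samples


def metric_total(samples: list[tuple[str, float]], name: str) -> float:
    """Sum the values of every series that belongs to the named metric."""
    total = 0.0
    for series, value in samples:
        if series.split("{", 1)[0] == name:
            total += value
    return total
```

A quick check could call `metric_total(parse_prometheus_text(fetch_metrics(url)), "example_metric_name")` and compare the result with the threshold you plan to alert on, where `url` points at the address your deployment exposes for Prometheus scraping.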

User support

Incorporate support links into internal documentation accessible from the user context menu. Ensure that hyperlinks are valid and lead to up-to-date materials.
Encourage the use of coder support bundle to allow workspace users to generate and provide network-related diagnostic data.
