Architecture

Coder's AI agent interacts with workspaces over the same connection path as a developer's IDE, web terminal, and SSH session already use. There is no sidecar process and no new network paths. If your developers can already connect to their workspaces, the agent can too.

Architecture at a glance

Three components are involved in every agent interaction:

  1. The control plane runs the agent loop. It receives prompts, streams them to the LLM provider, interprets tool calls, and dispatches them to workspaces.
  2. The LLM provider (Anthropic, OpenAI, Google, Azure, AWS Bedrock, or any OpenAI-compatible endpoint) performs model inference. It never communicates with the workspace directly.
  3. The workspace is standard compute infrastructure. It runs shell commands, reads and writes files, and executes processes — exactly what occurs when a developer connects via their IDE.
Architecture diagram

The same connection your IDE uses

This is the key architectural insight: the agent reaches into a workspace over the same Tailnet tunnel that a developer's tools already use.

When a developer opens a web terminal in the Coder dashboard, connects via VS Code Remote, or runs coder ssh, the traffic follows this path:

  1. The client connects to the control plane.
  2. The control plane routes the connection through its internal Tailnet node.
  3. The connection reaches the workspace daemon over a DERP relay or direct peer-to-peer link.
  4. The workspace daemon handles the request — spawning a shell, forwarding a port, or serving a file.

When the agent executes a tool call — reading a file, running a command, writing code — it follows the same tunnel:

  1. The agent loop in the control plane issues a tool call.
  2. The control plane routes the call through its internal Tailnet node.
  3. The call reaches the workspace daemon over the same DERP relay or peer-to-peer link.
  4. The workspace daemon handles the request via its HTTP API — reading a file, starting a process, or writing content.

The underlying tunnel is identical. IDE connections use SSH, web terminals use a WebSocket protocol, and the agent uses the workspace daemon's HTTP API — but all three traverse the same Tailnet connection and rely on the same security boundary. No additional ports or network paths are introduced.

No inbound ports

The workspace daemon always dials out to the control plane — never the reverse. The control plane then uses that established tunnel to reach back in. This means:

  • The workspace needs no inbound ports or exposed services.
  • You can block all inbound traffic to the workspace.
  • The only required outbound connection from the workspace is to the control plane itself.

This is unchanged from how workspaces already operate in Coder. Enabling Coder Agents does not change your workspace network requirements.

The agent loop

When a user submits a prompt, the control plane processes it as a background job:

  1. The prompt is saved to the database and the chat is marked pending.
  2. The control plane picks up the chat and marks it running.
  3. The control plane streams the conversation to the configured LLM provider.
  4. The model responds with text, reasoning, or tool calls.
  5. If the response includes tool calls, the control plane executes them (connecting to the workspace as needed) and returns the results to the model.
  6. Steps 3–5 repeat until the model produces a final response with no further tool calls.
  7. The chat is marked waiting for the next user message.

This loop runs inside the control plane process. There is no separate service to deploy — it is part of the same binary that serves the dashboard and API.

Context compaction

As conversations grow, the agent automatically summarizes older context to stay within the model's context window. When token usage exceeds a threshold, the agent generates a compressed summary and inserts it as a new message. Earlier messages remain in the database and are still visible to users, but are excluded from the model's context window. This happens transparently and keeps long-running sessions productive.

Message queuing

Users can send follow-up messages while the agent is actively working. Messages are queued in the database and delivered when the agent completes its current turn — the full sequence of steps until the model stops calling tools. There is no need to wait for a response before providing additional context or redirecting the agent.

Tool execution

Tools are how the agent takes action. Each tool call from the LLM translates to a concrete operation — either inside a workspace or within the control plane itself.

Workspace connection lifecycle

The connection to a workspace is lazy. It is not established when a chat starts — only when something needs to reach the workspace. This is typically triggered by the first tool call that requires workspace access. Once established, the connection is cached and reused for the duration of that chat session.

Chats that don't need workspace access (answering questions, planning an approach, discussing architecture) never provision or connect to a workspace.

Workspace tools

These tools execute inside the workspace via the workspace daemon's HTTP API. They traverse the same Tailnet tunnel used by web terminals and IDE connections.

ToolWhat it does
read_fileReads file contents with line-number pagination.
write_fileWrites content to a file.
edit_filesPerforms atomic search-and-replace edits across one or more files.
executeRuns a shell command (foreground or background).
process_outputRetrieves output from a background process.
process_listLists all tracked processes in the workspace.
process_signalSends a signal (SIGTERM or SIGKILL) to a background process.

Platform tools

These tools run entirely within the control plane. They do not require a workspace connection.

ToolWhat it does
list_templatesBrowses available workspace templates, sorted by popularity.
read_templateGets template details and configurable parameters.
create_workspaceCreates a workspace from a template and waits for it to be ready.

Orchestration tools

These tools manage sub-agents — child chats that work on independent tasks in parallel.

ToolWhat it does
spawn_agentDelegates a task to a sub-agent with its own context window.
wait_agentWaits for a sub-agent to finish and collects its result.
message_agentSends a follow-up message to a running sub-agent.
close_agentStops a running sub-agent.

What runs where

Understanding the split between the control plane and the workspace is central to the security model.

ResponsibilityWhere it runsDetails
Agent loopControl planePrompt processing, tool dispatch, step iteration.
LLM inferenceLLM providerThe control plane streams requests to the external provider.
Chat stateControl planeAll messages, token usage, and status stored in the database.
Git authenticationControl planeUses existing Coder external auth (GitHub, GitLab, Bitbucket).
User identityControl planeEvery action is tied to the user who submitted the prompt.
Model/prompt configControl planeAdministrators configure providers, models, and system prompts centrally.
File read/writeWorkspaceThe workspace file system is the source of truth for code.
Shell executionWorkspaceCommands run in the workspace's environment with its packages and tools.
Git operationsWorkspaceCommits, pushes, and branch management happen inside the workspace.
Build and testWorkspaceCompilation, test suites, and dev servers run on workspace compute.

The workspace has zero AI awareness. There are no LLM API keys, no agent processes, and no AI-specific software installed. If you inspect a workspace created by the agent, it looks identical to one a developer created manually.

Chat state and persistence

All chat data is stored in the control plane database, not in the workspace.

  • Chat metadata — status, owner, associated workspace, timestamps, and parent/child relationships for sub-agents.
  • Messages — every message (user, assistant, tool calls, tool results) is stored as a separate record with role, content, and token usage.
  • Compressed context — when the agent compacts the conversation, summaries are stored with a compression flag so the original context budget is preserved.
  • Queued messages — follow-up messages sent while the agent is working are held in a queue and delivered in order.

Because state lives in the database:

  • Chat history survives workspace stops, rebuilds, and deletions.
  • An administrator can inspect any chat for audit or debugging.
  • The agent can resume work by targeting a new workspace and continuing from the last git branch or checkpoint.

Security implications

The control plane architecture has direct consequences for how you secure AI coding workflows.

No API keys in workspaces

LLM provider credentials exist only in the control plane. The workspace never sees them. There is nothing for a developer, a compromised dependency, or a rogue process to exfiltrate.

Workspaces can be fully network-isolated

Because the workspace does not need to reach any LLM provider, you can restrict its network access to only:

  • The control plane (required for the workspace daemon to function).
  • Your git provider (for push/pull operations).

Everything else can be blocked. The AI functionality comes from the control plane, not from the workspace's network.

Tip

For sensitive environments, create dedicated templates for agent workloads with stricter egress rules than your standard developer templates. Because the AI comes from the control plane, these templates do not need any outbound access to LLM providers.

Centralized enforcement

Administrators control which models are available, the system prompt, and tool configuration from the control plane. Developers can select from the set of admin-enabled models when starting or continuing a chat, but cannot add their own providers or override system prompts or tool permissions. When an administrator removes a model or modifies the system prompt, the change applies to all agent sessions immediately.

User identity on every action

Every action the agent takes — PRs opened, code committed, commands executed — is tied to the user who submitted the prompt. There is no shared bot account or anonymous identity. If a developer submits a prompt that results in a pull request, that pull request is attributed to them via the git authentication already configured in your Coder deployment.

Scaling and resource impact

The control plane overhead for Coder Agents is minimal. The heavy computation happens elsewhere:

  • LLM inference runs on the external provider's infrastructure.
  • File I/O, builds, and tests run on workspace compute.
  • The control plane primarily proxies streaming responses and dispatches tool calls over existing network connections.