Agentic AI Meets Coder

Why AI-Powered Development Is What We Were Built For
Matt Vollmer
11 min read

From the start, Coder was built to give enterprises secure, governed, and scalable development environments for their developers. Not as an afterthought, not as a layer on top, but as the foundation for how modern software should be crafted in large organizations.

If you’re an enterprise exploring how to roll out AI agents to your teams, this is the part that matters most: without a platform like Coder to distribute, govern, and control the environments those agents work in, AI agents don’t provide a material advantage. In fact, they could introduce severe risks. It’s one thing to be excited about what AI can do, but if you can’t give agents a place to work safely, securely, and at scale, they’re not going to deliver results.

We’re not the only ones thinking this way. As Brendan Humphreys, Canva’s Chief Technology Officer, recently put it, “Engineers need to guide, assess, correct, and ultimately own the output [of AI tools] as if they had written every line themselves.” We couldn’t agree more. Without strong environments that keep AI work contained and controlled, companies risk trading short-term wins for long-term pain in the form of security risks, technical debt, and unmaintainable code.

So if your first thought is, “AI agents committing to production codebases sounds risky and expensive,” you’re right: it is, unless you have a way to contain and control what the agent can access, what commands it can run, and where it writes and tests code. That’s exactly what Coder already does for developers, and now the same applies to AI.

We’ve been curious, too. So we ran real experiments to see what agentic AI like Claude Code could actually accomplish when working on real engineering problems in real environments. Not just another “AI wrote a to-do app” story, but actual issues and pull requests from our open-source repo.

Why Coder Workspaces Are Perfect for Agentic AI

Think about how you treat a junior developer on day one. You don’t give them keys to prod and say good luck. You give them a sandbox to work in — a real environment with the right tools, but isolated so if they break something, it doesn’t take down the business. Coder gives AI that same experience. And just like any developer, when AI has the right tools and clear context, it can contribute in meaningful ways.

Our workspaces are built to be fully isolated, self-contained spaces where work gets done. If something goes wrong, the workspace can be blown away and rebuilt in minutes. AI can experiment, fail, try again, and get closer to a useful result without putting your SDLC at risk.

Imagine a future where agents are running dozens to hundreds of tiny experiments on your codebase to improve performance, reliability, and even functionality.

The Guardrails You Need for AI to Work

Enterprise teams need guardrails so AI can’t create chaos unchecked. Coder supports role-based access controls so you can govern exactly what a workspace (and by extension, an AI agent) can touch. Secrets management ensures sensitive credentials aren’t leaked when AI starts poking around. Resource quotas keep AI from running wild and burning through cloud resources and tokens.

And because every workspace is fully reproducible, you can audit exactly what was done, by whom, or in this case, by what. These are not features we added for AI. They’ve always been part of Coder because this is how enterprise development needs to work. AI just fits into that model.
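
To make that idea concrete, here is a minimal, hypothetical sketch in Go of the kind of policy a platform can enforce before an agent acts inside a workspace: an allowlist of commands, a push-access flag, and a budget ceiling. This is not Coder’s API; the type, field names, and values are assumptions chosen purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// agentPolicy is a hypothetical guardrail, not Coder's API: a sketch of the
// kind of limits a platform can enforce on an agent inside a workspace.
type agentPolicy struct {
	AllowedCommands []string // command prefixes the agent may run (build/test tools)
	CanPushBranches bool     // whether the agent may push to remote branches
	MonthlyTokenCap int      // illustrative spending ceiling for the agent
}

// allowCommand checks a proposed shell command against the allowlist.
func (p agentPolicy) allowCommand(cmd string) bool {
	for _, allowed := range p.AllowedCommands {
		if strings.HasPrefix(cmd, allowed) {
			return true
		}
	}
	return false
}

func main() {
	policy := agentPolicy{
		AllowedCommands: []string{"go test", "go build", "git commit"},
		CanPushBranches: false,
		MonthlyTokenCap: 2_000_000,
	}

	for _, cmd := range []string{"go test ./...", "git push origin main"} {
		fmt.Printf("%-22q allowed: %v\n", cmd, policy.allowCommand(cmd))
	}
}
```

The point isn’t the specific checks; it’s that the checks live in the platform, outside the agent, so they apply no matter which model or tool is doing the work.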

Right-Sizing AI Workspaces

But governance is only half the story. Another big hurdle is spinning up secure, consistent environments for every AI agent without crushing your infrastructure team under a pile of custom requests. With Coder, that happens almost instantly: workspaces are provisioned with infrastructure-as-code to your cloud, on-prem, or even air-gapped environments, with no extra effort from infrastructure teams.

And AI agents don’t all need the same horsepower. One overlooked challenge of running AI agents at scale is figuring out how much compute to give them. You don’t want to spin up massive, expensive VMs for every task just because an agent might need it. With Coder, you can tailor your agents’ infrastructure depending on their tasks. Some agents might need more memory but fewer cores, or vice versa. And because Coder uses infrastructure-as-code, you can define these patterns and stay in control as AI models and usage change.
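
As a rough illustration of what “define these patterns” can look like, here is a small, hypothetical Go sketch that maps task types to workspace sizes. In practice, a mapping like this would typically live as parameters in your infrastructure-as-code templates rather than in application code; the task names and resource numbers below are assumptions, not recommendations.

```go
package main

import "fmt"

// resourceProfile captures a hypothetical compute shape for an agent
// workspace. The numbers are illustrative, not guidance.
type resourceProfile struct {
	CPUCores int
	MemoryGB int
}

// profileFor picks a workspace size based on the kind of task an agent is
// assigned, so small fixes don't get the same footprint as full builds.
func profileFor(task string) resourceProfile {
	switch task {
	case "docs", "small-fix":
		return resourceProfile{CPUCores: 2, MemoryGB: 4}
	case "test-suite", "refactor":
		return resourceProfile{CPUCores: 4, MemoryGB: 8}
	case "full-build":
		return resourceProfile{CPUCores: 8, MemoryGB: 16}
	default:
		return resourceProfile{CPUCores: 2, MemoryGB: 4}
	}
}

func main() {
	for _, task := range []string{"small-fix", "full-build", "unknown"} {
		p := profileFor(task)
		fmt.Printf("%-10s -> %d cores, %d GB RAM\n", task, p.CPUCores, p.MemoryGB)
	}
}
```

Defining sizes per task type, rather than per agent, keeps the knobs in one place as models, prices, and usage patterns change.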

So when people ask us how to let agentic AI write and test code safely, our answer is simple. Give it a Coder workspace. Let it go wild inside walls that keep the rest of your world safe.

The Experiment: Giving Claude Code Real Development Tasks and Coder Workspaces

We didn’t want to write another post full of “AI wrote a to-do app” stories. So, we decided to put Anthropic’s research preview of Claude Code to work on our own open-source product. We took real GitHub issues and pull requests — the same ones our engineering team and community work on — and asked Claude Code to solve them inside Coder environments. No artificial constraints. No hand-waving. Just real engineering problems.

Our goal wasn’t to see if AI could hit a 100 percent success rate. That’s not the point. Even if AI gets 30 or 50 percent of the way there but does it in a fraction of the time it would take a human, that’s progress. And we fully expect these models to improve rapidly, so today’s partial success could be tomorrow’s solved task. Success here isn’t binary, and it’s shaped by more than just the AI model itself. Things like how you frame the task, the quality of the repo, the availability of tests, and the codebase structure all influence the outcome.

One unexpected but valuable side effect of working with agentic AI is that it forces us to be better product managers. AI can’t rely on institutional knowledge like our engineers sometimes do, which means we have to get sharper at clearly defining the problem and the outcome we’re looking for. It’s a forcing function that makes us pause and think about what we’re actually trying to solve, and how to communicate that in a way that anyone (or anything) could pick up and work on. In a way, working with AI is helping us level up our own skills as engineers, product managers, and collaborators.

What we saw was a mixed bag, but a promising one. Sometimes AI crushed a task. Sometimes it flailed. But we learned a lot about where it shines, where it stumbles, and how teams can start working with it productively.

Three Real Tasks We Gave Agentic AI — and What We Learned

If you want to go deep on specific tasks we gave to agentic AI, we’ve published a separate post that breaks down every task, outcome, and lesson learned. You can check that out here — but for now, here’s a quick look at a few examples that give a sense of what we saw.

  1. First, we asked Claude Code to implement a new feature in an internal admin dashboard to sort GitHub issues by creation and update time. Without much prompting, it found the right files, added a toggle for sort order, and implemented the logic. After a quick human review and one minor fix (it had flipped ascending and descending), it was ready to go and even fixed an unrelated bug in the process. The result was 114 lines added, 28 removed, for under $5 and about 45 minutes total.
  2. Next, we gave it a harder problem: removing a network extension from macOS in Swift while dealing with Apple’s undocumented APIs. After a lot of back-and-forth, Claude produced code that compiled and worked, which was impressive but still more of a proof of concept than production-ready. The total cost was around $10 and it took about 90 minutes combined.
  3. Finally, we asked Claude to tackle a backend bug in Go by fixing a Docker port mapping issue in one of our API endpoints. It made partial progress but needed a lot of nudging, especially to write and run tests. In the end, it didn’t fully solve the problem, but it got partway there. The total cost was $2.65 and about an hour of effort.

The pattern is clear: AI is great at scoped, well-defined tasks, especially when there are tests to guide it. It struggles when the problem requires real-world context, deep reasoning, or a nuanced understanding of how different parts of a system fit together.

It also gave us a real sense of the ROI of using AI agents in actual development work. Running these experiments was not just about seeing if AI could do the job, but also about understanding what it would cost to automate that work. Now we have a baseline to evaluate whether agentic AI makes sense for certain tasks. As these models improve and hopefully get cheaper over time, tracking cost will be just as important as tracking their accuracy and output.

Thoughts on Developer Productivity

We should address something that often gets lost in conversations about AI and developer productivity.

Most developers don’t want to measure their productivity by how many backlog tickets they close or how many lines of code they write. And they definitely don’t want to spend their days supervising fleets of AI agents working through tasks like factory cogs. Developers want to build things that matter. They want to solve real problems, tackle complex challenges, and have the space to think deeply and create.

So while we’re optimistic about the potential of AI agents to take on routine, repetitive work, we also believe that productivity isn’t just about output. It’s about giving developers the space to focus on what they do best — thinking, designing, and building things that are meaningful.

If an AI agent can automate the tasks developers don’t want to do, like annoying bug fixes, light refactors, and never-ending test coverage, that’s a win. But like anything, it requires balance. Not every developer wants to spend their day monitoring agents. Some will be excited about the idea of orchestrating dozens of agents working in parallel. Others will prefer to focus on a single hard problem and stay in flow for hours without distraction.

The future of AI in software development needs to make room for both. It should amplify developers’ strengths and give them leverage, not just add new overhead to manage.

What’s Next: Scaling Agentic AI for Development

The way we see it, this isn’t a fad. Within the next year, engineering teams will be managing dozens, maybe hundreds, of development environments running agentic AI to tackle routine engineering work. But that’s only going to be safe and productive if companies have the right platform to manage those environments. That’s why we believe Coder’s workspaces are so well suited to this future — not because we built them for AI, but because we built them for real, modern, enterprise development with all the governance and oversight required to safely operate within an organization's requirements.

We fully expect that AI will require different interfaces, controls, and ways of working than human developers. But even in its current state, Coder is already well-suited to support agentic AI in secure, isolated workspaces. Much of what’s on our roadmap — from API-driven events to workspace governance, pre-builds, resource management, and developer tooling — naturally moves us closer to a world where teams can manage both AI agents and humans side by side. We see Coder as the foundation for that future, giving organizations a way to govern, observe, and scale development work, no matter who’s doing it.

You Can Do This Today

One final point. This isn’t some distant vision of what Coder could do. You can do this today. You can spin up a Coder environment (free and open source), give an agentic AI a terminal, and let it work. You’ll need to guide it, supervise it, and know when to step in, but it’s possible right now. And we think teams that start experimenting with this today will be way ahead of the curve as AI capabilities keep improving.

If you’re not already thinking about how agentic AI fits into your engineering org safely, securely, and scalably, it might be time to start. Coder is ready for that world because we were always built for it.

If you’re curious, give it a shot. Try running Claude Code or another agent inside a Coder environment and see what happens. And if you do, tell us what worked and what didn’t. We’re figuring this out, too, and the more real-world data we all have, the better.
