fleeting.computer

Do agents break the way teams deliver software?

For a certain type of problem, coding agents (Claude Code, GitHub Copilot and friends) let solo engineers ship code very quickly. In a lot of ways a coding agent is like a robotic exoskeleton that lets engineers go further, faster and for longer. The trade-off is wearing a pair of mechanised iron mittens that reduces precision and "tactile" code-level feedback.

I'm mostly convinced that for a solo engineer agents go faster. But what happens when every engineer on a team goes faster? Does the tried and tested idea that the unit of software delivery is the team break down? I'm not convinced the usual processes we rely on (CI/CD, unit testing, trunk-based development, etc) will keep up when everyone is going at agent speed.

I think we need some new defaults for working with agents on software teams.

What breaks when everyone is going at agent-speed?

  • Integration: shipping bigger chunks of functionality faster makes merging harder… and no one wants to review your 5000 LOC Copilot fever dream of a pull request. Yes, you can try to convince your coding agent of choice to write small atomic commits, but it's very much not how these tools want to work. Having worked on a project where everyone was using agents, it's not uncommon to come in the next morning and find everything has changed.
  • Code trust: things we used to trust (unit tests, interface contracts, etc) churn a lot more often - after all, we can just use the agent to rewrite them. The testing problem is especially tricky: you can't rely on tests you don't trust, and it's very easy to prompt an agent to write loads of low-value unit tests, or for an agent to write tests that look correct but don't actually test the domain logic you care about. As above, the chance of a human reading through all those tests and spotting the duds is very low.
  • Mental load: coding with an agent (if you do it right and try to keep up with what the agent is doing) is a constant stream of bringing architectural trade-offs into focus and trying to stay one step ahead of Claude saying "I'll just store everything in memory for now…" or "Developing a more complete solution is out of scope". Whether you're going spec-driven or vibing, you need to make decisions if you want the agent to do something valuable. It turns out making these decisions repeatedly and at speed is exhausting… let alone keeping up with what everyone else has done.
  • Review overhead: it's simply not realistic to expect humans to read high volumes of code and do timely reviews that produce value. Code reviews don't catch 100% of issues at human speed; they'll perform worse at "agent speed".
  • Balancing now with next: agents can't see what isn't in the code - are we refactoring toward or away from a pattern? What's the next-next feature we should be keeping in mind while we build this? Yes, you can try to get the agent to document this for itself but agents (at least without strict guardrails) are very bad at documentation - preferring to generate endless streams of markdown files instead of tight, precise and readable docs.

How can teams keep up with agents?

So, if we want teams to take full advantage of coding agents then we need some new techniques for working with and integrating the code they produce.

Smaller teams and clearer code-level boundaries

There's already a practical maximum size for teams going at human speed. It stands to reason that if engineers become more productive then that size gets smaller.

It also makes sense for the scope of an engineer's responsibility to get bigger. Instead of working on story- or feature-level changes, I think engineers will be owning entire 'sub-systems' of the overall solution.

In some contexts this could be one micro-service (at a time) per engineer. But what's more interesting is to think about how a larger codebase could be divided up. Projects will need to start with a single engineer (or pair) to establish a walking skeleton and identify how to break things into sub-systems. Once the boundaries and interfaces are understood, more engineers can be added – but I think we'll cap out at around three per team. We still need high-bandwidth human-to-human conversations to ensure we evolve things in the right direction.

What is a sub-system? Any interface that we think will solidify and expand over time is a good candidate. For example, if your app needs some sort of queue to move work or data around, and that queue can safely hide its internals from everything else, it would be a good candidate to give to someone (and their agent) to own and evolve.
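As a minimal sketch of what such a boundary might look like (all names here are hypothetical, not from any real project), the queue sub-system could be nothing more than a small interface that consumers code against while the owner evolves the internals:

```python
from typing import Optional, Protocol, Tuple


class WorkQueue(Protocol):
    """Hypothetical sub-system boundary: consumers only ever see this
    interface, so the owning engineer (and their agent) can swap the
    implementation - in-memory, Redis, whatever - without touching callers."""

    def enqueue(self, payload: bytes) -> str:
        """Accept work and return an opaque job id."""
        ...

    def claim(self) -> Optional[Tuple[str, bytes]]:
        """Hand the next job to a worker, or None if the queue is empty."""
        ...

    def ack(self, job_id: str) -> None:
        """Mark a claimed job as done."""
        ...
```

The point of the `Protocol` here is that the interface is the contract: anything structurally matching it can be handed to consumers, which is exactly the kind of stable seam a single engineer plus agent can own.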

There's a tension here with the received agile wisdom of building things in end-to-end slices of value. While I think value delivered to the end user needs to be the North Star, using some of the extra productivity from agents to get better at code-level architecture is no bad thing. As long as we leave room for things to evolve and avoid 'big architecture up front'.

One option could be that while your engineers focus on building out the essential sub-systems that provide the capabilities of your app, your Product Owners use agents to stitch them together into features - assembling pre-validated components is where agents already excel.

Over time we will see engineers move from "people who write code" to technical product owners of a particular sub-system of capability, considering not just how behaviour manifests for the end user but also how their sub-system changes, evolves and is consumed over the life of the codebase.

Self-verifying code, agent-written tests aren't enough

We can't (or shouldn't) trust all of the code that agents produce. We also shouldn't trust humans to review it all.

Unit tests are also code and can be gamed: even with static analysis and test coverage, it's possible to write junk tests that look good but do nothing.
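As a toy illustration of a junk test (hypothetical `add` function, nothing from a real codebase), here's one that would sail through a coverage gate while verifying nothing:

```python
def add(a, b):
    return a + b


def test_add():
    # Executes every line of add - so coverage reports 100% - but it
    # asserts nothing about the result, so it can never fail, even if
    # add were rewritten to return garbage.
    add(2, 2)
```

A human skimming the test name and the green tick would assume `add` is covered; only reading the body reveals there's no assertion at all.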

What's better than unit tests? Putting the assertions into the code itself and blowing up when we encounter impossible or dangerous states.

At first glance this can appear counterproductive. After all, your classic web server is an exercise in refusing to let a process die no matter how much goes wrong. But by turning a 'correctness' concern into a 'liveness' concern, we also turn it into something we can monitor.
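A minimal sketch of the idea, using a made-up payments example: the assertions live in the production path, so an "impossible" state crashes loudly instead of corrupting data quietly, and that crash shows up in your liveness monitoring.

```python
def debit(balance_cents, amount_cents):
    # Precondition: guards a state that should be impossible if callers
    # behave. A violation means a bug upstream - blow up rather than limp on.
    assert amount_cents > 0, "debits must be positive"

    new_balance = balance_cents - amount_cents

    # Assumed invariant for this sketch: overdrafts are rejected before we
    # get here, so a negative balance can only mean a logic error. Better a
    # crash your alerts catch than a silently wrong balance.
    assert new_balance >= 0, "impossible state: negative balance"
    return new_balance
```

Whether the agent wrote `debit` or a human did, the invariant fires on every call in production, not just on the inputs someone remembered to write a test for.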

Writing code like this opens the door to deterministic simulation testing (using an algorithm to generate lots of fake but realistic data and throwing it at your system) and some of the techniques in the NASA rules for safety-critical code.
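A minimal sketch of deterministic simulation testing, assuming a trivial in-memory bounded queue as the system under test: a seeded generator drives random operations and the invariants are checked after every step, so any failing seed replays the exact same run.

```python
import random

CAPACITY = 64  # assumed bound for this toy system under test


def simulate(seed, steps=1000):
    # Same seed, same sequence of operations: failures reproduce exactly.
    rng = random.Random(seed)
    queue = []
    for _ in range(steps):
        if rng.random() < 0.6 and len(queue) < CAPACITY:
            queue.append(rng.randint(0, 999))  # simulated producer
        elif queue:
            queue.pop(0)  # simulated consumer
        # Invariant checked after every operation, not just at the end.
        assert 0 <= len(queue) <= CAPACITY
    return len(queue)


# One small program covers thousands of generated cases across many seeds.
for seed in range(100):
    simulate(seed)
```

In a real system the queue would be your sub-system behind its interface and the invariants would be its actual contracts, but the shape is the same: one small simulator instead of a hand-written assertion list.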

Because we've already broken our code into sub-systems with known interfaces, we can write these simulation tests at the sub-system level.

The takeaway here is that once you have a certain number of test cases, it's easier to understand a small program that tests everything (to a certain statistical level of "everything") than a long list of assertions that test most things (you hope).

Languages with built in verification are another promising avenue to explore but many fall into the trap of needing the agent or engineer to write the verifications in the first place.

Of course tests and behaviour aren't the only things we need to verify. Insecure code, supply chain attacks, prompt injection inserting evil code into your app and common foot-guns also need to be detected before shipping to production. I think the thing to hope for here is that static analysis tooling finally gets fast enough to run as part of your fast-feedback loop in CI.

Dual piloting, two engineers driving an agent (especially at the start)

To tackle the high cognitive load of feeding the agent a somewhat sensible architecture, and to bring the review step forward, having two engineers drive a single agent seems like a good idea.

This is the re-discovery of pairing for agentic-teams (or, to stress the metaphor, your exoskeleton just levelled up to a giant mech with two pilots).

Effectively two engineers discuss the architecture as they build it, ideally with a whiteboard on hand. While the agent is generating code the engineers discuss the system, how it is evolving and how it should work, ready to feed pre-validated specs into the agent.

I predict this practice will be especially valuable at the start of projects where you're defining your core code architecture and looking for sub-systems you can pull out and delegate to team members.

Continually invest in the 'agent harness' and keep more context in your codebase

Finally, teams won't just build code; they'll continually build tools to help them build and maintain code. Agent skills, for example, have a very low cost of creation - they're just markdown - but are very effective at getting an agent to do what you want.

Imagine we're migrating our queue system to a newer interface: along with deprecating the old version, we can publish an agent skill to help consumers update their own code from v1 to v2.
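A skill like that could be as simple as a markdown file shipped alongside the deprecation notice (every name and instruction below is invented for illustration):

```markdown
# Skill: migrate work-queue v1 -> v2

Use this when a consumer still imports `queue_v1`.

1. Replace `queue_v1.push(item)` with `queue_v2.enqueue(payload=item)`.
2. `claim()` now returns `(job_id, payload)` - callers must `ack(job_id)`
   when done, or the job will be redelivered.
3. Run the consumer's tests. Flag any code that reaches into queue
   internals for human review instead of migrating it automatically.
```

The consumer's agent reads this once and does the mechanical work; the human only sees the cases the skill told the agent to escalate.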

Or, for a real-world example: I'm writing a programming language that compiles to bytecode, and I have a markdown file that tells Copilot how to read and debug the bytecode - it speeds up development a lot.

Over time I think we'll see more and more skills and key delivery artefacts pulled into the codebase. They won't always be fed into agent context, but we'll want them there just in case. Claude, in particular, can already do some impressive things by analysing git history for patterns of changes. We'll probably end up keeping backlogs, architecture diagrams, back-ported log files from production, and so on, all in the codebase (or at least having a skill that makes accessing these things trivial - but personally I'm long on text files).

Conclusion

Speeding up one part of a system often breaks the parts to either side. I don't think coding agents will do away with all of the practices we know today but they will force us to pick the parts that work and evolve the rest.

If I was starting a new team tomorrow and everyone was using coding agents then I'd definitely be pushing for assertions in code and regular (seriously, like daily) whiteboard discussions about how sub-systems were evolving and how we were keeping those concerns separate enough that agent-powered engineers could work on them without collisions.

The temptation, and the trap, for teams that go in with their eyes closed is handing their architecture over to be driven by the vibes. The reality is that the teams who are intentional and deeply understand the systems they are building will go the fastest - same as it ever was.