Every AI system diagram now has a box labelled “orchestrator”.
It is usually the biggest box. It often sits in the middle. Arrows go in, arrows go out. Sometimes it's a graph. Sometimes it's a loop. Occasionally it's a hexagon.
It is, in all cases, doing a lot of emotional labour.
The idea is straightforward enough.
LLMs are unreliable. They hallucinate, they drift, they occasionally decide that now is the moment to write a sonnet instead of calling your API. So we wrap them in something more structured—something that can coordinate steps, retry failures, and keep the whole thing on the rails.
This is “orchestration”.
And there are, at this point, a few distinct schools of thought on how to do it.
First, the graph people.
They will draw you a DAG. Each node is a step: call the model, hit a tool, validate the output. The edges define the flow. It looks clean, declarative. You can almost believe it will work first time.
Until you realise most of your edges are actually “if this looks a bit wrong, try again but differently”, which is not so much a graph as a polite fiction.
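The graph version fits in a few lines. Here is a minimal sketch, with stand-in functions for the model and validator (everything in it is invented for illustration): nodes are plain callables, edges are (predicate, next-node) pairs, and one of those edges is exactly the polite fiction described above.

```python
def call_model(state):
    # Stand-in for an LLM call; a retry gets a slightly different result.
    attempt = state.get("attempts", 0)
    state["output"] = f"draft-{attempt}" if attempt < 1 else "valid-output"
    state["attempts"] = attempt + 1
    return state

def validate(state):
    # Stand-in validator: checks the output looks right.
    state["ok"] = state["output"].startswith("valid")
    return state

# Edges are (predicate, next_node); None means stop.
GRAPH = {
    "call_model": [(lambda s: True, "validate")],
    "validate": [
        (lambda s: s["ok"], None),                    # success: done
        (lambda s: s["attempts"] < 3, "call_model"),  # "try again but differently"
        (lambda s: True, None),                       # give up
    ],
}

NODES = {"call_model": call_model, "validate": validate}

def run(start, state):
    node = start
    while node is not None:
        state = NODES[node](state)
        # Follow the first edge whose predicate matches the current state.
        node = next(target for pred, target in GRAPH[node] if pred(state))
    return state

result = run("call_model", {})
```

Note that the retry edge quietly turns the tidy declarative graph into a loop: the diagram stays a DAG only for as long as the model behaves.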
Then there are the workflow engine people.
They bring in serious machinery. Durable execution. Retries. Backoff strategies. Observability. If a step fails, it will be retried 17 times across multiple availability zones before finally giving up in a well-instrumented manner.
This is reassuring, right up until you notice you are now reliably retrying something that was never going to work in the first place.
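The failure mode is easy to reproduce. A sketch, with a hypothetical step that fails deterministically: the machinery (retries, exponential backoff) is doing its job perfectly, and it cannot tell a transient failure from a doomed one.

```python
import time

def with_retries(step, max_attempts=5, base_delay=0.01):
    # Durable-execution energy: retry with exponential backoff.
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception as exc:
            last = exc
            time.sleep(base_delay * (2 ** attempt))  # well-instrumented patience
    raise last

attempts_made = []

def doomed_step():
    # The prompt asks for JSON; the model returns a sonnet. Every time.
    attempts_made.append(1)
    raise ValueError("expected JSON, got iambic pentameter")
```

Calling `with_retries(doomed_step)` retries faithfully five times and then raises the same error it saw on attempt one, now with better telemetry.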
There are also the minimalists.
“It's just a loop,” they say, correctly. Call the model, inspect the output, decide what to do next. No frameworks, no abstractions. Just code.
This works surprisingly well, until your loop contains twelve special cases, three escape hatches, and a comment that says “not sure why this fixes it”.
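In sketch form, with a stand-in model and a couple of invented special cases of the kind such loops accumulate:

```python
def call_model(prompt):
    # Stand-in for an LLM call: returns something vaguely answer-shaped.
    return {"text": f"answer to: {prompt}", "refused": False}

def run(prompt, max_turns=5):
    for _ in range(max_turns):
        out = call_model(prompt)
        if out["refused"]:                   # special case #1: nudge and retry
            prompt = "Please just answer: " + prompt
            continue
        if len(out["text"]) < 5:             # special case #2
            continue                         # not sure why this fixes it
        return out["text"]
    return None                              # escape hatch
```

Ten more special cases and this is no longer "just a loop"; it is an orchestrator that hasn't admitted it yet.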
And finally, the multi-agent enthusiasts.
If one unreliable model is a problem, the solution is clearly several of them, talking to each other. Perhaps with roles. Perhaps with a supervisor. Perhaps with a voting system.
At this point you have not so much orchestrated the system as recreated a small, confused organisation.
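The voting variant is a few lines, sketched here with stand-in "agents" (the personas and answers are invented): each agent is just another model call with a different hat on, and the committee takes a majority.

```python
from collections import Counter

def agent(question, persona):
    # Stand-in agents: two answer the question, one freelances.
    return "42" if persona != "poet" else "a sonnet about 42"

def committee(question, personas=("analyst", "critic", "poet")):
    answers = [agent(question, p) for p in personas]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes
```

The vote papers over individual unreliability, but when the agents share a blind spot, the committee shares it too, now with quorum.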
What's striking is that all of these approaches, despite their differences, tend to converge.
Not in design, but in behaviour.
They all end up doing roughly the same things:
- calling a model
- inspecting a messy output
- deciding whether it's “good enough”
- trying again if it isn't
With some state threaded through, and a growing collection of edge cases.
In other words: a loop with opinions.
The reason for this convergence is slightly uncomfortable.
We don't actually know what the “steps” are.
In a traditional system, you can decompose a task with some confidence. This function parses input. That one validates it. This one writes to the database. Each step has a clear contract. Inputs in, outputs out.
With LLMs, that boundary is… negotiable.
Is “summarise this document” a single step? Usually. Until the summary is subtly wrong. Or incomplete. Or formatted in a way that breaks the next step. Now you need validation. Maybe a retry. Maybe a second model. Maybe a human.
What was a step is now a conversation.
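That dissolution has a recognisable shape in code. A sketch, with stand-in models and an invented validity check: the "step" is now a ladder of validate, retry, escalate to a bigger model, and finally hand off to a human.

```python
def summarise(doc, model):
    # Stand-in models: "small" truncates badly, "large" behaves.
    return doc[:3] if model == "small" else f"summary of {doc}"

def valid(summary):
    # Stand-in check: does the output look like a summary at all?
    return summary.startswith("summary")

def summarise_step(doc):
    # What the diagram calls one box: retry, then escalate models.
    for model in ("small", "small", "large"):
        out = summarise(doc, model)
        if valid(out):
            return out, model
    return None, "human"  # last rung of the ladder

result, handler = summarise_step("quarterly report")
```

Every rung of that ladder is a decision about what the task actually is, smuggled into control flow.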
And once your “steps” are fuzzy, everything built on top of them becomes fuzzy too.
So we add orchestration.
Not because we love orchestration, but because we're trying to stabilise something that doesn't have a stable shape.
The orchestrator ends up absorbing all the ambiguity:
- What counts as success?
- When do we retry?
- How do we recover from partial progress?
- What do we do when the model is confidently wrong?
These are not really orchestration problems. They're “what is this task, exactly?” problems.
But “task definition layer” doesn't fit as nicely in a diagram.
This is why orchestration code has a tendency to sprawl.
It starts as a clean abstraction—a graph, a workflow, a loop—and slowly accumulates exceptions. Little patches of logic that handle the cases where the model didn't quite behave as expected.
Over time, the orchestrator stops being a coordinator and becomes a kind of diplomatic layer between you and the model. Interpreting, correcting, nudging things along.
Doing, as mentioned, a lot of emotional labour.
So when we say “we need better orchestration”, it's worth being a bit suspicious of what we mean.
Because often what we're really saying is:
we don't yet have a clean way to describe what this system is supposed to do, step by step, in a way that survives contact with a language model.
Until we do, the orchestration layer will continue to expand to fill that gap.
More nodes. More retries. More agents. Better diagrams.
The box in the middle gets bigger.
Orchestration isn't a layer in the stack.
It's what we call the part of the system we don't have a proper abstraction for yet.