Making Symphony Pull Queued Jobs From Entity
We originally assumed Symphony would fit a Linear-shaped workflow. We rewrote the flow so that Entity marks Swarm jobs as queued, Symphony polls for them, and real runs start from there. Here is what broke and what finally worked.
Most agent demos die the second they leave the lab.
One model calling one script on one laptop is not the hard part. The hard part is queues, claims, retries, state, and writeback when real systems are involved.
We hit that wall with Symphony.
The original assumption was that Symphony would fit a Linear-shaped workflow. Our actual system runs work out of Entity through Swarm jobs. So the requirement became simple:
Symphony needed to pull queued jobs from Entity.
Simple sentence. Non-simple week.
The actual problem
We wanted one operational loop:
- A job gets created in Entity
- Dispatch marks it `queued`
- Symphony polls for queued work
- A worker claims the job and runs it
- Status and proof write back into Entity
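The loop above can be sketched as a small state machine. Everything here is illustrative, not Entity's real schema; the point is that each status has exactly one legal set of next states:

```typescript
// Hypothetical sketch of the job lifecycle implied by the loop above.
// Names and statuses are illustrative, not Entity's real schema.
type JobStatus = "created" | "queued" | "claimed" | "running" | "done" | "failed";

// Legal transitions: each status maps to the statuses it may move to.
const transitions: Record<JobStatus, JobStatus[]> = {
  created: ["queued"],
  queued: ["claimed"],
  claimed: ["running"],
  running: ["done", "failed"],
  done: [],
  failed: [],
};

function advance(from: JobStatus, to: JobStatus): JobStatus {
  if (!transitions[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Walk one happy path end to end.
let status: JobStatus = "created";
for (const next of ["queued", "claimed", "running", "done"] as JobStatus[]) {
  status = advance(status, next);
}
console.log(status); // "done"
```

Making the transitions explicit is what lets the writeback step detect a loop that never closed.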
That last step is the difference between a real system and a nice-looking demo. If humans cannot see what was requested, what got claimed, what ran, and what came back, the loop is not closed.
The first wrong assumption
The original path was push-shaped. Dispatch a job, throw it at Symphony, and hope the worker picks it up.
That was the wrong model.
Symphony is pull-based. The tracker exposes queued work. Symphony polls for it. Workers claim from that queue.
Once we stopped forcing a push flow and let Symphony poll the tracker properly, the architecture got simpler.
That meant rewriting the Swarm provider in Entity around the real contract:
- dispatch marks a job as `queued`
- Symphony polls `GET /api/swarm/jobs?status=queued`
- workers claim from that queue
No magic. Just the right shape.
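A minimal sketch of the pull side of that contract. The endpoint path is from the contract above; the client shape, `pollOnce`, and the fake tracker are all assumptions for illustration:

```typescript
// Minimal sketch of the pull loop, assuming the contract above.
// `fetchJson` is injected so the loop can run against a fake tracker.
interface SwarmJob { id: string; status: string; }

type FetchJson = (url: string) => Promise<{ jobs: SwarmJob[] }>;

async function pollOnce(base: string, fetchJson: FetchJson): Promise<SwarmJob | null> {
  // Symphony asks the tracker for queued work, never the other way around.
  const { jobs } = await fetchJson(`${base}/api/swarm/jobs?status=queued`);
  return jobs[0] ?? null; // claim the oldest queued job, if any
}

// Fake tracker standing in for Entity.
const fake: FetchJson = async () => ({
  jobs: [{ id: "290e", status: "queued" }],
});

pollOnce("http://entity.local", fake).then((job) => {
  console.log(job?.id); // "290e"
});
```

Injecting the fetch function is what makes this loop testable without a running tracker, which matters once you start hardening the claim path.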
What actually broke
The main architecture change was only half the story. Once the tracker and runner started speaking the same language, smaller failures showed up immediately.
1. The adapter read the wrong response shape
Entity’s API returned a single job like this:
```json
{ "job": { "id": "290ed8b7b9714e4da958", "status": "queued" } }
```
The adapter was reading the payload as if the job object was at the top level.
So Symphony was asking for state and getting nonsense back because the adapter was looking in the wrong place. The fix was simple: unwrap the `job` object first, then map `status` from the actual payload.
Tiny bug. Large confusion.
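The fix, sketched with hypothetical names (`parseJobStatus` is not the adapter's real function; the payload shape is the one shown above):

```typescript
// Entity wraps the job in a `job` key. The buggy read assumed
// the job fields lived at the top level of the payload.
interface JobPayload {
  job: { id: string; status: string };
}

function parseJobStatus(raw: string): string {
  const payload = JSON.parse(raw) as JobPayload;
  // The fix: unwrap `job` first, then read status from the actual object.
  return payload.job.status;
}

const body = '{ "job": { "id": "290ed8b7b9714e4da958", "status": "queued" } }';
console.log(parseJobStatus(body)); // "queued"
```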
2. Codex CLI was missing on the Mac that had to do the work
Classic systems comedy.
The queue worked. The poll loop worked. The claim path worked. Then the real runner tried to start Codex on the Mac and found nothing there.
So yes, we had built a respectable conveyor belt that ended in empty air.
Installing the Codex CLI on the Mac removed that blocker.
3. Approval policy drift
Symphony’s default schema for `codex.approval_policy` expected a reject-style map. Codex 0.115.0 wanted the string `never`.
That is the kind of version mismatch that makes agent systems look flaky when the problem is actually contractual. The runner was doing what the old schema said. The tool had moved.
We changed the default to `never`.
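As a sketch, assuming a TOML-style config file (the key name comes from the schema above; the file layout and the old map shape are illustrative):

```toml
# Before: a reject-style map, which Codex 0.115.0 no longer accepts.
# [codex.approval_policy]
# reject = ["*"]

# After: the plain string Codex 0.115.0 expects.
[codex]
approval_policy = "never"
```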
4. The Swarm router was not mounted on the live Entity server
The code existed. The route did not.
`createSwarmRouter()` had to be mounted explicitly so `/api/swarm` existed on the running server.
This is the part people like to skip in architecture diagrams. Repository truth and runtime truth are not the same thing.
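The gap between repository truth and runtime truth can be shown with a stand-in for the real HTTP framework (the route table and `mount` helper here are illustrative, not Entity's actual server code):

```typescript
// Stand-in for the bug: the router existed in the repo,
// but nothing ever mounted it on the running server.
type Handler = () => string;

function createSwarmRouter(): Record<string, Handler> {
  return { "/jobs": () => JSON.stringify({ jobs: [] }) };
}

// Minimal route table standing in for the real HTTP framework.
const app = new Map<string, Handler>();

function mount(prefix: string, router: Record<string, Handler>) {
  for (const [path, handler] of Object.entries(router)) {
    app.set(prefix + path, handler);
  }
}

// Without this line, /api/swarm/* does not exist at runtime,
// no matter how correct the router code is.
mount("/api/swarm", createSwarmRouter());
console.log(app.has("/api/swarm/jobs")); // true
```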
The moment it became real
The system stopped being theory when Symphony claimed a real job from Entity and launched a real Codex session on the Mac against a real workspace.
That mattered more than any dashboard.
The proof looked like this:
- a live job was claimed from Entity
- a Codex session started on the Mac
- the run executed in a real workspace for about 42 seconds
- completion tried to write back into Entity
The last step exposed the remaining ugly edge. Entity dipped at exactly the wrong moment, so writeback was not reliable yet.
Annoying, yes. Also useful. That is what a real end-to-end test is for.
Why this architecture is better
Because the queue should live where the work lives.
Entity already owns the job record. Once Symphony polls queued jobs from Entity directly, the operational surface gets cleaner:
- the job is created in one place
- the queue state is visible in one place
- the claim starts from one place
- proof and status have a clear destination
That beats the usual mess where planning is in one tool, runners are in another, logs are somewhere else, and proof is scattered across screenshots, commits, and vibes.
If it takes three dashboards to answer “what is the agent doing right now?” the architecture is still lying to you.
The part still worth worrying about
One ugly edge remains: writeback depends on Entity being reachable at the exact moment a run finishes.
So the next layer needs to be boring and stubborn:
- retryable writeback
- durable proof records
- clearer stuck-job handling
- stronger health checks between Entity and Symphony
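The retryable-writeback item can be sketched with exponential backoff. The function name, attempt count, and delays are assumptions, not Symphony's real API; the shape is the standard retry pattern:

```typescript
// Sketch of retryable writeback with exponential backoff.
// `writebackWithRetry` and its defaults are illustrative.
type Writeback = () => Promise<void>;

async function writebackWithRetry(
  fn: Writeback,
  attempts = 5,
  baseDelayMs = 200,
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      await fn(); // try to push status and proof into Entity
      return;
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries: surface the failure
      // Exponential backoff: 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}

// Fake Entity that is down for the first two calls, then recovers.
let calls = 0;
const flaky: Writeback = async () => {
  calls++;
  if (calls < 3) throw new Error("Entity unreachable");
};

writebackWithRetry(flaky, 5, 1).then(() => console.log(calls)); // 3
```

Retry alone is not enough if the process dies mid-backoff, which is why the durable-proof-record item sits next to it on the list.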
None of this is glamorous. Good.
The bigger lesson
Most autonomous systems do not fail because the model is dumb. They fail because the state model is fuzzy.
In this case the failures were embarrassingly concrete:
- the workflow was assumed to be push when it needed to be pull
- the Swarm router was not mounted on the live server
- the adapter read the wrong API shape
- Codex CLI was missing on the Mac
- the approval policy default no longer matched Codex 0.115.0
- writeback still depended on timing that was too fragile
That is the real lesson.
Make the loop legible before you make it impressive.
If queue, claim, run, and writeback are visible end to end, you can harden the system. If not, you are just teaching yourself to trust smoke.