What Hermes Exposed in Our Agent Stack — and the Recursive Workflow We Built After
We used a critique of our agent self-improvement loop to rebuild the missing pieces: daily soul review, skill evolution, cron improvement, workflow promotion, recidivism enforcement, and a visibility layer.
We recently got called out for describing our self-improvement system too loosely.
The criticism was fair.
We already had a lot of recursive machinery in place, but the overall learning loop was fragmented and badly explained. Some parts lived in crons. Some lived in reflections. Some lived in memory. Some lived in weekly cleanups. One important daily loop had even been disabled. So the system was real, but the story was muddy, and one key gap was also genuinely there.
Looking at Hermes made that clearer.
This post explains three things:
- where our system was before
- what Hermes exposed
- what we rebuilt into a proper recursive self-improvement workflow
The important framing is this: what we built is not just a skill. It is a workflow that can improve multiple layers of the agent stack.
Where we were before
Before this rebuild, our agent stack already had real self-improvement ingredients:
- daily learning crons
- weekly self-improvement review
- memory files for reflections and rules
- drift audits for prompts, configs, skills, and routing
- recurring cleanup and optimization passes
So this was never a fake system.
The problem was that it behaved more like a collection of related mechanisms than one visible learning loop.
We had signal generation, but not enough connective tissue.
A repeated mistake might show up in a reflection, then maybe get mentioned in weekly review, then maybe become a rule later, then maybe disappear into markdown limbo. That is better than having no memory at all, but it is not the same thing as having a clear recursive improvement system.
The biggest missing piece was the explicit daily soul review loop. We had created it before, but it had fallen out of active use. That mattered because the system had weekly review and learning passes, but it lacked a consistent daily introspection ritual dedicated to looking at friction, correction, failure, and behavioral drift.
That was the weak point Hermes helped expose.
What Hermes exposed
Hermes did not impress us because of some magical memory claim.
The useful thing Hermes demonstrated was clarity of recursive structure.
Its learning loop felt easier to point at. Easier to describe. Easier to believe.
That matters. A system can be powerful and still lose the comparison if the recursive loop is hidden, fragmented, or hard to inspect.
The strongest lesson was not “copy this one exact primitive from Hermes.”
It was this:
If an agent system learns, that learning should be visible, enforceable, and structurally connected to future behavior.
Not implied. Not buried. Not left to operator memory.
Hermes was stronger on legibility. We were stronger on governed operator infrastructure. The gap was not that we had nothing. The gap was that our loop was not explicit enough, and one important piece was missing.
What needed to be rebuilt
Once we stopped arguing from vibes, the missing shape became obvious.
We needed a first-class recursive workflow that could move from:
- reflection
- to classification
- to promotion
- to application
- to verification
- to enforcement when the same issue kept returning
- to visibility so the whole thing could be inspected
That workflow also needed to act on more than one layer.
This is the key part that was not clear enough in the first draft:
The self-improvement system does not only work on soul and skills.
It works across multiple durable layers of the agent stack.
What the recursive workflow can improve
1. Soul
This is the behavioral and prompt layer.
It includes:
- operating stance
- response defaults
- durable behavior rules
- persona and operator discipline
- things that should sit close to the active context window
This is where repeated behavioral corrections turn into stronger guidance.
2. Skills
This is the capability layer.
It includes:
- SKILL.md improvements
- better “when to use” guidance
- helper scripts
- safer patterns for execution
- clearer boundaries for auto-apply vs human review
This matters because some repeated failures are not prompt failures. They are capability-shape failures.
3. Crons
This is the scheduled learning and maintenance layer.
It includes:
- adding missing loops
- re-enabling disabled loops
- changing cadence when review is too slow
- splitting overloaded jobs
- adding watchdogs, weekly promotion passes, or visibility generators
In this rebuild, crons were not just a delivery mechanism. They were part of the self-improvement surface itself.
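One way to catch a loop that has quietly dropped out of use is a small watchdog over the job list. The sketch below is only an illustration: `LoopJob`, its fields, and `find_unhealthy` are hypothetical names, and it assumes each scheduled loop records whether it is enabled and when it last ran.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class LoopJob:
    """Hypothetical record for one scheduled learning or maintenance loop."""
    name: str
    enabled: bool
    max_gap: timedelta            # longest acceptable time between runs
    last_run: Optional[datetime]  # None if the job has never run


def find_unhealthy(jobs: list[LoopJob], now: datetime) -> dict[str, str]:
    """Return a mapping of job name to the reason it needs attention."""
    problems: dict[str, str] = {}
    for job in jobs:
        if not job.enabled:
            problems[job.name] = "disabled"
        elif job.last_run is None:
            problems[job.name] = "never ran"
        elif now - job.last_run > job.max_gap:
            problems[job.name] = "overdue"
    return problems
```

A check like this would flag a disabled daily loop on its first run instead of leaving it to be noticed by accident.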
4. Memory and rules
This is the durable operating memory layer.
It includes:
- reflections
- promoted lessons
- hard rules
- recidivism tracking
- enforced guardrails in active memory
This is where a repeated mistake stops being “something we noticed” and becomes “something the system must now carry forward.”
5. Workflows
This is the cross-cutting orchestration layer.
Some problems are not “fix one skill” problems. They are system-shape problems.
For example:
- how reflection becomes a candidate
- how candidates are reviewed
- how verification works
- how recurring failures escalate
- how visibility is generated
That is not one skill. That is a workflow.
6. Scripts and tooling
This is the enforcement and automation layer.
It includes:
- candidate generators
- review scripts
- report generators
- recidivism enforcement tools
- wrappers and validators
This layer matters because a self-improvement system without executable tooling quickly turns into a note-taking hobby.
7. Process and routing
This is the operational behavior layer.
It includes:
- what gets delegated
- what must be reviewed
- what needs proof before being considered done
- when a repeated miss becomes a structural change
This is where the workflow affects how the agent actually operates, not just what files it edits.
The self-improvement workflow we built
Here is the rebuilt pack, phase by phase.
Phase 1: Daily soul review
We restored the dedicated daily introspection loop.
That means the system now has a daily ritual for reviewing:
- friction
- corrections
- failures
- pushback
- recurring misses
It writes those into durable artifacts instead of leaving them as conversational residue.
Core outputs include:
- memory/soul-tracker.md
- memory/reflections.md
- daily self-improvement artifacts
- promotion candidates for further action
This restored the missing heartbeat.
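As a rough illustration of what one daily pass could write, here is a minimal sketch. The review categories and the memory/reflections.md path come from the description above; the `Observation` type, the `KINDS` identifiers, the markdown layout, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

# Review categories from the daily ritual; the identifier spellings are illustrative.
KINDS = {"friction", "correction", "failure", "pushback", "recurring_miss"}


@dataclass(frozen=True)
class Observation:
    kind: str  # one of KINDS
    note: str  # what happened, in one line


def render_daily_entry(day: date, observations: list[Observation]) -> str:
    """Render one day's observations as a markdown section."""
    for obs in observations:
        if obs.kind not in KINDS:
            raise ValueError(f"unknown observation kind: {obs.kind}")
    lines = [f"## {day.isoformat()}"]
    # Group by kind (alphabetically) so the entry is easy to scan later.
    for kind in sorted(KINDS):
        notes = [o.note for o in observations if o.kind == kind]
        if notes:
            lines.append(f"### {kind}")
            lines.extend(f"- {n}" for n in notes)
    return "\n".join(lines) + "\n"


def append_entry(path: str, entry: str) -> None:
    """Append the entry to a durable file, for example memory/reflections.md."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(entry)
```

Appending rather than overwriting keeps earlier days intact, which is what lets later passes count recurrences.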
Phase 2: Recursive improvement backbone
We built the canonical pipeline:
- capture
- classify
- promote
- apply
- verify
This created the actual spine of the workflow.
Instead of scattered notes, the system now has a standard path for taking a meaningful reflection and turning it into an actionable candidate with a verification path.
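The five steps can be modelled as a strict, ordered state machine. This is a sketch under assumptions, not the actual implementation: the `Stage` names are derived from the step verbs, and `Candidate` and `advance` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Stage names mirror capture, classify, promote, apply, verify.
    CAPTURED = 1
    CLASSIFIED = 2
    PROMOTED = 3
    APPLIED = 4
    VERIFIED = 5


@dataclass
class Candidate:
    summary: str
    stage: Stage = Stage.CAPTURED
    history: list[Stage] = field(default_factory=lambda: [Stage.CAPTURED])

    def advance(self) -> Stage:
        """Move to the next stage in order; stages cannot be skipped."""
        if self.stage is Stage.VERIFIED:
            raise ValueError("candidate is already verified")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage
```

Because `advance` only ever moves one step, a candidate cannot be marked as verified without having passed through the apply step first.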
Phase 3: Skill evolution path
We created a formal skill-evolution branch.
This means the workflow can now decide that a recurring issue should become:
- a SKILL.md patch
- a safer helper script
- a new bounded skill
- a human-gated patch proposal
This matters because not all recurrent failures belong in soul or memory. Some belong in the capability layer.
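The post does not say how the branch picks between those four outcomes, so the rules below are placeholders. The sketch only shows the shape of such a decision; `SkillIssue`, its three flags, and `choose_action` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SkillAction(Enum):
    SKILL_MD_PATCH = "SKILL.md patch"
    HELPER_SCRIPT = "safer helper script"
    NEW_SKILL = "new bounded skill"
    HUMAN_GATED_PROPOSAL = "human-gated patch proposal"


@dataclass(frozen=True)
class SkillIssue:
    skill_exists: bool      # is there already a skill covering this area?
    needs_executable: bool  # does the fix need code, not just guidance?
    high_risk: bool         # would auto-applying the fix be unsafe?


def choose_action(issue: SkillIssue) -> SkillAction:
    """Placeholder routing rules; risky changes always go to a human first."""
    if issue.high_risk:
        return SkillAction.HUMAN_GATED_PROPOSAL
    if not issue.skill_exists:
        return SkillAction.NEW_SKILL
    if issue.needs_executable:
        return SkillAction.HELPER_SCRIPT
    return SkillAction.SKILL_MD_PATCH
```

Checking risk first keeps the auto-apply versus human-review boundary explicit, whatever the real criteria turn out to be.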
Phase 4: Visibility layer
We created a recursive visibility layer so the whole system can be inspected.
This gives visibility into:
- candidate funnel states like PROPOSED, APPLIED, and VERIFIED
- promoted rules
- recidivism trends
- escalated cases
- bottlenecks in the loop
This is what turns the system from hidden markdown archaeology into something legible.
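A funnel view can be as small as a count per state. The following sketch uses the three state names mentioned above; the candidate record shape (a dict with a `state` key) and the `funnel_report` function are assumptions for illustration.

```python
from collections import Counter

# States named in the post; a real funnel may have more.
FUNNEL_STATES = ("PROPOSED", "APPLIED", "VERIFIED")


def funnel_report(candidates: list[dict]) -> str:
    """Summarize how many candidates sit in each funnel state."""
    counts = Counter(c["state"] for c in candidates)
    lines = ["Candidate funnel"]
    for state in FUNNEL_STATES:
        lines.append(f"- {state}: {counts.get(state, 0)}")
    # Surface unexpected states instead of silently dropping them.
    for state in sorted(set(counts) - set(FUNNEL_STATES)):
        lines.append(f"- {state} (unrecognized): {counts[state]}")
    return "\n".join(lines)
```

A large PROPOSED count next to a small VERIFIED count is the kind of bottleneck this view is meant to make obvious.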
Phase 5: Recidivism-driven enforcement
This is the teeth.
If the same failure keeps happening, the system should not just keep writing increasingly disappointed notes to itself.
It should escalate.
That means:
- repeated violations can force promotion into active memory
- promoted rules can escalate into structural candidates if they still do not stick
- the system can move from soft reminder to hard guardrail to real patch path
Without this phase, the rest of the workflow is too soft.
With this phase, self-improvement starts to change future behavior more reliably.
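The escalation ladder from soft reminder to hard guardrail to patch path can be sketched as a threshold lookup. The thresholds below are invented for the example, and `Enforcement` and `enforcement_level` are hypothetical names; the real system may count and weigh recurrences differently.

```python
from enum import IntEnum


class Enforcement(IntEnum):
    NOTE = 0              # recorded in reflections only
    SOFT_REMINDER = 1
    HARD_GUARDRAIL = 2    # promoted into active memory
    STRUCTURAL_PATCH = 3  # escalated to a structural candidate


# Hypothetical thresholds: recurrences needed to reach each level.
THRESHOLDS = {
    Enforcement.SOFT_REMINDER: 2,
    Enforcement.HARD_GUARDRAIL: 3,
    Enforcement.STRUCTURAL_PATCH: 5,
}


def enforcement_level(recurrences: int) -> Enforcement:
    """Return the highest enforcement level whose threshold has been met."""
    level = Enforcement.NOTE
    for candidate_level, needed in sorted(THRESHOLDS.items()):
        if recurrences >= needed:
            level = candidate_level
    return level
```

With these example numbers, a failure seen four times would sit at the hard-guardrail level, and a fifth occurrence would push it onto the patch path.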
Why this is a workflow, not a skill
This is the part the naming has to get right.
Calling this a skill would undersell it.
A skill is appropriate when the system is teaching a bounded capability.
This is bigger than that.
This rebuilt pack spans:
- soul
- skills
- crons
- memory
- workflows
- scripts
- process
- routing
And the parts depend on each other.
The daily soul review creates signal. The recursive pipeline turns signal into candidates. The skill branch handles capability fixes. The cron layer carries scheduled review and execution. The recidivism layer adds enforcement. The visibility layer makes the whole thing inspectable.
That is workflow territory.
What changed in practice
Before this rebuild, our recursive system was real but fragmented.
After it, the structure is much cleaner:
- there is an explicit daily introspection loop
- there is a canonical promotion pipeline
- there is a skill evolution path
- there is recidivism enforcement
- there is a visibility layer
- there is a clearer answer to the question: what exactly gets improved?
The answer is: not just soul, not just skills.
It can improve any durable operating layer around the model, short of the base model weights themselves.
That is the right frame.
The broader lesson
The real lesson from Hermes was not that another system had better branding.
It was that recursive systems need explicit shape.
If you want an agent to improve over time, you need more than memory. You need:
- signal
- promotion
- enforcement
- visibility
- durable changes to the operating stack
That is what turns “the agent noticed something” into “the system got better.”
And that is why this rebuild matters.
It gave us a recursive workflow that can act on the whole operator stack, not just one narrow layer.
That is the difference between a clever note-taking system and a real self-improvement workflow.