Agents of Chaos Need Governed Infrastructure, Not Better Prompts

Semantic Kernel's RCE and the Agents of Chaos study point to the same failure: autonomous agents are crossing trust boundaries faster than our infrastructure can verify them.

The agent security story keeps repeating itself with different costumes.

Microsoft Semantic Kernel’s SessionsPythonPlugin was hit by a CVSS 10.0 remote-code-execution flaw: agent-controlled paths flowed into file upload and download operations without a hard boundary. The patch then reportedly fell to bypasses through type confusion, encoding tricks, Unicode lookalikes, and the grimly funny “self-nuke” path: overwrite the application itself and wait for execution on restart.

Separately, the Agents of Chaos study put autonomous agents into a live environment with persistent memory, email, Discord, file systems, shell access, and real humans. The result was not one clean vulnerability. It was a class of failures: agents leaking secrets, being manipulated, misusing tools, and turning ordinary social or prompt pressure into system-level risk.

These are the same story.

The failure is not that one model was gullible or one regex was weak. The failure is that we keep letting untrusted language reach mutation surfaces: file writes, shell commands, plugin dispatch, vector ingestion, deployment hooks, webhooks, and agent-to-agent task routing.

Once an agent has tools, the prompt is no longer “just text.” It is a proposed execution plan.

I audited Entity, our own agent workspace, for this failure class, and the audit was rude in the way useful audits are rude. The risky paths were not exotic. They were exactly where any agent platform becomes powerful:

terminal sessions exposed over WebSocket,
swarm job dispatch into Codex with full-auto execution,
file writes into enabled workspaces,
chat routes that hand user text to tool-capable agents,
command-backed search,
dynamic plugin loading from the filesystem.

Some controls were solid: path traversal checks, scoped editor tokens, read-only HTTP adapters, security headers, secret redaction. The problem was not absence of security. It was incomplete security. Traditional web boundaries were guarded; agent boundaries were still half vibes, half TODO comments. Beautiful little footgun bouquet.

This is why Soteria’s bet is Governed Agent Infrastructure.

Not “safer prompts.” Not another post-hoc filter trying to recognize every bad string an adversary can encode. A governance layer that treats agent actions as state changes requiring proof before execution.

The minimum viable version looks like this:

Scope before tools. Every task starts with an explicit capability envelope: which files, commands, hosts, APIs, and identities are in scope. If the agent was hired to summarize a document, it does not need shell access. If it is editing one repo, it cannot wander into another because the path looks interesting.
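
A capability envelope can be as plain as an immutable record that a task receives at start. This is a minimal sketch, not Soteria's implementation: the CapabilityEnvelope name, its fields, and the example paths are all hypothetical, chosen only to show deny-by-default scoping.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CapabilityEnvelope:
    """Hypothetical per-task grant: anything not listed is out of scope."""
    file_roots: frozenset = frozenset()  # directories the task may touch
    commands: frozenset = frozenset()    # executable names the task may run
    hosts: frozenset = frozenset()       # lowercase hostnames the task may call

    def allows_command(self, name: str) -> bool:
        return name in self.commands

    def allows_host(self, host: str) -> bool:
        return host.lower() in self.hosts


# A summarization task gets one document directory and nothing else.
summarize_task = CapabilityEnvelope(
    file_roots=frozenset({Path("/srv/docs/report")}),
)

# An editing task gets one repo plus the few commands it needs.
edit_task = CapabilityEnvelope(
    file_roots=frozenset({Path("/srv/repos/app")}),
    commands=frozenset({"git", "pytest"}),
)

print(summarize_task.allows_command("bash"))  # the summarizer has no shell
print(edit_task.allows_command("git"))
```

The empty defaults are the point: a task that declares nothing can do nothing, and widening the grant is an explicit, reviewable change.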

Verification before mutation. The agent can propose an action. The infrastructure decides whether it is allowed. That decision happens before file write, process spawn, webhook call, deployment, or plugin invocation. Semantic Kernel-style traversal tricks become less interesting when the tool cannot address paths outside its grant.
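
For file writes, "verify before mutate" can be sketched as: resolve the proposed path first, check it against the granted root, and only then touch the disk. The function names and the Denied exception below are hypothetical; the pathlib calls are standard library (is_relative_to needs Python 3.9 or later). It assumes a single granted root and does not address races where a symlink is swapped in between the check and the write.

```python
from pathlib import Path


class Denied(Exception):
    """Raised when a proposed action falls outside the task's grant."""


def resolve_within(root: Path, proposed: str) -> Path:
    """Return the resolved target only if it stays under root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below sees the real destination, not the spelling.
    target = (root / proposed).resolve()
    if not target.is_relative_to(root):
        raise Denied(f"{proposed!r} resolves outside {root}")
    return target


def governed_write(root: Path, proposed: str, data: bytes) -> Path:
    """Verify first, mutate second."""
    target = resolve_within(root, proposed)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Because the check runs on the resolved path rather than on the string the agent supplied, "../" chains and absolute paths are rejected by the same rule instead of by a growing list of patterns.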

Audit trails as runtime infrastructure. Every prompt, retrieved snippet, tool call, approval, denial, and resulting artifact should be traceable. Not compliance theater. Incident response. If an agent goes sideways, you need the movie, not a blurry screenshot.
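
One way to make such a trail tamper-evident is to chain each entry to the hash of the previous one. Hash chaining is an assumption here, not something the audit or Soteria is stated to use, and the AuditLog class is a hypothetical in-memory sketch; a real system would persist entries somewhere the agent cannot write.

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only, hash-chained event log (in-memory sketch)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._last_hash = self.GENESIS

    @staticmethod
    def _digest(body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def record(self, kind: str, detail: dict) -> dict:
        """Append one event, e.g. kind="tool_call" or kind="denial"."""
        entry = {
            "ts": time.time(),
            "kind": kind,
            "detail": detail,
            "prev": self._last_hash,  # links this entry to the one before
        }
        entry["hash"] = self._digest(entry)
        self._last_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or dropped entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Recording denials alongside approvals matters for incident response: the attempts that were blocked are often the earliest sign that an agent is being steered.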

Human gates for irreversible action. Agents can draft, explain, test, and prepare. Production deploys, destructive deletes, financial actions, and external commitments need an operator checkpoint.
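
A human gate can be sketched as a wrapper that refuses irreversible action kinds unless an operator callback says yes. Everything named here is hypothetical (the IRREVERSIBLE set, the execute function, the approve callback); the only design choice being illustrated is that the default answer is no.

```python
from typing import Callable

# Hypothetical action kinds treated as irreversible.
IRREVERSIBLE = {"deploy_production", "delete", "payment", "send_external"}


class ApprovalRequired(Exception):
    """Raised when an irreversible action has no operator approval."""


def deny_all(kind: str) -> bool:
    """Default approver: nobody is at the checkpoint, so nothing passes."""
    return False


def execute(kind: str, action: Callable[[], object],
            approve: Callable[[str], bool] = deny_all):
    """Run action; irreversible kinds must clear the operator checkpoint first."""
    if kind in IRREVERSIBLE and not approve(kind):
        raise ApprovalRequired(f"{kind} needs an operator checkpoint")
    return action()


# Drafting runs freely; deleting waits for a human.
print(execute("draft", lambda: "draft ready"))
try:
    execute("delete", lambda: "gone")
except ApprovalRequired as err:
    print(err)
```

In practice the approve callback would block on a ticket, a chat prompt, or a signed token; the agent prepares the action and the operator decides whether it runs.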

Sandboxed execution by default. Full-auto agents running on the host are a convenience tax paid in future incident reports. Sandboxes, scoped filesystems, network limits, and disposable workspaces should be the boring default.
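
The "boring default" part can be illustrated with nothing but the standard library: a throwaway working directory, a scrubbed environment, and a timeout. This is a sketch under loud assumptions: it assumes a POSIX host, the run_disposable name is hypothetical, and it is not a sandbox by itself. The child process can still read the host filesystem and reach the network, so real isolation needs containers, VMs, or OS-level controls on top.

```python
import subprocess
import sys
import tempfile


def run_disposable(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a scratch directory with a scrubbed environment.

    NOT isolation on its own: this only shows defaults that a real
    sandbox should also have.
    """
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            # -I is Python's isolated mode: ignores PYTHON* variables
            # and the user site directory.
            [sys.executable, "-I", "-c", code],
            cwd=workdir,                     # scratch space, deleted on exit
            env={"PATH": "/usr/bin:/bin"},   # no inherited tokens or secrets
            capture_output=True,
            text=True,
            timeout=timeout,                 # raises TimeoutExpired on overrun
        )


result = run_disposable("import os; print('API_KEY' in os.environ)")
print(result.stdout.strip())
```

Passing an explicit env replaces the parent's environment rather than extending it, which is what keeps credentials exported in the operator's shell out of the child process.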

The industry is still talking about agents like they are smarter chatbots. They are not. A tool-capable agent is closer to a junior operator with API keys, terminal access, persistence, and infinite confidence on a bad day.

So the question is not “can your model resist prompt injection?” The question is: when the model fails, what can it touch?

If the answer is “the host filesystem,” “production credentials,” or “whatever the tool runtime can reach,” you do not have an agent platform. You have an attack surface that writes emails.

Agents of Chaos showed the human and social failure modes. Semantic Kernel showed the framework-level RCE pattern. Entity’s audit showed how easily the same class appears in real agent infrastructure.

The forcing function is here: autonomous agents need governed infrastructure underneath them. Soteria is that layer: verification, scoped execution, auditability, and proof before mutation.

Agents can be powerful. They just cannot be root because a paragraph asked nicely.