Procedural Intelligence v1.3: Beyond Escalation
A state-machine view of agents: execution modes, model–logic contracts, and runtime scores that make autonomy auditable
In the first version of Procedural Intelligence, I laid out the scaffolding agents need — logic contracts, adaptive paths, recovery protocols, traceability — a state machine for reasoning that lets agents act, stop, and recover with accountability.
The feedback was clear: the problem resonates. Teams aren’t failing because their models hallucinate; they’re failing because the system doesn’t know what to do when the plan degrades.
But escalation logic is only one organ. A functioning system needs a nervous system, feedback loops, and contracts between its parts.
To scale agentic systems, we need architecture that makes decision-making explicit and auditable so you can answer questions like:
How does an agent execute under different conditions?
How do we design the seam between models and deterministic logic?
How do we evaluate reasoning at runtime, not just after the fact?
How do we observe behavior so each cycle improves the next?
That’s where Procedural Intelligence v1.3 comes in.
Execution Modes
Execution isn’t one track — it’s a state machine.
Each mode has entry/exit guards (confidence, retries, risk, cost) and caps (allowed tools, max attempts, time budget).
Default → the happy path.
Cautious → lower risk tolerance, shorter retries, earlier escalations.
Recovery → stabilize and hand off.
Exploratory → broaden tool use, but under strict caps.
With modes defined, failures stop being ad hoc. Triggers like confidence < 0.7, retries > 3, or risk > threshold move the agent into Cautious or Recovery automatically. That shift makes failures predictable and testable.
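To make the idea concrete, here is a minimal sketch of the four modes as a state machine. It is an illustration under assumptions, not a reference implementation: the mapping of triggers to modes (low confidence enters Cautious; too many retries or too much risk enters Recovery), the risk threshold, the Cautious exit guard, the tool names, and the caps are all placeholder choices.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"
    CAUTIOUS = "cautious"
    RECOVERY = "recovery"
    EXPLORATORY = "exploratory"


@dataclass(frozen=True)
class Caps:
    allowed_tools: frozenset
    max_attempts: int
    time_budget_s: float


@dataclass(frozen=True)
class Signals:
    confidence: float
    retries: int
    risk: float


# Hypothetical caps per mode; tool names and numbers are placeholders.
CAPS = {
    Mode.DEFAULT: Caps(frozenset({"search", "write"}), 3, 30.0),
    Mode.CAUTIOUS: Caps(frozenset({"search"}), 2, 15.0),
    Mode.RECOVERY: Caps(frozenset(), 0, 5.0),
    Mode.EXPLORATORY: Caps(frozenset({"search", "write", "browse"}), 5, 60.0),
}

MIN_CONFIDENCE = 0.7            # trigger named in the text
MAX_RETRIES = 3                 # trigger named in the text
RISK_THRESHOLD = 0.5            # assumed value
CAUTIOUS_EXIT_CONFIDENCE = 0.85  # assumed exit guard for Cautious


def next_mode(current: Mode, s: Signals) -> Mode:
    """Apply the guards in priority order and return the mode to run next."""
    if current is Mode.RECOVERY:
        return Mode.RECOVERY  # leaves only via handoff, which is not modelled here
    if s.retries > MAX_RETRIES or s.risk > RISK_THRESHOLD:
        return Mode.RECOVERY
    if s.confidence < MIN_CONFIDENCE:
        return Mode.CAUTIOUS
    if current is Mode.CAUTIOUS:
        # Exit guard: return to Default only once confidence is clearly back.
        if s.confidence >= CAUTIOUS_EXIT_CONFIDENCE:
            return Mode.DEFAULT
        return Mode.CAUTIOUS
    return current


def tool_allowed(mode: Mode, tool: str) -> bool:
    """Check a tool call against the caps of the current mode."""
    return tool in CAPS[mode].allowed_tools
```

Because the transition is a pure function of the current mode and the signals, each guard can be unit-tested on its own, which is what makes the failure path predictable.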
The Model–Logic Interface
We often think of the LLM as the brain and procedural logic as the guardrail, but I want to offer a new perspective: they’re peers. The seam between model and logic is where most silent failures live. When the model omits confidence or reasoning, or logic fails to enforce thresholds, the contract breaks. The system continues anyway. That’s not resilience — that’s luck.
Treat it as a contract:
Model → Logic must provide: goal, plan steps, confidence score, uncertainty reason, tool I/O, running cost.
Logic → Model must provide: allowed actions, constraints, thresholds, current mode, halt/escalate signals.
If either side skips its fields, the system halts or drops into Recovery. Without this explicit contract, you get brittle systems that fail at the seam.
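The contract can be checked mechanically at every step. The sketch below assumes both sides exchange plain dictionaries; the field names are my own spellings of the two lists above, and `check_seam` is a hypothetical helper that drops to Recovery on any gap (a hard halt would be the other option the contract allows).

```python
from typing import Any, Mapping, Sequence

# Field names are assumed spellings of the two lists in the contract.
MODEL_TO_LOGIC = (
    "goal", "plan_steps", "confidence",
    "uncertainty_reason", "tool_io", "running_cost",
)
LOGIC_TO_MODEL = (
    "allowed_actions", "constraints", "thresholds",
    "mode", "signals",
)


def missing_fields(message: Mapping[str, Any], required: Sequence[str]) -> list[str]:
    """Return the required field names that are absent or set to None."""
    return [name for name in required if message.get(name) is None]


def check_seam(
    model_msg: Mapping[str, Any], logic_msg: Mapping[str, Any]
) -> tuple[str, list[str]]:
    """Return ("continue", []) or ("recovery", [the fields that were skipped])."""
    gaps = [f"model.{n}" for n in missing_fields(model_msg, MODEL_TO_LOGIC)]
    gaps += [f"logic.{n}" for n in missing_fields(logic_msg, LOGIC_TO_MODEL)]
    return ("recovery" if gaps else "continue", gaps)
```

Returning the list of gaps alongside the decision means the stop carries its own explanation: a model message without `confidence` yields `("recovery", ["model.confidence"])` instead of a silent continue.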
Evaluation Layer
Resolution rate is a vanity metric. Without runtime scoring, you can’t govern behavior. Healthy systems show >95% Context Completeness, rising Return-to-Automation over time, and <1% False-Continues. Anything else is entropy disguised as progress.
Key metrics to consider:
Safe Stop Rate → % of cases halted before error cascades.
Context Completeness → % of handoffs with required metadata.
Return-to-Automation → % of escalations that re-enter automation with updated state.
Spiral Catch → % of runaway loops terminated by guardrails before they keep cycling.
False-Continue Rate → % of actions taken past risk bounds.
Mode Adherence → % of actions taken under the correct mode.
These tell you whether the system is not just acting, but acting within the rules you intended.
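Three of these scores can be computed directly from per-step records. The following is a rough sketch: the `Action` and `Handoff` record shapes and the `REQUIRED_METADATA` keys are hypothetical, chosen only to show that each metric reduces to a simple ratio once the right fields are logged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    mode: str           # mode the agent was actually in
    expected_mode: str  # mode the guards say it should have been in
    risk: float         # estimated risk of this action
    risk_bound: float   # maximum risk allowed for this action


@dataclass(frozen=True)
class Handoff:
    metadata: dict


# Hypothetical required handoff fields; substitute your own.
REQUIRED_METADATA = ("goal", "last_step", "reason")


def pct(hits: int, total: int) -> float:
    """Percentage of hits, or 0.0 when there is nothing to score."""
    return 100.0 * hits / total if total else 0.0


def false_continue_rate(actions: list) -> float:
    """% of actions taken past their risk bound."""
    return pct(sum(a.risk > a.risk_bound for a in actions), len(actions))


def mode_adherence(actions: list) -> float:
    """% of actions taken under the correct mode."""
    return pct(sum(a.mode == a.expected_mode for a in actions), len(actions))


def context_completeness(handoffs: list) -> float:
    """% of handoffs that carry every required metadata field."""
    complete = sum(
        all(key in h.metadata for key in REQUIRED_METADATA) for h in handoffs
    )
    return pct(complete, len(handoffs))
```

The remaining metrics follow the same pattern, provided the run log records halts, escalations, and loop terminations as explicit events.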
Observability Loops
Logs are history, and you probably have them already. But if your logs don’t change thresholds, mode guards, or contracts, you don’t have observability. You have archives.
Observability is explanation + scoring + update. It means recording decision edges and reasons, emitting metrics per step, and feeding the results back to thresholds or contracts so behavior changes next run.
“We saw it” becomes “we saw it, scored it, and updated the contract so it won’t repeat.”
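One way to close that loop, sketched under assumptions: record each decision edge with its reason, score the run, and tighten a threshold when the score misses its target. The `DecisionEdge` shape, the step size, and the ceiling are hypothetical; the 1% target is the False-Continue bound mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionEdge:
    step: int
    from_mode: str
    to_mode: str
    reason: str           # explanation: why the agent took this edge
    confidence: float
    false_continue: bool  # scoring: did this step proceed past its risk bound?


@dataclass(frozen=True)
class Thresholds:
    min_confidence: float = 0.7


def update_thresholds(
    current: Thresholds,
    edges: list,
    max_false_continue_pct: float = 1.0,  # target taken from the text
    step: float = 0.05,                   # assumed adjustment size
    ceiling: float = 0.95,                # assumed upper limit
) -> Thresholds:
    """Raise the confidence floor when the last run exceeded the false-continue target."""
    if not edges:
        return current
    rate = 100.0 * sum(e.false_continue for e in edges) / len(edges)
    if rate > max_false_continue_pct:
        raised = min(ceiling, round(current.min_confidence + step, 2))
        return Thresholds(min_confidence=raised)
    return current
```

The update rule here is deliberately crude. The point is that the output of scoring is a new `Thresholds` value that the next run actually uses, rather than a dashboard nobody acts on.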
Why This Matters
→ Without escalation logic, systems bluff through failure.
→ Without execution modes, they over- or under-react.
→ Without a model–logic contract, they degrade at the seam.
→ Without evaluation and observability, they repeat mistakes until humans step in.
Procedural Intelligence prevents all of that, acting as scaffolding for reasoning systems that degrade safely, recover gracefully, and explain themselves.
Early on, I assumed the problem lived inside the model. In reality, it lived in the missing scaffolds around it: contracts, modes, and scoring.
Autonomy doesn’t scale without architecture. And the architecture isn’t prompts or flows or orchestration. It’s procedures defined as contracts, enforced by modes, and verified at runtime.
Coming soon: the full Procedural Intelligence v1.3 diagram with these new layers integrated.
For now, ask yourself:
If the plan degrades, does the system know what mode it’s in?
And when it stops, can someone else continue without starting over?
👋 Hey, I’m Sarah. I write about building AI systems that can reason, recover, and earn trust—frameworks like Procedural Intelligence, Trigger Typology, and the Agentic Integrity Stack. If you’re exploring how to design automation that thinks, follow along here for more.
→ sarahpayne.ai for frameworks, visuals, and what’s coming next.

