Engineering · 2026-04-27 · 11 min read

Inside the planner-judge-executor loop.

Three roles. One cycle. Why we separated planning from judging — and why the judge model is intentionally cheaper than the planner.

Magistry Team

Engineering

Most 'AI agent' demos are a single model in a while-loop with tools. It looks magical for ten minutes and becomes unmanageable the moment it can spend money. Magistry runs every cycle through three separated roles — planner, judge, executor — and the separation is the whole design.

Planner: propose, don't act

The planner reads state and proposes actions. That's all it does. It never writes to Shopify or an ad account. It produces candidate decisions with the evidence it used, and hands them on. Because it can't act, you can let it be expensive and creative: a bad plan costs nothing beyond the inference that produced it until something downstream approves it.
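
To make "propose, don't act" concrete, here is a minimal sketch of what a planner's output could look like. Everything in it is illustrative rather than Magistry's actual code: the `Proposal` type, the `plan` function, the `days_of_stock` state key, and the `pause_ad` action are hypothetical names, and a simple rule stands in for the planner model. The point is the shape: the planner returns data with its evidence attached and holds no handle that could write anywhere.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Proposal:
    """A candidate action. Data only: nothing here can perform a write."""
    action: str                 # hypothetical action name, e.g. "pause_ad"
    params: Mapping[str, Any]   # arguments the executor would need later
    evidence: tuple[str, ...]   # what the planner looked at to justify it


def plan(state: Mapping[str, Any]) -> list[Proposal]:
    """Stand-in for the planner model: read state, return proposals.

    Assumption: a real planner would call a large model here. This rule
    (propose pausing ads for SKUs with under 7 days of stock) is invented
    purely to show the input and output shapes.
    """
    proposals = []
    for sku, days in state.get("days_of_stock", {}).items():
        if days < 7:
            proposals.append(
                Proposal(
                    action="pause_ad",
                    params={"sku": sku},
                    evidence=(f"days_of_stock[{sku}]={days}",),
                )
            )
    return proposals
```

Calling `plan({"days_of_stock": {"A1": 3, "B2": 30}})` yields one proposal for `A1` and leaves the state untouched. Whether that proposal ever becomes a write is someone else's decision.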

Judge: cheaper on purpose

The judge scores each proposed action against policy and evidence. Counter-intuitively, the judge model is intentionally cheaper than the planner. Judging 'does this action satisfy these gates given this evidence?' is a narrower task than generating the action in the first place. A smaller model with tight policy context is more reliable at the bounded question than a large one improvising.

The planner explores the space of what could be done.
The judge enforces the space of what is allowed to be done.
Splitting them means you can audit and tune the boundary without touching the creativity.
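
The bounded question the judge answers can be sketched as a list of gates applied to one proposal. This is a simplification under stated assumptions: in the loop described here the judge is a small model, while the sketch below uses plain functions as stand-ins, and the `REQUIRED_TIER` table, the `cost_tier` field, and the ordering of tiers ("A" strictest, compared alphabetically) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    action: str
    cost_tier: str  # assumption: single letters, "A" being the strictest tier


# A gate returns None to pass, or a short reason string to reject.
Gate = Callable[[Proposal], Optional[str]]

# Hypothetical policy table: which tier each action requires.
REQUIRED_TIER = {"discount_test": "A"}


def cost_tier_gate(p: Proposal) -> Optional[str]:
    needed = REQUIRED_TIER.get(p.action)
    # String comparison works only because tiers are single letters here.
    if needed is not None and p.cost_tier > needed:
        return f"cost tier {p.cost_tier}, needs {needed}"
    return None


@dataclass(frozen=True)
class Verdict:
    proposal: Proposal
    passed: bool
    reason: str


def judge(p: Proposal, gates: list[Gate]) -> Verdict:
    """Apply gates in order; the first failing gate decides the verdict."""
    for gate in gates:
        reason = gate(p)
        if reason is not None:
            return Verdict(p, False, f"{gate.__name__}: {reason}")
    return Verdict(p, True, "all gates passed")
```

Tuning the boundary then means editing the policy table or the gate list. Nothing about how proposals are generated has to change.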

orchestrator trace: one cycle

cycle#7711 start
  planner → 14 candidate actions (model: large)
  judge   → 9 pass gates, 5 rejected (model: small)
            reject: discount_test blocked (cost tier B, needs A)
  executor→ 9 writes, each logged as a revertible decision row
            kill_switch: off · rate_limit: 9/120
cycle#7711 done (logged to decision_log)

Five of fourteen proposals never happened. They're still logged as rejected rows, with the gate that blocked them — so you can see what the agent wanted to do and why it wasn't allowed. That negative space is some of the most useful data in the system.
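
One way to keep that negative space queryable is to write approved and rejected verdicts into the same log, with the blocking gate stored on the rejected rows. The sketch below is an assumption about shape, not the real `decision_log` schema: `DecisionRow`, `DecisionLog`, and `record_verdicts` are hypothetical names, and the log lives in memory rather than in a database.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DecisionRow:
    cycle: int
    action: str
    status: str                # "approved" or "rejected"
    blocked_by: Optional[str]  # gate reason on rejected rows, otherwise None


class DecisionLog:
    """Append-only in spirit: the only mutating method is append()."""

    def __init__(self) -> None:
        self._rows: list[DecisionRow] = []

    def append(self, row: DecisionRow) -> None:
        self._rows.append(row)

    def rejected(self, cycle: int) -> list[DecisionRow]:
        """What the agent wanted to do in this cycle but was not allowed to."""
        return [
            r for r in self._rows
            if r.cycle == cycle and r.status == "rejected"
        ]


def record_verdicts(
    log: DecisionLog,
    cycle: int,
    verdicts: Iterable[tuple[str, Optional[str]]],
) -> None:
    """Each verdict is (action, reason); reason is None when gates passed."""
    for action, reason in verdicts:
        status = "approved" if reason is None else "rejected"
        log.append(DecisionRow(cycle, action, status, reason))
```

With rows like these, "what did the agent want to do in cycle 7711, and which gate stopped it?" becomes a filter over the log instead of a forensic exercise.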

Executor: boring by design

The executor does the least interesting and most important work: it writes approved actions, logs each as a decision row the revert executor can undo, respects the rate limiter and advisory locks, and refuses to start at all when the kill switch is on. It has no opinions. An executor with opinions is just a second planner you forgot to govern.
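
A minimal sketch of an executor with no opinions might look like the following. It is illustrative only: the `Executor` class, the `write` and `kill_switch` callables, and the log entry fields are hypothetical, the rate limit is modelled as a per-run budget (the trace above does not say what window the real limiter uses), and advisory locks are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable


class KillSwitchOn(RuntimeError):
    """Raised when the executor is asked to run while the kill switch is on."""


@dataclass
class Executor:
    write: Callable[[str], None]     # hypothetical side-effecting writer
    kill_switch: Callable[[], bool]  # returns True when the switch is on
    rate_limit: int = 120            # assumption: a budget of writes per run
    log: list = field(default_factory=list)

    def run(self, approved: list[str]) -> int:
        # Refuse to start at all, rather than stopping halfway through.
        if self.kill_switch():
            raise KillSwitchOn("kill switch is on; refusing to start")
        written = 0
        for action in approved:
            if written >= self.rate_limit:
                # Over budget: record the action as deferred, do not write it.
                self.log.append(
                    {"action": action, "status": "deferred_rate_limit"}
                )
                continue
            self.write(action)
            # Each write gets a row that a revert step could later act on.
            self.log.append(
                {"action": action, "status": "written", "revertible": True}
            )
            written += 1
        return written
```

Note what is absent: the executor never inspects whether an action is a good idea. It checks the switch, respects the budget, writes, and logs.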

A single model that plans, judges, and acts can't be held accountable for any of the three. Separate them and each becomes auditable.
— Magistry

The loop isn't novel because it's clever. It's useful because it's legible. When something goes wrong, you know which role failed — the plan was bad, the gate was wrong, or the write failed — and you fix that one thing. Monolithic agents make every failure a mystery. This makes every failure an address.
