The Operator Log
Engineering · 2026-04-27 · 11 min read

Inside the planner-judge-executor loop.

Three roles. One cycle. Why we separated planning from judging — and why the judge model is intentionally cheaper than the planner.

Ines Park

Co-founder, Engineering

Most 'AI agent' demos are a single model in a while-loop with tools. It looks magical for ten minutes and becomes unmanageable the moment it can spend money. Magistry runs every cycle through three separated roles — planner, judge, executor — and the separation is the whole design.

Planner: propose, don't act

The planner reads state and proposes actions. That's all it does. It never writes to Shopify or an ad account. It produces candidate decisions with the evidence it used, and hands them on. Because it can't act, you can let it be expensive and creative: a bad plan changes nothing in the outside world until something downstream approves it.
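
To make "propose, don't act" concrete, here is a minimal sketch of what a proposal could look like as data. It is an illustration, not Magistry's actual code: the `Proposal` type, the `plan` function, the state layout, and the 90-day stock rule are all assumptions made up for this example, and a real planner would call a model where this stub applies a fixed rule. Only the `PRICE_DROP` action name comes from the trace in this post.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class Proposal:
    """A candidate decision. It describes a write; it cannot perform one."""
    action: str                   # e.g. "PRICE_DROP"
    params: Mapping[str, Any]     # what the write would change
    evidence: Sequence[str]       # what the planner looked at to justify it


def plan(state: Mapping[str, Any]) -> list[Proposal]:
    """Hypothetical planner: reads state, returns proposals, touches nothing.

    Stand-in rule (an assumption): propose a 10% price drop for any
    product with more than 90 days of stock on hand.
    """
    proposals = []
    for sku, item in state.get("products", {}).items():
        if item["days_of_stock"] > 90:
            proposals.append(Proposal(
                action="PRICE_DROP",
                params={"sku": sku, "new_price_cents": item["price_cents"] * 9 // 10},
                evidence=[f"{sku}: {item['days_of_stock']} days of stock"],
            ))
    return proposals
```

The point of the shape is that `plan` has no store client or ad-account handle in scope. Its only output is a list of values that something else has to approve.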

Judge: cheaper on purpose

The judge scores each proposed action against policy and evidence. Counter-intuitively, the judge model is intentionally cheaper than the planner. Judging 'does this action satisfy these gates given this evidence?' is a narrower task than generating the action in the first place. A smaller model with tight policy context is more reliable at the bounded question than a large one improvising.

  • The planner explores the space of what could be done.
  • The judge enforces the space of what is allowed to be done.
  • Splitting them means you can audit and tune the boundary without touching the creativity.

orchestrator trace: one cycle
cycle#7711 start
  planner → 14 candidate actions (model: large)
  judge   → 9 pass gates, 5 rejected (model: small)
            reject: PRICE_DROP blocked by margin_gate
  executor→ 9 writes, each with reverse_op
            kill_switch: armed · rate_limit: 9/120
cycle#7711 done (logged: decision_log #84201–84209)

Five of fourteen proposals never happened. They're still logged as rejected rows, with the gate that blocked them — so you can see what the agent wanted to do and why it wasn't allowed. That negative space is some of the most useful data in the system.
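
The gate-and-log behaviour can be sketched in a few lines. This is a deterministic stand-in, under stated assumptions: the real judge is a model, whereas here each gate is a plain function, and the `Verdict` type, the `GATES` registry, and the 20% margin threshold are hypothetical. Only the names `PRICE_DROP` and `margin_gate` are taken from the trace. What the sketch does show is that a rejection produces a log row naming the gate that blocked it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    action: str
    params: dict


@dataclass(frozen=True)
class Verdict:
    proposal: Proposal
    approved: bool
    blocked_by: Optional[str]   # name of the first gate that rejected it


# A gate returns True when the proposal is allowed.
Gate = Callable[[Proposal], bool]


def margin_gate(p: Proposal) -> bool:
    """Hypothetical rule: a price drop must keep at least a 20% margin."""
    if p.action != "PRICE_DROP":
        return True
    price, cost = p.params["new_price_cents"], p.params["cost_cents"]
    return (price - cost) * 100 >= 20 * price


GATES: dict[str, Gate] = {"margin_gate": margin_gate}


def judge(proposals: list[Proposal], decision_log: list[Verdict]) -> list[Proposal]:
    """Check every proposal against every gate; log approvals and rejections."""
    approved = []
    for p in proposals:
        blocked_by = next(
            (name for name, gate in GATES.items() if not gate(p)), None
        )
        verdict = Verdict(p, approved=blocked_by is None, blocked_by=blocked_by)
        decision_log.append(verdict)   # rejected rows are kept, not dropped
        if verdict.approved:
            approved.append(p)
    return approved
```

Because the gates live in one registry, the boundary can be tightened or loosened by editing `GATES` alone, without touching whatever generates the proposals.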

Executor: boring by design

The executor does the least interesting and most important work: it writes approved actions, stamps each with its reverse op, respects the rate limiter and advisory locks, and stops instantly if the kill switch trips. It has no opinions. An executor with opinions is just a second planner you forgot to govern.
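
A rough sketch of an executor with no opinions, again as an illustration and not the production code. The `ApprovedAction` and `Executor` types are hypothetical, the rate limit is modelled as a simple cap on writes per run (one possible reading of the `9/120` in the trace), and advisory locks are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ApprovedAction:
    name: str
    apply: Callable[[], None]     # performs the write
    reverse: Callable[[], None]   # undoes it; recorded alongside the write


class KillSwitchTripped(Exception):
    """Raised when the kill switch is found tripped before a write."""


@dataclass
class Executor:
    rate_limit: int                                 # max writes per run (assumed)
    kill_switch: Callable[[], bool]                 # returns True when tripped
    undo_stack: list = field(default_factory=list)  # reverse ops, newest last

    def run(self, actions: list[ApprovedAction]) -> int:
        """Apply actions in order; return how many were written."""
        written = 0
        for action in actions:
            if self.kill_switch():        # checked before every single write
                raise KillSwitchTripped(action.name)
            if written >= self.rate_limit:
                break                     # leave the rest for a later cycle
            action.apply()
            self.undo_stack.append(action.reverse)
            written += 1
        return written

    def rollback(self) -> None:
        """Undo everything written so far, newest first."""
        while self.undo_stack:
            self.undo_stack.pop()()
```

Note what is missing: `run` never inspects an action to decide whether it is a good idea. It only checks the switch, the limit, and then writes.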

A single model that plans, judges, and acts can't be held accountable for any of the three. Separate them and each becomes auditable.

Ines Park

The loop isn't novel because it's clever. It's useful because it's legible. When something goes wrong, you know which role failed — the plan was bad, the gate was wrong, or the write failed — and you fix that one thing. Monolithic agents make every failure a mystery. This makes every failure an address.
