The loop is deliberately split into three roles so each is auditable on its own. The planner can be expensive and creative because it can't act; the judge is intentionally cheaper because enforcing a bounded policy is a narrower task than generating a plan; the executor is boring by design.
The three roles
- Planner — reads state, proposes candidate actions with the evidence used. Never writes.
- Judge — scores each candidate 0.0–1.0 against policy gates and the evidence chain.
- Executor — applies actions that clear the threshold, stamping each with its reverse op.
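The separation of roles above can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the names `Candidate`, `AppliedAction`, `plan`, `judge`, `execute`, and `run_cycle`, the `RESTOCK_ALERT` action, the stock rule, and the `UNDO_...` reverse-op format are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    action_type: str
    target: str
    evidence: tuple[str, ...]  # what the planner looked at to propose this


@dataclass(frozen=True)
class AppliedAction:
    candidate: Candidate
    score: float
    reverse_op: str  # how to undo the write


def plan(state: dict[str, int]) -> list[Candidate]:
    """Planner: reads state and proposes candidates. It never writes."""
    return [
        Candidate("RESTOCK_ALERT", sku, (f"stock={qty}",))
        for sku, qty in state.items()
        if qty < 5  # hypothetical rule, for illustration only
    ]


def judge(candidate: Candidate) -> float:
    """Judge: returns a score in [0.0, 1.0]. Toy rule: no evidence, no score."""
    return 0.9 if candidate.evidence else 0.0


def execute(scored: list[tuple[Candidate, float]], threshold: float) -> list[AppliedAction]:
    """Executor: applies only what clears the threshold, stamping a reverse op."""
    return [
        AppliedAction(c, score, reverse_op=f"UNDO_{c.action_type}:{c.target}")
        for c, score in scored
        if score >= threshold
    ]


def run_cycle(state: dict[str, int], threshold: float) -> list[AppliedAction]:
    candidates = plan(state)
    scored = [(c, judge(c)) for c in candidates]
    return execute(scored, threshold)


state = {"sku-1": 2, "sku-2": 40}
applied = run_cycle(state, threshold=0.8)
```

Because `plan` and `judge` are pure functions over their inputs, each can be tested or audited without the executor running at all, which is the point of the split.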
A cycle trace
cycle#7711
planner → 14 candidates (model: large)
judge → 9 pass, 5 rejected (model: small)
reject: PRICE_DROP blocked by margin_gate
executor → 9 writes, each with reverse_op
kill_switch: armed · rate_limit: 9/120
Thresholds per action
The judge threshold is not global — it's per action type. Soft, reversible actions clear at a lower score; expensive, sticky ones (price moves) require a higher score and Tier A cost confidence. Rejected candidates are still logged, with the gate that blocked them.
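A per-action threshold table might look like the sketch below. Everything specific here is an assumption for illustration: the threshold values, the `TAG_UPDATE` action type, the gate names `score_threshold`, `cost_confidence_gate`, and `unknown_action_gate`, and the representation of cost confidence as a single letter tier. Only `PRICE_DROP` and the Tier A requirement come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds: soft, reversible actions clear at a lower score.
THRESHOLDS = {"TAG_UPDATE": 0.6, "PRICE_DROP": 0.9}
# Sticky actions that additionally require Tier A cost confidence.
REQUIRES_TIER_A = {"PRICE_DROP"}


@dataclass(frozen=True)
class Scored:
    action_type: str
    score: float          # judge score in [0.0, 1.0]
    cost_confidence: str  # assumed tier label, e.g. "A" or "B"


def blocking_gate(c: Scored) -> Optional[str]:
    """Return the name of the gate that blocks the candidate, or None if it clears."""
    if c.action_type not in THRESHOLDS:
        return "unknown_action_gate"  # fail closed on unrecognized action types
    if c.action_type in REQUIRES_TIER_A and c.cost_confidence != "A":
        return "cost_confidence_gate"
    if c.score < THRESHOLDS[c.action_type]:
        return "score_threshold"
    return None


def triage(candidates: list[Scored]) -> tuple[list[Scored], list[dict]]:
    """Split candidates into passes and a rejection log that names the gate."""
    passed, rejection_log = [], []
    for c in candidates:
        gate = blocking_gate(c)
        if gate is None:
            passed.append(c)
        else:
            rejection_log.append(
                {"action": c.action_type, "score": c.score, "gate": gate}
            )
    return passed, rejection_log


passed, rejection_log = triage([
    Scored("TAG_UPDATE", 0.65, "B"),  # clears the lower bar
    Scored("PRICE_DROP", 0.65, "A"),  # same score, blocked by the higher bar
    Scored("PRICE_DROP", 0.95, "B"),  # high score, but not Tier A
    Scored("PRICE_DROP", 0.95, "A"),  # clears both conditions
])
```

The same score of 0.65 passes for one action type and fails for another, and every rejection keeps the name of the gate that blocked it, so the log stays useful for later review.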
