An agent without an audit is just a leak.
Autonomy without a paper trail means you find out about a bad write in a dashboard, days later. The audit plane is what turns 'the agent did something' into a row you can read, evaluate, and reverse.
// audit plane
Every Magistry write — catalog mutation, ad spend change, customer reply — passes through one append-only plane. Evidence captured, judge score recorded, reversal op pre-stored. Phase-gated, kill-switched, rate-limited.
decision_log : 142 rows
├─ DRAFT 62
├─ DISCOUNT_TEST 14
├─ PUBLISH 8
└─ SCALE_WINNER 3
ad_decision_log : 47 rows
├─ ADJUST_BUDGET 22
├─ AUDIENCE_WRITE 9
├─ CREATIVE_BRIEF 6
└─ NEGATIVE_TERMS 10
cs_thread_evaluations: 318 sends
├─ judge_pass 302
├─ judge_hold 11
└─ escalated 5
kill_switch : OFF (all systems live)
rate_limiter : 0 ceilings hit
advisory_locks : 2 active, 0 queued
// why it exists
Catalog, Campaign, CS, Disputes, Social, Researcher — they all share one place for evidence, citations, and reversal ops. One query gets you the whole day, across every system.
Magistry never edits or deletes a row. Reversals create new rows that point to the original. Your audit history is a ledger, not a Wikipedia.
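One way to picture the append-only rule is a table that refuses edits and deletes at the storage layer. The sketch below uses SQLite from Python purely for illustration; the `audit_row` table, the trigger names, and the `PUBLISH_REVERSAL` action are hypothetical and not taken from Magistry's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_row (
    id       INTEGER PRIMARY KEY,
    action   TEXT NOT NULL,
    reverses INTEGER REFERENCES audit_row(id)  -- set only on reversal rows
);
-- Hypothetical guards: any in-place edit or delete is rejected.
CREATE TRIGGER audit_row_no_update BEFORE UPDATE ON audit_row
BEGIN SELECT RAISE(ABORT, 'audit_row is append-only'); END;
CREATE TRIGGER audit_row_no_delete BEFORE DELETE ON audit_row
BEGIN SELECT RAISE(ABORT, 'audit_row is append-only'); END;
""")

def append(action, reverses=None):
    """Insert a new row and return its id; this is the only allowed write."""
    cur = conn.execute(
        "INSERT INTO audit_row (action, reverses) VALUES (?, ?)",
        (action, reverses),
    )
    return cur.lastrowid

def is_rejected(sql, params=()):
    """Return True if the statement is blocked by the append-only triggers."""
    try:
        conn.execute(sql, params)
    except sqlite3.DatabaseError:
        return True
    return False

original_id = append("PUBLISH")
# A reversal is a new row that points back at the original.
reversal_id = append("PUBLISH_REVERSAL", reverses=original_id)
```

With this shape, undoing a decision never touches the original row; the history only grows.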
// what it stores
decision_log
Catalog Specialist
Every lifecycle transition for every SKU. From-state, to-state, action, trigger, evidence (jsonb), applied_to_shopify, reversal op.
// schema
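As a rough illustration of the fields listed above, a decision_log row could be modeled like this. The field names follow the description; the types, the `sku` field, and every example value (states, trigger, evidence keys, reversal payload) are assumptions made for the sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional

@dataclass(frozen=True)  # frozen: a row is never mutated after it is written
class DecisionLogRow:
    sku: str
    from_state: str
    to_state: str
    action: str
    trigger: str
    evidence: dict[str, Any]                # stored as jsonb per the description
    applied_to_shopify: bool
    reversal_op: Optional[dict[str, Any]]   # pre-stored inverse, if any

# All values below are made up for illustration.
row = DecisionLogRow(
    sku="SKU-001",
    from_state="DRAFTED",
    to_state="PUBLISHED",
    action="PUBLISH",
    trigger="scheduled_cycle",
    evidence={"margin_check": "ok", "sources": ["supplier_feed"]},
    applied_to_shopify=False,
    reversal_op={"action": "UNPUBLISH", "sku": "SKU-001"},
)

# The evidence and reversal op serialize cleanly to JSON for storage.
serialized = json.dumps(asdict(row), sort_keys=True)
```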
ad_decision_log
Campaign Specialist
Every paid-side write — budget changes, bid strategy shifts, audience writes, creative pushes, negative-term additions. Per channel.
// schema
cs_thread_evaluations
CS Specialist
Every draft and send — judge scores per axis (policy, fact, brand_voice, risk), policy citation, supplier signal, language flag, verdict.
// schema
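To make the per-axis scoring concrete, here is a minimal sketch of how four axis scores might collapse into a verdict. The axis names and the verdict labels (judge_pass, judge_hold, escalated) come from the page; the 0 to 1 scale, the "higher is safer" convention, and both thresholds are assumptions, not Magistry's actual rules.

```python
from dataclasses import dataclass

AXES = ("policy", "fact", "brand_voice", "risk")

# Assumed thresholds on an assumed 0..1 scale where higher is better.
PASS_FLOOR = 0.8
HOLD_FLOOR = 0.5

@dataclass(frozen=True)
class Evaluation:
    scores: dict[str, float]
    policy_citation: str

def verdict(ev: Evaluation) -> str:
    """Gate on the weakest axis so one bad score cannot be averaged away."""
    missing = [axis for axis in AXES if axis not in ev.scores]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")
    worst = min(ev.scores[axis] for axis in AXES)
    if worst >= PASS_FLOOR:
        return "judge_pass"
    if worst >= HOLD_FLOOR:
        return "judge_hold"
    return "escalated"
```

Gating on the minimum rather than the mean is a design choice in this sketch: a reply that is on-brand but factually shaky should not be sent.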
// phase model
This phase applies to every new account, every new agent, and every new policy. Magistry runs the full loop end-to-end but holds back the final mutation. You read the rows, flag the questionable ones, and decide when to flip the switch.
Trust earned, mode set to live. The executor walks the queue and applies. Same rows, same evidence, but now the mutation actually lands. Rate limits hold, kill switch overrides, advisory locks prevent stomps.
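The two phases can be sketched as one executor with a mode flag: the same rows are logged either way, and only the final mutation differs. The mode names `hold` and `live`, the `PlannedRow` type, and the status strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PlannedRow:
    action: str
    payload: dict
    status: str = "PLANNED"

@dataclass
class Executor:
    mode: str                              # "hold" or "live" (assumed names)
    apply: Callable[[PlannedRow], None]    # performs the real mutation
    log: list = field(default_factory=list)

    def run(self, queue: list) -> None:
        for row in queue:
            if self.mode == "live":
                self.apply(row)            # the mutation actually lands
                row.status = "APPLIED"
            else:
                row.status = "HELD_BACK"   # full loop ran, mutation withheld
            self.log.append(row)           # row is recorded in both phases
```

Flipping the switch is then a one-field change; the planning, evidence, and logging paths stay identical.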
// safety integration
stores.config->>'kill_switch'
Checked at every write. Global or scoped per agent. Flipped ON: all pending mutations halt, queued rows mark as held. No flush, no race.
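A minimal sketch of "checked at every write", assuming the store config is available as a dict. The config shapes (a plain `"ON"`/`"OFF"` string for global, a per-agent mapping with an optional `"*"` fallback for scoped) and the `HELD` status string are assumptions for illustration.

```python
def kill_switch_on(config: dict, agent: str) -> bool:
    """Read the switch fresh; supports a global value or a per-agent mapping."""
    switch = config.get("kill_switch", "OFF")
    if isinstance(switch, dict):
        return switch.get(agent, switch.get("*", "OFF")) == "ON"
    return switch == "ON"

def execute_pending(config: dict, rows: list, write) -> list:
    """Apply rows one by one, re-checking the switch before each write."""
    for row in rows:
        if kill_switch_on(config, row["agent"]):
            row["status"] = "HELD"   # marked held, not dropped
            continue
        write(row)
        row["status"] = "APPLIED"
    return rows
```

Because the check sits inside the loop rather than before it, a switch flipped mid-batch stops the very next write instead of waiting for the batch to drain.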
safety/rate_limiter.check_action()
Per-action budgets like draft_max_per_week=20. Exceeded: row is logged with status=RATE_LIMITED and waits — never blasts through the ceiling.
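The ceiling behaviour can be sketched with a rolling seven-day window. The real signature of `check_action()` is not shown on this page, so the class, its arguments, the in-memory history, and the `ALLOWED` status are all assumptions; only `RATE_LIMITED` and the limit of 20 drafts per week come from the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

class RateLimiter:
    def __init__(self, weekly_limits: dict):
        # e.g. {"draft": 20}, standing in for draft_max_per_week=20
        self.weekly_limits = weekly_limits
        self.history = defaultdict(list)

    def check_action(self, action: str, now: datetime) -> str:
        """Return 'ALLOWED' and record the action, or 'RATE_LIMITED'."""
        window_start = now - timedelta(days=7)
        recent = [t for t in self.history[action] if t > window_start]
        self.history[action] = recent
        limit = self.weekly_limits.get(action)
        if limit is not None and len(recent) >= limit:
            return "RATE_LIMITED"    # caller logs the row and waits
        recent.append(now)
        return "ALLOWED"
```

A limited action is not recorded in the history, so waiting rows do not consume budget while they wait.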
safety/locks.store_lock(store, key)
Prevents concurrent executors stomping each other. Two cycles can plan in parallel; only one can execute against the same store key at a time.
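The "plan in parallel, execute one at a time" rule can be illustrated with an in-process stand-in. This sketch uses `threading.Lock` objects keyed by `(store, key)`; a database-backed version would more plausibly rely on advisory locks, and the non-blocking "try" behaviour here is an assumption about how `store_lock` works.

```python
import threading
from contextlib import contextmanager

_locks: dict = {}
_registry_guard = threading.Lock()   # protects the lock registry itself

@contextmanager
def store_lock(store: str, key: str):
    """Yield True if this caller holds the (store, key) lock, else False."""
    with _registry_guard:
        lock = _locks.setdefault((store, key), threading.Lock())
    acquired = lock.acquire(blocking=False)   # never wait; just report
    try:
        yield acquired
    finally:
        if acquired:
            lock.release()
```

An executor would wrap its apply step in `with store_lock(store, key) as held:` and skip or requeue the work when `held` is false.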
apply_policy() pre-publish
Generated copy is screened against your registered trademark list before any catalog write. Hit → row marked NEEDS_REVIEW, never silently pushed.
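A simple version of the pre-publish screen is a case-insensitive whole-word match against the registered list. The signature and return shape of `apply_policy()` are not documented on this page, so the ones below are assumptions, as is the `CLEARED` status; `NEEDS_REVIEW` comes from the text. The trademark "Acme" is a placeholder.

```python
import re

def apply_policy(copy: str, trademarks: list) -> dict:
    """Flag generated copy that mentions any registered trademark."""
    hits = [
        tm for tm in trademarks
        # Lookarounds keep "Acme" from matching inside "Acmeology".
        if re.search(r"(?<!\w)" + re.escape(tm) + r"(?!\w)", copy, flags=re.IGNORECASE)
    ]
    status = "NEEDS_REVIEW" if hits else "CLEARED"
    return {"status": status, "hits": hits}
```

Returning the matched terms alongside the status gives the reviewer the reason for the hold, not just the hold.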
// reversal model
Every applied row carries its own reversal op — pre-computed at plan time, not improvised at rollback time. Click the row, hit rollback, the executor pushes the inverse mutation and stamps a new row pointing at the original.
Append-only means your audit history grows; nothing gets overwritten. The pair of rows — original + reversal — is the story.
row #84193 reversible
action : ADJUST_BUDGET
delta : +12%
status : APPLIED
reversal_op : ADJUST_BUDGET δ=-12%

row #84194 (reverses #84193)
action : ADJUST_BUDGET
delta : -12%
status : APPLIED
reverses : 84193
triggered_by: operator (jane@store.com)

— audit chain stays intact —
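The pair of rows above can be reproduced with a small sketch in which the inverse is computed when the row is planned and merely replayed at rollback. The `Row` type and function names are hypothetical, and the sketch mirrors the display by treating the delta as a signed percentage relative to a fixed baseline, which is an assumption.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_ids = count(84193)   # start at the row number used in the example above
log: list = []        # append-only: rows are added, never changed

@dataclass(frozen=True)
class Row:
    id: int
    action: str
    delta_pct: int
    status: str
    reversal_delta_pct: Optional[int] = None   # pre-computed at plan time
    reverses: Optional[int] = None
    triggered_by: Optional[str] = None

def plan_and_apply(action: str, delta_pct: int) -> Row:
    row = Row(next(_ids), action, delta_pct, "APPLIED",
              reversal_delta_pct=-delta_pct)
    log.append(row)
    return row

def rollback(row_id: int, operator: str) -> Row:
    original = next(r for r in log if r.id == row_id)
    if original.reversal_delta_pct is None:
        raise ValueError(f"row {row_id} has no stored reversal op")
    # Push the stored inverse as a new row that points at the original.
    reversal = Row(next(_ids), original.action, original.reversal_delta_pct,
                   "APPLIED", reverses=original.id, triggered_by=operator)
    log.append(reversal)
    return reversal
```

Nothing is improvised at rollback time: the function only reads what the original row already carries.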
// audit plane
Every decision row, across every specialist, in one queryable place. Read it, score it, reverse it, ship the digest to leadership — autonomy you can defend in a board meeting.
Append-only · Reversible · Built to be read