Phase 1 is dry-run: the agents propose, log, and reverse nothing because they wrote nothing. Phase 2 is live: the same decisions actually land. In one week, twelve brands flipped from the first to the second. Here's what the transition taught us — most of it about people, not models.
Dry-run is safe. Live is honest.
In dry-run, every proposed row is hypothetical, so it's easy to nod along. The moment writes go live, operators read the same rows completely differently — because now they're real. The biggest behavior change wasn't the agents'. It was operators suddenly reading their own policies closely, because the policies were now load-bearing.
Nobody really reads the dry-run rows until the writes are live. Then they read every one. Live mode is the best documentation review we ever ran.
— Sam Okafor
The patterns that showed up
- Stores that spent two weeks in dry-run flipped to live with almost no surprises. Stores that rushed it found their surprises in production.
- The first live rollback was a confidence event, not a failure — operators who reversed one action early trusted the system more, not less.
- Rate limits mattered more than anyone expected. Capping writes per hour kept the first live day calm.
- The kill switch got pressed twice across twelve stores in week one. Both were precautionary, both cleared in minutes. Exactly what it's for.
What we changed
We made the dry-run-to-live diff explicit: before you flip, you see a summary of what the agents would have done over the last 24 hours, grouped by action and gate. We also slowed the default ramp — live mode starts with tighter rate limits that loosen as the decision_log accrues clean rows. Honesty is better when it arrives at a manageable pace.
The headline lesson: the safe path and the honest path aren't in tension. Dry-run earns the operator's understanding; live earns their trust. Skip the first and the second never arrives.
