Product2026-03-29· 10 min read

What we learned shipping Phase 2 to the first 12 stores.

Dry-run is safe. Live is honest. The patterns we saw when 12 brands flipped to live writes inside the same week.

Sam Okafor

Product Lead

Phase 1 is dry-run: the agents propose, log, and reverse nothing because they wrote nothing. Phase 2 is live: the same decisions actually land. In one week, twelve brands flipped from the first to the second. Here's what the transition taught us — most of it about people, not models.

Dry-run is safe. Live is honest.

In dry-run, every proposed row is hypothetical, so it's easy to nod along. The moment writes go live, operators read the same rows completely differently — because now they're real. The biggest behavior change wasn't the agents'. It was operators suddenly reading their own policies closely, because the policies were now load-bearing.

Nobody really reads the dry-run rows until the writes are live. Then they read every one. Live mode is the best documentation review we ever ran.
— Sam Okafor

The patterns that showed up

Stores that spent two weeks in dry-run flipped to live with almost no surprises. Stores that rushed it found their surprises in production.
The first live rollback was a confidence event, not a failure — operators who reversed one action early trusted the system more, not less.
Rate limits mattered more than anyone expected. Capping writes per hour kept the first live day calm.
The kill switch got pressed twice across twelve stores in week one. Both were precautionary, both cleared in minutes. Exactly what it's for.

What we changed

We made the dry-run-to-live diff explicit: before you flip, you see a summary of what the agents would have done over the last 24 hours, grouped by action and gate. We also slowed the default ramp — live mode starts with tighter rate limits that loosen as the decision_log accrues clean rows. Honesty is better when it arrives at a manageable pace.

The headline lesson: the safe path and the honest path aren't in tension. Dry-run earns the operator's understanding; live earns their trust. Skip the first and the second never arrives.

// reading this?

Reading this? You'd like the product.

If the writing resonates, the product probably will too. Same bar, same prose, same refusal to ship something you can't reverse.

Book a 20-min demo

Back to the blog

Dry-run by default · Append-only logs · One-click rollback