A second model that only checks the first one's work sounds redundant until you watch it catch things humans don't. This week: three brand-voice drifts Reply Judge surfaced last month that no operator had noticed.
Three catches
- A creeping 'we sincerely apologize' formality in a brand whose anchor is warm and plain — invisible reply by reply, obvious in aggregate.
- An over-eager discount offer pattern that drifted past policy during a busy week.
- A subtle shift to passive voice in refund explanations that made the brand sound evasive.
None of these were wrong answers. They were right answers in a slightly wrong voice — exactly the failure mode a single helpfulness-optimized model can't see in itself. The judge isn't redundant; it's a different question asked by a different model.
Drift is invisible one reply at a time and obvious across two hundred. The judge reads the two hundred.
