Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

arXiv:2604.14585v1 Announce Type: new Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku (6 methods $\times$ 4 tasks $\times$ 3 repeats), 49% score below the zero-shot baseline; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods improve over zero-shot, by up to $+6.8$ points.
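
To make the "coin flip" claim concrete, the sketch below runs an exact two-sided binomial test on the Claude Haiku figures quoted above. It is an illustration only, not necessarily the test used in the paper. The failure count of 35 is an assumption obtained by rounding 49% of 72 runs, the helper name `coin_flip_p_value` is hypothetical, and the test treats the 72 runs as independent, which repeated runs of the same method on the same task may not be.

```python
from math import comb


def coin_flip_p_value(failures: int, runs: int) -> float:
    """Exact two-sided binomial p-value against a fair coin (p = 0.5)."""
    observed = comb(runs, failures)
    # Under p = 0.5 every outcome i has probability comb(runs, i) / 2**runs,
    # so the outcomes "at least as extreme" as the observed one are exactly
    # those whose binomial coefficient is no larger than the observed one.
    tail = sum(
        comb(runs, i) for i in range(runs + 1) if comb(runs, i) <= observed
    )
    return tail / 2**runs


# Figures from the abstract: 6 methods x 4 tasks x 3 repeats.
methods, tasks, repeats = 6, 4, 3
runs = methods * tasks * repeats

# Assumption: "49% score below zero-shot" corresponds to 35 of 72 runs.
failures = round(0.49 * runs)

p_value = coin_flip_p_value(failures, runs)
print(f"{failures}/{runs} runs below zero-shot, two-sided p = {p_value:.3f}")
```

A p-value far above the usual 0.05 threshold means the observed failure rate cannot be distinguished from 50%, which is the sense in which the outcome resembles a coin flip.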