pass@1 is a gamble — how ensemble coding enhances AI reliability

You ask Claude to fix a bug. It nails it. You ask it again with slightly different phrasing. It refactors half your module and breaks three unrelated tests. Same model, same task, different result. This is the fundamental problem with AI coding today: pass - the chance a single attempt succeeds - is a gamble. Running the same task multiple times and picking the best result dramatically improves reliability. It's the same principle behind ensemble methods in ML - and recent research confirms it works for code generation too, though it warns that naive consensus can amplify shared mistakes.