I Made 4 AI Agents Debate Each Other. Here's Why You Should Never Trust a Single LLM Answer Again.

GPT-4 gave me a confident answer last year. Precise numbers. Named researchers. A specific clinical study with exact findings.

It was entirely fabricated.

Not partially wrong. Not slightly off. The study did not exist. The researchers were not real. Every single number was invented - delivered with the same calm, authoritative tone the model uses when it is reciting actual facts.

And that is the problem.

The Real Issue Is Not Hallucination

Every developer knows LLMs hallucinate. That is old news. The real issue is there is no signal.