Are reported AI hallucination scores basically meaningless right now?

A lot of reports claim specific hallucination rates for models, but the numbers don’t really line up across studies. Some report low rates, others show much higher ones. I found an interesting report that tries to make sense of it: it compares results across OpenAI, Anthropic, and Google and shows how much the methodology changes the outcome.