REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

arXiv:2605.12813v1 Announce Type: cross Large language models (LLMs) achieve strong performance across many tasks but remain vulnerable to hallucinations, motivating the need for realistic adversarial prompts that elicit such failures. We formulate hallucination elicitation as a constrained optimization problem, where the goal is to find semantically coherent adversarial prompts that are equivalent to benign user prompts.