Retromorphic Testing with Hierarchical Verification for Hallucination Detection in RAG

arXiv:2603.27752v1

Large language models (LLMs) continue to hallucinate in retrieval-augmented generation (RAG), producing claims that are unsupported by or conflict with the retrieved context. Detecting such errors remains challenging when faithfulness must be judged solely against the retrieved context. Existing approaches either provide coarse-grained, answer-level scores or focus on open-domain factuality, and they often lack fine-grained, evidence-grounded diagnostics. We present RT4CHART, a retromorphic testing framework for context-faithfulness assessment.