FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

arXiv:2605.19111v1

Existing text-to-image (T2I) evaluation metrics mainly assess whether generated images align with information explicitly stated in the prompt, but they often fail to capture factual requirements that are implicit, externally grounded, or identity-defining. As a result, they are poorly suited to evaluating factual correctness for prompts involving scientific knowledge, historical facts, products, or culture-specific concepts.