SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis

arXiv:2503.14756v3 Announce Type: replace-cross Despite recent advances in text-conditioned 3D indoor scene generation, there remain gaps in the evaluation of these methods. Existing metrics often measure realism by comparing generated scenes to a set of ground-truth scenes, but they overlook how well scenes follow the input text and capture implicit expectations of plausibility. We present SceneEval, an evaluation framework designed to address these limitations. SceneEval