AI RESEARCH
Fast SceneScript: Fast and Accurate Language-Based 3D Scene Understanding via Multi-Token Prediction
arXiv CS.CV
•
ArXi:2512.05597v2 Announce Type: replace Recent perception-generalist approaches based on language models have achieved state-of-the-art results across diverse tasks, including 3D scene layout estimation and 3D object detection, via unified architecture and interface. However, these approaches rely on autoregressive next-token prediction, which is inherently slow. In this work, we