DOne: Decoupling Structure and Rendering for High-Fidelity Design-to-Code Generation

arXiv:2604.01226v1 Announce Type: new While Vision Language Models (VLMs) have shown promise in Design-to-Code generation, they suffer from a "holistic bottleneck": they fail to reconcile high-level structural hierarchy with fine-grained visual details, often resulting in layout distortions or generic placeholders. To bridge this gap, we propose DOne, an end-to-end framework that decouples structure understanding from element rendering. DOne