AI RESEARCH

Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner

arXiv CS.CV

arXiv:2512.10571v3 Announce Type: replace

Recent advances in video generation show that realistic audio-visual synchronization is crucial for engaging content creation. However, existing video editing methods largely overlook audio-visual synchronization and lack the fine-grained spatial and temporal controllability required for precise instance-level edits. In this paper, we propose AVI-Edit, a framework for audio-sync video instance editing. We also introduce a granularity-aware mask refiner that iteratively refines coarse user-provided masks into precise instance-level regions.
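To make the idea of iteratively refining a coarse mask concrete, here is a minimal toy sketch. It is not the paper's refiner, whose design the abstract does not describe. It assumes a grayscale image and swaps in a simple two-mean intensity rule, and the function name `refine_mask` and its parameters are hypothetical.

```python
# Illustrative stand-in only: a two-mean intensity rule, NOT the AVI-Edit refiner.
def refine_mask(image, coarse_mask, max_iters=10):
    """Shrink a coarse 0/1 mask toward pixels that resemble the foreground.

    image: 2D list of grayscale intensities.
    coarse_mask: 2D list of 0/1 values; the result never grows beyond it.
    """
    height, width = len(image), len(image[0])
    mask = [row[:] for row in coarse_mask]

    for _ in range(max_iters):
        inside = [image[y][x] for y in range(height) for x in range(width) if mask[y][x]]
        outside = [image[y][x] for y in range(height) for x in range(width) if not mask[y][x]]
        if not inside or not outside:
            break  # nothing left to compare against

        fg_mean = sum(inside) / len(inside)
        bg_mean = sum(outside) / len(outside)

        # Keep a pixel only if it lies in the coarse region and is closer
        # to the current foreground mean than to the background mean.
        new_mask = [
            [
                1
                if coarse_mask[y][x]
                and abs(image[y][x] - fg_mean) < abs(image[y][x] - bg_mean)
                else 0
                for x in range(width)
            ]
            for y in range(height)
        ]
        if new_mask == mask:
            break  # converged: another pass would change nothing
        mask = new_mask

    return mask


# A bright 2x2 object on a dark background, with a loose 3x3 user mask.
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
coarse = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
refined = refine_mask(image, coarse)
```

In this toy case the loose 3x3 mask tightens to the 2x2 object after one update, and the second pass confirms convergence. A learned refiner would replace the intensity rule with model predictions, but the loop structure (estimate, update, stop when stable) is the part being illustrated.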