ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA

arXiv:2603.10256v1 Announce Type: cross Existing video personalization methods preserve visual likeness but treat video and audio separately. Without access to the visual scene, audio models cannot synchronize sounds with on-screen actions, and because classical voice-cloning models condition only on a reference recording, a text prompt cannot redirect speaking style or acoustic environment.