OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs

arXiv:2604.13073v1 Announce Type: cross

Modern multimodal large language models (MLLMs) generate fluent responses from interleaved text, image, audio, and video inputs. However, identifying which input sources each generated statement relies on remains an open challenge. Existing attribution methods are designed primarily for classification settings, fixed prediction targets, or single-modality architectures, and do not naturally extend to autoregressive, decoder-only models performing open-ended multimodal generation. We