EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

ArXi:2508.10729v2 Announce Type: replace-cross Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably encounters domain shifts, where target domains differ substantially in both visual style and semantic content. To bridge this gap, we