Incentivizing Temporal-Awareness in Egocentric Video Understanding Models

arXiv:2603.27184v1 Announce Type: new Multimodal large language models (MLLMs) have recently shown strong performance in visual understanding, yet they often lack temporal awareness, particularly in egocentric settings where reasoning depends on the correct ordering and evolution of events. This deficiency stems in part from