ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning

arXiv:2604.08990v1

Recent advances in Multimodal Large Language Models (MLLMs) have created new opportunities for facial expression recognition (FER), moving it beyond pure label prediction toward reasoning-based affect understanding. However, existing MLLM-based FER methods still follow a passive paradigm: they rely on externally prepared facial inputs and perform single-pass reasoning over fixed visual evidence, with no capability for active facial perception.