IDSelect: A RL-Based Cost-Aware Selection Agent for Video-based Multi-Modal Person Recognition

arXiv:2602.18990v2 Announce Type: replace Video-based person recognition achieves robust identification by integrating face, body, and gait cues. However, current systems waste computational resources by processing every modality with a fixed, heavyweight ensemble regardless of input complexity. To address this limitation, we propose IDSelect, a reinforcement learning-based, cost-aware selector that chooses one pre-trained model per modality for each input sequence to optimize the accuracy-efficiency trade-off.
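As a rough illustration of what per-sequence, cost-aware selection could look like, the Python sketch below picks one candidate model per modality from a set of selector scores and computes a reward that trades recognition correctness against compute cost. Everything in it is an assumption made for illustration: the model names, the cost values, the greedy argmax choice, the `cost_weight` parameter, and the linear reward form are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str    # hypothetical model identifier
    cost: float  # hypothetical relative compute cost


# Hypothetical pools of pre-trained models, one pool per modality.
POOLS = {
    "face": [Candidate("face_small", 1.0), Candidate("face_large", 4.0)],
    "body": [Candidate("body_small", 1.5), Candidate("body_large", 5.0)],
    "gait": [Candidate("gait_small", 2.0), Candidate("gait_large", 6.0)],
}


def select(scores):
    """Pick the highest-scoring candidate in each modality.

    `scores` maps a modality name to one score per candidate, standing in
    for the per-sequence output of a learned selector (assumed here).
    """
    selection = {}
    for modality, values in scores.items():
        best = max(range(len(values)), key=values.__getitem__)
        selection[modality] = POOLS[modality][best]
    return selection


def total_cost(selection):
    """Sum the compute cost of the chosen models."""
    return sum(candidate.cost for candidate in selection.values())


def reward(correct, selection, cost_weight=0.1):
    """Assumed reward: 1 for a correct identification, minus weighted cost."""
    return (1.0 if correct else 0.0) - cost_weight * total_cost(selection)


if __name__ == "__main__":
    scores = {"face": [0.2, 0.8], "body": [0.9, 0.1], "gait": [0.6, 0.4]}
    chosen = select(scores)
    print({m: c.name for m, c in chosen.items()}, total_cost(chosen))
```

In a reinforcement learning setup, a reward of this general shape would be the signal used to train the selector, with a larger `cost_weight` pushing it toward cheaper models on easy sequences.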