Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training

arXiv:2506.20332v4 Announce Type: replace Vision-language model-based mobile agents have gained the ability to understand complex instructions and mobile screenshots, benefiting from reinforcement learning paradigms such as Group Relative Policy Optimization (GRPO). However, existing approaches center on offline