AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents

arXiv:2604.20279v2 Announce Type: cross Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a hybrid model with just-in-time visual interaction, but the most effective visualization modality depends on the task.