How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits

arXiv:2604.22103v1 Announce Type: cross

Street-view perception models predict subjective attributes such as safety at scale, but they remain correlational: they do not identify which localized visual changes would plausibly shift human judgement of a specific scene. We propose a lever-based interventional counterfactual framework that recasts scene-level explainability as a bounded search over structured counterfactual edits. Each lever specifies a semantic concept, a spatial support, an intervention direction, and a constrained edit template.
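
To make the lever definition and the bounded search concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the field names, the pixel-box encoding of the spatial field, the `bounded_search` and `toy_score` helpers, and the additive toy effects are all hypothetical. The abstract does not say how the search is carried out, so plain exhaustive enumeration up to a fixed edit budget is used here for clarity.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Lever:
    """One candidate edit; field names are illustrative, not the paper's."""
    concept: str                       # semantic concept, e.g. "vegetation"
    region: Tuple[int, int, int, int]  # where the edit applies, as a pixel box (x0, y0, x1, y1)
    direction: str                     # intervention direction: "add" or "remove"
    template: str                      # constrained edit template (a text pattern here)


def bounded_search(
    levers: Sequence[Lever],
    score: Callable[[Tuple[Lever, ...]], float],
    max_edits: int,
) -> Tuple[Tuple[Lever, ...], float]:
    """Score every set of at most max_edits levers and return the best set.

    Exhaustive enumeration is an assumption made for clarity.
    """
    best_set: Tuple[Lever, ...] = ()
    best_score = score(best_set)  # score of the unedited scene
    for k in range(1, max_edits + 1):
        for subset in combinations(levers, k):
            s = score(subset)
            if s > best_score:  # strict ">" keeps the smaller set on ties
                best_set, best_score = subset, s
    return best_set, best_score


# Hypothetical stand-in for "edit the image, then re-run the perception model".
TOY_EFFECT = {"vegetation": 3, "graffiti": 2, "streetlight": 1}


def toy_score(applied: Tuple[Lever, ...]) -> float:
    """Toy additive score: sum of made-up per-concept effects."""
    return float(sum(TOY_EFFECT[lever.concept] for lever in applied))


levers = [
    Lever("vegetation", (0, 120, 200, 256), "add", "add street trees in {region}"),
    Lever("graffiti", (210, 80, 300, 160), "remove", "remove graffiti in {region}"),
    Lever("streetlight", (320, 0, 340, 200), "add", "add a streetlight in {region}"),
]
best, gain = bounded_search(levers, toy_score, max_edits=2)
print([lever.concept for lever in best], gain)
```

With a budget of two edits, the toy scorer selects the two levers with the largest made-up effects. In a real pipeline the scorer would apply each edit to the image and query the perception model again, and the number of candidate sets grows combinatorially with the budget, which is one reason to keep the search bounded.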