Towards Policy-Adaptive Image Guardrail: Benchmark and Method

arXiv:2603.01228v2

Accurate rejection of sensitive or harmful visual content, i.e., a harmful-image guardrail, is critical in many application scenarios. This task must continuously adapt to evolving safety policies and content across domains and over time. However, traditional classifiers, confined to fixed categories, require frequent re