CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks

arXiv:2605.08325v1 Announce Type: cross Many vision datasets now provide segmentation masks alongside annotated images for a wide range of tasks. In this work, we propose Class Activation Map Attention Learning (CAMAL), an efficient and scalable method that uses segmentation masks to improve attention alignment and faithfulness in vision models. Specifically, attention alignment refers to the degree to which a model's attention aligns with ground-truth discriminative regions, while attention faithfulness refers to the degree to which a model's attention influences its decision.
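
To make the two definitions concrete, here is a minimal sketch of how each quantity might be measured. It is not CAMAL's training objective or the paper's evaluation protocol, neither of which the abstract specifies. The function names `attention_alignment` and `attention_faithfulness`, the mass-inside-mask ratio, and the deletion-style score drop are all assumptions chosen for illustration.

```python
from typing import Callable, List

Grid = List[List[float]]  # a 2D map: attention weights, binary mask, or image


def attention_alignment(attention: Grid, mask: Grid) -> float:
    """Illustrative alignment proxy: share of attention mass inside the mask.

    Assumes `attention` is non-negative and `mask` is 1 inside the
    ground-truth region and 0 elsewhere. Hypothetical metric.
    """
    total = sum(sum(row) for row in attention)
    if total == 0:
        return 0.0
    inside = sum(
        a * m
        for a_row, m_row in zip(attention, mask)
        for a, m in zip(a_row, m_row)
    )
    return inside / total


def attention_faithfulness(
    image: Grid,
    attention: Grid,
    score_fn: Callable[[Grid], float],
    k: int,
) -> float:
    """Illustrative faithfulness proxy: score drop after deleting top-k pixels.

    `score_fn` stands in for a model's class score (hypothetical). A larger
    drop suggests the attended pixels influence the decision more.
    """
    coords = [
        (r, c)
        for r in range(len(attention))
        for c in range(len(attention[0]))
    ]
    # Rank pixel positions by attention weight, highest first.
    top = sorted(coords, key=lambda rc: attention[rc[0]][rc[1]], reverse=True)[:k]
    perturbed = [row[:] for row in image]  # copy so the input is not mutated
    for r, c in top:
        perturbed[r][c] = 0.0
    return score_fn(image) - score_fn(perturbed)


if __name__ == "__main__":
    attention = [[0.6, 0.2], [0.1, 0.1]]
    mask = [[1.0, 1.0], [0.0, 0.0]]
    image = [[1.0, 2.0], [3.0, 4.0]]
    # Toy "model" whose score depends only on the top-left pixel.
    toy_score = lambda img: img[0][0]
    print(attention_alignment(attention, mask))
    print(attention_faithfulness(image, attention, toy_score, k=1))
```

In this toy setup, high alignment means most attention falls within the segmentation mask, and high faithfulness means removing the most-attended pixels changes the score substantially. The two can diverge: a model may attend to the right region without relying on it for its prediction.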