Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers

arXiv:2510.18358v2 Announce Type: replace Uncertainty quantification (UQ) is essential for deploying deep neural networks in safety-critical settings. Although methods like Deep Ensembles achieve strong UQ performance, their high computational and memory costs hinder scalability to large models. We