LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs

arXiv:2604.22050v1 Announce Type: new Transformers rely mostly on softmax attention. This work proposes LayerBoost, a layer-aware attention reduction method that selectively modifies the attention mechanism according to the sensitivity of individual transformer layers. It first performs a systematic sensitivity analysis on a pretrained model to identify the layers that are critical for maintaining performance.
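
The sensitivity analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model, the uniform-averaging stand-in for reduced attention (`cheap_attention`), and the output-deviation score are all assumptions made for illustration, since the abstract does not specify the replacement mechanism or the metric. The idea shown is only the general pattern: modify attention in one layer at a time, measure how far the output moves from the unmodified baseline, and rank layers by that deviation.

```python
import numpy as np


def softmax_attention(x, w):
    """Single-head softmax attention on x of shape (seq, dim).

    w is a hypothetical (dim, dim) matrix standing in for the
    query/key projections of a pretrained layer.
    """
    scores = (x @ w) @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x


def cheap_attention(x, w):
    """Assumed stand-in for reduced attention: uniform averaging over tokens."""
    return np.broadcast_to(x.mean(axis=0), x.shape)


def forward(x, layer_weights, reduced=frozenset()):
    """Run the toy model, using cheap attention in the layers listed in `reduced`."""
    for i, w in enumerate(layer_weights):
        attn = cheap_attention if i in reduced else softmax_attention
        x = x + attn(x, w)  # residual connection
        x = x / np.linalg.norm(x, axis=1, keepdims=True)  # crude normalization
    return x


def layer_sensitivity(x, layer_weights):
    """Score each layer by the output deviation caused by reducing it alone."""
    baseline = forward(x, layer_weights)
    scores = []
    for i in range(len(layer_weights)):
        out = forward(x, layer_weights, reduced={i})
        scores.append(float(np.linalg.norm(out - baseline)))
    return scores


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))  # 8 tokens, 16 dimensions
layer_weights = [rng.normal(size=(16, 16)) for _ in range(4)]

scores = layer_sensitivity(x, layer_weights)
# Layers ordered from most to least sensitive under this toy score.
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
```

In a real setting the deviation score would presumably be replaced by a task metric such as validation loss on held-out data, and layers at the top of `ranking` would be the ones left with full softmax attention.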