[P] I implemented "Screening Is Enough" (arXiv:2604.01178) in PyTorch and benchmarked it

Last week's paper replaces softmax attention with an absolute threshold mechanism:

alpha = [max(1 - r * (1 - cosine_sim), 0)]^2

Keys whose cosine similarity is at or below 1 - 1/r (for r > 0) get zeroed out entirely: no global competition, no softmax denominator. The paper claims ~40% fewer params at comparable loss (in their full-scale iso-performance experiments up to 4B params, not iso-parameter comparisons) and 3.2x lower latency at 100K context.

I built a PyTorch implementation and benchmarked it.

Latency (torch.utils.benchmark, RTX 4060 Ti, 8 GB VRAM)

seq_len Screening nn.