The Generalization Ridge: Information Flow in Natural Language Generation

arXiv:2507.05387v5. Transformer-based language models have achieved state-of-the-art performance in natural language generation (NLG), yet their internal mechanisms for synthesizing task-relevant information remain insufficiently understood. While prior studies suggest that intermediate layers often yield more generalizable representations than final layers, how this generalization ability emerges and propagates across layers during