Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale

arXiv:2603.06592v1 Announce Type: cross
Contemporary studies have uncovered many puzzling phenomena in the neural information processing of Transformer-based language models. Building a robust, unified understanding of these phenomena requires disassembling a model within the scope of its