OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions

arXiv:2603.07431v1

Transformer architectures excel at sequential modeling yet remain fundamentally limited by correlational learning; they capture spurious associations induced by latent confounders rather than invariant causal mechanisms. We identify this as an epistemological challenge: standard Transformers conflate static background factors (intrinsic identity, style, context) with dynamic causal flows (state evolution, mechanism), which leads to catastrophic out-of-distribution failure.
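To illustrate what "spurious associations induced by latent confounders" means, and how a control function removes them, here is a minimal sketch. It uses the classical *linear* control-function estimator (two-stage residual inclusion) on a hypothetical toy data-generating process. It is not the neural control function that OrthoFormer applies to Transformer hidden states. The names `simulate`, `control_function`, `ols_slope`, `ols_two` and the coefficients `beta=2`, `gamma=3` are illustrative assumptions. A latent `u` drives both `x` and `y`, and an instrument `z` moves `x` but is independent of `u`. Analytically, Var(x) = 3 and Cov(x, y) = 2·3 + 3·1 = 9, so a purely correlational fit converges to a slope of about 3. Adding the first-stage residual as a regressor recovers the causal slope of about 2.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    # Covariance with 1/n normalization (the normalization cancels in slope ratios).
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def ols_slope(x, y):
    # Slope of y on x in a simple regression with intercept.
    return cov(x, y) / cov(x, x)


def ols_two(x1, x2, y):
    # Coefficients of y on [x1, x2] (with intercept), via the 2x2 normal equations.
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=100_000, beta=2.0, gamma=3.0, seed=0):
    # Hypothetical toy process: x = z + u + noise, y = beta*x + gamma*u + noise.
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)   # latent confounder, never seen by the estimators
        zi = rng.gauss(0, 1)  # instrument: shifts x, independent of u
        xi = zi + u + rng.gauss(0, 1)
        yi = beta * xi + gamma * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def control_function(z, x, y):
    # Stage 1: regress x on z; the residual v carries the confounded part of x.
    pi = ols_slope(z, x)
    mz, mx = mean(z), mean(x)
    v = [xi - mx - pi * (zi - mz) for zi, xi in zip(z, x)]
    # Stage 2: regress y on x and v; the coefficient on x is the causal estimate.
    beta_hat, _ = ols_two(x, v, y)
    return beta_hat


z, x, y = simulate()
print(f"naive OLS slope:       {ols_slope(x, y):.2f}")
print(f"control-function slope: {control_function(z, x, y):.2f}")
```

The naive slope absorbs the confounder's effect through u's contribution to both x and y. Including the residual `v` as a regressor conditions that effect away, because given `v`, the remaining variation in `x` comes only from the instrument.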