Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

arXiv:2604.04120v1 Announce Type: new Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost, yet existing evaluations focus almost exclusively on task accuracy and token savings. Trustworthiness properties, whether acquired or reinforced through post-