Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study

Together AI Blog

ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time;