Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning

arXiv:2602.13218v2 Announce Type: replace

Reinforcement Learning from Verifiable Rewards (RLVR) is bottlenecked by data: existing synthesis pipelines rely on expert-written code or fixed templates, which confines growth to instance-level perturbations. We shift the evolvable unit from problem instances to task-family specifications.
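The distinction between evolving instances and evolving task-family specifications can be illustrated with a minimal sketch. It assumes a toy "modular sum" family. The names `ModSumSpec`, `sample_instance`, `reward` and `evolve_spec` are hypothetical and are not the paper's representation or API. Repeatedly sampling from one fixed spec corresponds to instance-level growth. Mutating the spec itself yields a new family, and every instance from that family still comes with a programmatic check that can serve as a verifiable reward.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple

# Hypothetical illustration only: a toy task family, not the paper's method.


@dataclass(frozen=True)
class ModSumSpec:
    """A task-family specification: parameters that define a whole family."""
    n_terms: int
    modulus: int


def sample_instance(spec: ModSumSpec, rng: random.Random) -> Tuple[str, List[int]]:
    """Instance-level growth: draw one concrete problem from a fixed spec."""
    xs = [rng.randint(0, 99) for _ in range(spec.n_terms)]
    prompt = f"Compute ({' + '.join(map(str, xs))}) mod {spec.modulus}."
    return prompt, xs


def reward(spec: ModSumSpec, xs: List[int], answer: int) -> float:
    """Verifiable reward: 1.0 iff the answer passes the programmatic check."""
    return 1.0 if sum(xs) % spec.modulus == answer else 0.0


def evolve_spec(spec: ModSumSpec, rng: random.Random) -> ModSumSpec:
    """Family-level growth: mutate the specification itself.

    Either adds one term or enlarges the modulus, so the child spec
    always differs from its parent.
    """
    if rng.random() < 0.5:
        return replace(spec, n_terms=spec.n_terms + 1)
    return replace(spec, modulus=spec.modulus + rng.randint(1, 5))


if __name__ == "__main__":
    rng = random.Random(0)
    parent = ModSumSpec(n_terms=3, modulus=7)
    child = evolve_spec(parent, rng)
    prompt, xs = sample_instance(child, rng)
    print(child)
    print(prompt, reward(child, xs, sum(xs) % child.modulus))
```

In this toy setting, the spec is the unit that gets mutated, while instances are cheap samples drawn from it. That is the sense in which the evolvable unit moves from problem instances to task-family specifications.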