AI RESEARCH

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

arXiv CS.AI

arXiv:2602.00747v2 Announce Type: replace-cross Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-