AI RESEARCH
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
arXiv cs.AI
arXiv:2602.00747v2 Announce Type: replace-cross
Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-training.