More Agents Improve Math Problem Solving but Adversarial Robustness Gap Persists

arXiv:2511.07112v2 Announce Type: replace-cross When LLM agents work together, they seem to be more powerful than a single LLM in mathematical question answering. However, are they also robust to adversarial inputs? We investigate this question using adversarially perturbed math questions. These perturbations include punctuation noise at three intensities (10%, 30%, 50%), plus real-world and human-like typos (WikiTypo, R2