ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

arXiv:2505.19897v3 Announce Type: replace-cross Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery across multiple aspects and domains. Among these, computer-using agents, which can interact with operating systems as humans do, are paving the way toward automated scientific problem-solving and toward handling routine tasks in researchers' workflows.