AI RESEARCH

Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models

arXiv cs.AI

arXiv:2604.18786v1 Announce Type: cross. Scientific feasibility assessment asks whether a claim is consistent with established knowledge and whether experimental evidence could support or refute it. We frame feasibility assessment as a diagnostic reasoning task in which, given a hypothesis, a model predicts feasible or infeasible and justifies its decision. We evaluate large language models (LLMs) under four controlled knowledge conditions (hypothesis only, with experiments, with outcomes, or with both) and probe robustness by progressively removing portions of the experimental and/or outcome context.
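
The evaluation setup described above could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the names `Item`, `CONDITIONS`, `ablate`, and `build_prompt`, the prompt wording, and the random-removal strategy are all assumptions made for the example, since the abstract does not say how context is removed or how prompts are phrased.

```python
import random
from dataclasses import dataclass, field
from typing import List

# The four knowledge conditions named in the abstract (identifiers are hypothetical).
CONDITIONS = ("hypothesis_only", "with_experiments", "with_outcomes", "both")


@dataclass
class Item:
    """One evaluation example: a hypothesis plus optional supporting context."""
    hypothesis: str
    experiments: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)


def ablate(entries: List[str], remove_fraction: float, rng: random.Random) -> List[str]:
    """Drop round(len(entries) * remove_fraction) entries at random, keeping order.

    Random removal is an assumption; the paper may remove context differently.
    """
    if not 0.0 <= remove_fraction <= 1.0:
        raise ValueError("remove_fraction must be between 0 and 1")
    n_remove = round(len(entries) * remove_fraction)
    keep = sorted(rng.sample(range(len(entries)), len(entries) - n_remove))
    return [entries[i] for i in keep]


def build_prompt(item: Item, condition: str, remove_fraction: float = 0.0, seed: int = 0) -> str:
    """Assemble the model input for one knowledge condition and ablation level."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")
    rng = random.Random(seed)
    parts = [f"Hypothesis: {item.hypothesis}"]
    if condition in ("with_experiments", "both"):
        parts += [f"Experiment: {e}" for e in ablate(item.experiments, remove_fraction, rng)]
    if condition in ("with_outcomes", "both"):
        parts += [f"Outcome: {o}" for o in ablate(item.outcomes, remove_fraction, rng)]
    parts.append("Answer 'feasible' or 'infeasible', then justify your decision.")
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder example, invented for illustration only.
    example = Item(
        hypothesis="Compound X dissolves in water at room temperature.",
        experiments=["Stir 1 g of X in 100 mL of water.", "Repeat at 40 C."],
        outcomes=["Solution is clear after 5 minutes.", "Solution is clear after 2 minutes."],
    )
    for fraction in (0.0, 0.5, 1.0):
        print(build_prompt(example, "both", remove_fraction=fraction))
        print("---")
```

Sweeping `remove_fraction` from 0 to 1 within each condition gives a series of progressively sparser prompts, so a model's predictions can be compared as context is withdrawn.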