SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

arXiv:2603.01589v2

The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting numerous benchmarks to evaluate their scientific safety. Existing benchmarks often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we