LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts

arXiv:2410.10700v3 Announce Type: replace-cross Safety concerns in large language models (LLMs) have gained significant attention due to their exposure to potentially harmful data during pre-