Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

arXiv:2604.21700v1 Announce Type: cross The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings: explicit trigger patterns that compromise naturalness, unreliable injection of attacker-specified payloads in long-form generation, and incompletely specified threat models that obscure how backdoors are delivered and activated in practice.