AI RESEARCH
Adaptive Instruction Composition for Automated LLM Red-Teaming
arXiv cs.LG
arXiv:2604.21159v1 Announce Type: cross Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks by combining crowdsourced harmful queries and tactics into instructions for the attacker, but does so at random, limiting effectiveness. This article