Benchmarking Self-Hosted LLMs for Offensive Security

Dev.to AI
Generative AI Robotics

This article explores the effectiveness of self-hosted Large Language Models (LLMs) in offensive security scenarios, specifically benchmarking local models against the OWASP Juice Shop. Using a minimal harness and basic HTTP tools, the study evaluates models like gemma4:31b, qwen3.5:27b, and devstral-small-2:24b across challenges involving SQL injection, JWT manipulation, and path traversal.