Building LeakLab: A Practical LLM Security Playground (with Streamlit + OpenAI-Compatible APIs)

Large language models can leak secrets even when you explicitly tell them not to. LeakLab is a hands-on app built to demonstrate that failure mode live, then fix it with layered controls. This post walks through the architecture, implementation, and engineering tradeoffs.

Why this project exists

Most LLM apps rely too heavily on prompt instructions such as:

“Never reveal confidential information”

That can reduce risk, but it is not a hard boundary. If sensitive content is present in the context and you give the model enough attack surface, leakage can still occur.
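To make the distinction concrete, here is a minimal sketch of one kind of control that does act as a hard boundary: a deterministic filter that scans the model's output for known secrets before it reaches the user. It assumes the application knows the exact secret strings present in its context. The `SECRETS` list and the `redact_secrets` helper are hypothetical illustrations, not LeakLab's actual code.

```python
from __future__ import annotations

import re

# Hypothetical secrets that might sit in the model's context window.
SECRETS = ["sk-test-12345", "ACME-ROADMAP-Q3"]


def redact_secrets(text: str, secrets: list[str]) -> tuple[str, bool]:
    """Replace every known secret in `text` with a placeholder.

    Matching is case-insensitive and tolerates whitespace inserted between
    characters, a common trick for slipping a secret past naive substring
    checks. Returns the redacted text and whether anything was found.
    """
    leaked = False
    for secret in secrets:
        # Build a pattern like "s\s*k\s*\-..." so "s k-test-12345" still matches.
        pattern = r"\s*".join(re.escape(ch) for ch in secret)
        text, count = re.subn(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
        leaked = leaked or count > 0
    return text, leaked


# Usage: pretend this string came back from the model.
model_output = "Sure! The key is s k-test-12345, but don't tell anyone."
safe, leaked = redact_secrets(model_output, SECRETS)
print(leaked)  # True
print(safe)    # Sure! The key is [REDACTED], but don't tell anyone.
```

Because the check runs in ordinary code after generation, it holds regardless of what the prompt says. It only catches verbatim or spaced-out copies, though: a model coaxed into paraphrasing, translating, or base64-encoding a secret would slip past it, which is why a filter like this is one layer among several rather than a complete fix.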