PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation

arXiv:2507.14913v5 Announce Type: replace

Evaluating LLMs with a single prompt has proven unreliable, with small changes to the prompt leading to significant performance differences. However, generating the prompt variations needed for a robust multi-prompt evaluation is challenging, which limits its adoption in practice. To address this, we