I built an npm library to test AI chatbots with Playwright — here's why normal matchers don't work

Dev.to AI

If you're building a product with an AI chatbot, you've probably run into this:

    await expect(response).toContainText('The Pro plan costs $49/month');

This breaks constantly. LLMs rarely return the exact same string twice.

The problem

Traditional matchers assume deterministic output. AI responses are:

- Semantically equivalent but textually different on every run
- Sometimes helpful, sometimes hallucinating
- Hard to validate with toEqual or toContainText

You end up either skipping the assertion entirely or writing brittle string checks that fail on every deploy.
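To make the failure mode concrete, here is a minimal sketch in plain TypeScript with no Playwright dependency. The two replies, the helper names (`containsExact`, `mentionsAllFacts`), and the regex list are all made up for illustration; this is not the library's API, just one way to see why an exact substring check rejects a correct answer that is worded differently.

```typescript
// Two hypothetical chatbot replies that state the same fact in different words.
const replyA = "The Pro plan costs $49/month.";
const replyB = "Our Pro tier is priced at $49 per month.";

// The literal string a toContainText-style assertion would look for.
const expected = "The Pro plan costs $49/month";

// Exact substring check: passes only when the wording matches character for character.
function containsExact(reply: string, needle: string): boolean {
  return reply.includes(needle);
}

// Looser check (illustrative assumption, not the library's method):
// every key fact must appear somewhere in the reply, in any wording.
function mentionsAllFacts(reply: string, facts: RegExp[]): boolean {
  return facts.every((fact) => fact.test(reply));
}

// Hypothetical "facts" for the pricing answer: plan name, price, billing period.
const proPriceFacts: RegExp[] = [/\bpro\b/i, /\$49\b/, /\bmonth\b/i];

const exactResults = [replyA, replyB].map((r) => containsExact(r, expected));
const factResults = [replyA, replyB].map((r) => mentionsAllFacts(r, proPriceFacts));

console.log(exactResults); // [ true, false ]
console.log(factResults); // [ true, true ]
```

Both replies are correct, but only the first survives the exact check. The keyword version is still crude: it would also accept "The Pro plan does not cost $49 a month", so it shows the brittleness of string matching rather than solving semantic validation.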