How I Built a Voice AI That Takes Real DOM Actions on Websites

Dev.to AI

Every voice AI tool I evaluated did the same thing: listen to speech, convert to text, send to an LLM, return audio. Essentially a chatbot with a microphone.

But I wanted something different. I wanted voice AI that could actually do things on a website - click buttons, fill forms, navigate pages.

The Problem with Voice Chatbots

Here's what most "voice AI" tools do:

1. User speaks
2. Speech-to-text converts it
3. Text goes to an LLM
4. LLM generates a response
5. Text-to-speech reads it back

That's it. The AI talks back, but it doesn't do anything. It can't click your "Book Appointment" button.
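To make the limitation concrete, here is a minimal TypeScript sketch of that five-step loop. Everything in it is an assumption for illustration: the names `VoiceChatbotStages`, `VoiceReply`, `voiceChatbotTurn`, and `stubStages` are hypothetical, and the stub stages stand in for whatever real speech-to-text, LLM, and text-to-speech services a given tool uses.

```typescript
// Hypothetical stage signatures. A real tool would back these with actual
// speech-to-text, LLM, and text-to-speech services.
interface VoiceChatbotStages {
  speechToText: (audio: Uint8Array) => Promise<string>;
  llm: (prompt: string) => Promise<string>;
  textToSpeech: (text: string) => Promise<Uint8Array>;
}

// Everything a typical voice chatbot turn produces: text and audio.
// Note what is missing: nothing here describes an action the page could run.
interface VoiceReply {
  transcript: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// One conversational turn: speech in, speech out.
async function voiceChatbotTurn(
  stages: VoiceChatbotStages,
  userAudio: Uint8Array,
): Promise<VoiceReply> {
  const transcript = await stages.speechToText(userAudio); // steps 1-2
  const replyText = await stages.llm(transcript);          // steps 3-4
  const replyAudio = await stages.textToSpeech(replyText); // step 5
  return { transcript, replyText, replyAudio };
}

// Stub stages so the sketch runs without any external service.
const stubStages: VoiceChatbotStages = {
  // Pretend every recording says the same thing.
  speechToText: async (_audio) => "book an appointment",
  // Echo the prompt back instead of calling a model.
  llm: async (prompt) => `You said: "${prompt}"`,
  // Return silent placeholder audio, one byte per character.
  textToSpeech: async (text) => new Uint8Array(text.length),
};
```

Calling `voiceChatbotTurn(stubStages, someAudio)` resolves to a transcript, a reply string, and reply audio. Even when the user asks to book an appointment, the result is only something to read or play back; nothing in it tells the page to click anything.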