How I Built Sally, A Voice-First Accessibility Agent Powered by Gemini
Dev.to AI
I built a desktop app that lets people control any website using only their voice. You talk, it takes a screenshot, sends it to Gemini 2.5 Flash, gets back a structured action, runs it in the browser, and repeats. The whole time it's narrating what it's doing out loud. Here's how it came together for the Gemini Live Agent Challenge.

The Problem

Picture this: you can't use a mouse. Maybe you can't use a keyboard either. You might have a repetitive strain injury, a motor impairment, or honestly you might just have a broken wrist. The web doesn't really care.
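
To make the loop from the intro concrete (talk, screenshot, model, structured action, browser, narrate, repeat), here is a minimal Python sketch. It is not Sally's actual code. The `Action` fields, the `"done"` action kind, the `max_steps` cap, and every function name are assumptions made up for illustration, and the Gemini call, the screenshot capture, and the browser driver are all passed in as plain callables so the sketch runs without any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One structured step returned by the model (hypothetical schema)."""
    kind: str                      # assumed vocabulary: "click", "type", "done"
    target: Optional[str] = None   # element description, used by "click"
    text: Optional[str] = None     # text to enter, used by "type"
    narration: str = ""            # sentence to speak aloud for this step


def run_agent_loop(
    command: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    speak: Callable[[str], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> model -> narrate -> act until the model says "done"."""
    history: List[Action] = []
    for _ in range(max_steps):     # hard cap so a confused model cannot loop forever
        screenshot = take_screenshot()
        action = ask_model(command, screenshot, history)
        speak(action.narration)    # narrate every step, including the final one
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


# Demo with stand-ins: a scripted "model" replaces the real Gemini call.
scripted = iter([
    Action("click", target="search box", narration="Clicking the search box."),
    Action("type", text="weather", narration="Typing weather."),
    Action("done", narration="All done."),
])
spoken: List[str] = []
executed: List[Action] = []

history = run_agent_loop(
    "search for weather",
    take_screenshot=lambda: b"",                # placeholder for real image bytes
    ask_model=lambda cmd, shot, hist: next(scripted),
    execute=executed.append,                    # placeholder for a browser driver
    speak=spoken.append,                        # placeholder for text-to-speech
)
```

Passing the model, browser, and speech pieces in as callables is only a convenience for the sketch: it keeps the control flow visible and lets the loop be exercised with fakes before anything real is wired in.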