Improving the Interaction with Streamed AI Responses

Dev.to AI

I had a Spring Boot API talking to AI providers, and at first it did the most obvious thing: send the prompt, wait for the model to finish, and then return the full response as JSON. It worked, but it felt wrong. With AI-generated text, waiting several seconds for a complete response is a poor experience. The model already produces tokens progressively, yet the API hid that and made the client wait for everything.

So I decided to fix that and add proper streaming. This post is about that change. It is not a giant rewrite.