Local AI that feels as fast as frontier.

r/LocalLLaMA
AI Hardware

A thought occurred to me a little while ago when I was installing a voice model for my local AI. The model I chose was PersonaPlex, a model made by Nvidia that features full-duplex interaction. What that means is it listens while you speak and then replies the second you are done. The user experience was far better than with a normal STT model. So why don't we do this with text? It takes me a good 20 seconds to type a message to my local assistant, and only then does it begin processing and reply. That is all time we could absorb by streaming the text to the model as it is typed.
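To make the idea concrete, here is a rough sketch of what "streaming while typing" could look like on the prompt side. It is a toy simulation, not a real inference integration: `IncrementalPrefill`, `on_keystroke`, `on_submit`, and the whitespace tokenizer are all made-up names for illustration, and it assumes your backend can reuse an already-processed prompt prefix, which you would need to verify for your own setup. It only counts how many tokens would still need processing at the moment you hit enter.

```python
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    """Toy whitespace tokenizer; a real model would use its own tokenizer."""
    return text.split()


@dataclass
class IncrementalPrefill:
    """Hypothetical prompt cache that is filled while the user types."""
    cached: list[str] = field(default_factory=list)
    processed_total: int = 0  # tokens ever run through the (pretend) model

    def _sync(self, tokens: list[str]) -> int:
        # Keep the longest shared prefix; anything after an edit is discarded.
        keep = 0
        for old, new in zip(self.cached, tokens):
            if old != new:
                break
            keep += 1
        fresh = tokens[keep:]
        self.cached = self.cached[:keep] + fresh
        self.processed_total += len(fresh)
        return len(fresh)  # tokens that had to be processed just now

    def on_keystroke(self, text_so_far: str) -> int:
        tokens = tokenize(text_so_far)
        # Hold back the last word unless it is followed by whitespace,
        # because it may still be incomplete.
        if tokens and not text_so_far[-1:].isspace():
            tokens = tokens[:-1]
        return self._sync(tokens)

    def on_submit(self, final_text: str) -> int:
        # Whatever is returned here is the work left after pressing enter.
        return self._sync(tokenize(final_text))


def simulate(message: str) -> tuple[int, int]:
    """Type the message one character at a time, then submit it."""
    cache = IncrementalPrefill()
    for i in range(1, len(message) + 1):
        cache.on_keystroke(message[:i])
    remaining = cache.on_submit(message)
    return remaining, len(tokenize(message))


if __name__ == "__main__":
    left, total = simulate("what is the fastest way to cool a gpu")
    print(f"{left} of {total} tokens left to process at submit time")
```

In this sketch, only the final word is still unprocessed when the message is submitted, so most of the prompt work overlaps with typing. Editing earlier text throws away everything after the change, which is the main cost a real implementation would have to weigh.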