What can you do if your hardware can generate 15,000 tokens/s?
r/StableDiffusion
Generative AI
Saw this posted on r/Qwen_AI and r/LocalLLM today. I also remember seeing it a few years ago when they first published their studies, but I completely forgot about it. Basically, instead of running inference on a graphics card, where the model is loaded into memory, the model is burned into the hardware itself. Remember CDs? It is cheap to build compared to GPUs: they are using 6nm chips instead of the latest tech, and no memory is needed! The biggest downside is that you can't swap models, so there is no flexibility.
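
To put the 15,000 tokens/s figure in perspective, here is a rough back-of-the-envelope sketch. Only the rate comes from the title; the workload sizes (a 500-token reply, a 100k-token document, 1M tokens) and the function names are my own illustrative assumptions, and it ignores prompt processing and any other overhead.

```python
# Back-of-the-envelope math for a fixed generation rate.
# RATE_TOKENS_PER_S is the figure from the post title; everything else
# (workload sizes, function names) is an illustrative assumption.
RATE_TOKENS_PER_S = 15_000


def seconds_to_generate(num_tokens, rate=RATE_TOKENS_PER_S):
    """Time in seconds to emit num_tokens at a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / rate


def tokens_in(seconds, rate=RATE_TOKENS_PER_S):
    """Number of tokens emitted in the given number of seconds."""
    return seconds * rate


if __name__ == "__main__":
    workloads = [
        ("500-token reply", 500),
        ("100k-token document", 100_000),
        ("1M tokens", 1_000_000),
    ]
    for label, n in workloads:
        print(f"{label}: {seconds_to_generate(n):.3f} s")
    print(f"tokens per hour: {tokens_in(3600):,}")
```

So at that rate a chat-sized reply is effectively instant, and the interesting use cases are probably the ones that burn through millions of tokens, like agents or batch jobs.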