Acceptable prompt processing speed for you?

r/LocalLLaMA

I am currently optimising some ancient hardware (4x V100s) to run Qwen3, but the lack of FlashAttention support means prompt processing really starts to slow down at longer contexts. For agentic coding work, what prompt processing speeds and context lengths do you consider acceptable or good?

submitted by /u/Simple_Library_2700
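For comparing answers, one way to turn a prompt processing speed into a wait time is to divide the context length by the throughput. Below is a minimal sketch under that assumption. The helper name `prefill_seconds` and the example numbers are hypothetical and for illustration only, not measurements from 4x V100s.

```python
def prefill_seconds(context_tokens: int, pp_tokens_per_s: float) -> float:
    """Estimate seconds to process a prompt, assuming a constant prompt-processing rate."""
    if pp_tokens_per_s <= 0:
        raise ValueError("throughput must be positive")
    return context_tokens / pp_tokens_per_s


# Hypothetical context lengths and throughputs, chosen only to show the arithmetic.
for ctx in (8_000, 32_000, 64_000):
    for pp in (250.0, 1_000.0):
        print(f"{ctx:>6} tokens at {pp:>6.0f} tok/s -> {prefill_seconds(ctx, pp):6.1f} s")
```

This treats throughput as constant. As the post notes, it drops at longer contexts without FlashAttention, so if the rate was measured at short context, real waits at long context would likely be longer than this estimate.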