Anyone using Tesla P40 for local LLMs (30B models)?

Hey guys, is anyone here using a Tesla P40 with newer models like Qwen / Mixtral / Llama? RTX 3090 prices are still very high, while a P40 is around $250, so I'm considering it as a budget option.

I'm trying to understand real-world usability:

- How many tokens/sec are you getting on 30B models?
- Is it usable for chat + light coding?
- How bad does it get with longer context?

Thank you!

submitted by /u/ScarredPinguin