To run deepseek v4 flash how much max vram we need? 175 gb or 320gb?
r/LocalLLaMA
•
Open Source AI
AI Tools
As far as i know the weight is of 160gb + 9.6gb needed for max 1M token window + 5 gigs overhead = 175gb vram. But vllm and othere sources said "To use the full 1M context, you need 4x A100 80G" --> thats a 320gb vram? Am i missing something? Sources: Vllm blog of deployment 9.6 gig is also sourced from vllm blog page + official model page says it take 10% k cache of what 3.2 used to take submitted by /u/9r4n4y [link] [comments.