Local LLM VRAM Calculator
Estimate the VRAM required to run Llama 4, Qwen3, Gemma 4, and DeepSeek-V4 locally, and find hardware that can run them.
4-bit (Q4_K_M) offers the best balance of inference speed, low VRAM usage, and minimal reasoning loss.
Context length: 8,192 tokens
Presets: 2K (Chat), 32K (Docs), 128K (Books/Codebases)
Estimated VRAM Required: 5.8 GB
Weights: 4.8 GB
KV Cache: 0.5 GB
CUDA Context: 0.5 GB
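The breakdown above is a sum of three terms: weights, KV cache, and CUDA context. The page does not publish its formula, so the following is only a minimal sketch of a common way to estimate each term. The function names are hypothetical, the 8-billion-parameter model and its layer, head, and head-dimension values are illustrative assumptions, and 4.8 bits per weight is an assumed effective average for Q4_K_M rather than a figure from this page. The fp16 KV cache example is not meant to reproduce the 0.5 GB shown above, since that figure depends on the selected model and cache precision.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: float = 2.0) -> float:
    """Size of the KV cache in GB for a single sequence.

    The factor 2 accounts for one key tensor and one value tensor per layer.
    bytes_per_elem=2.0 assumes an fp16 cache (an assumption, not a page value).
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / 1e9


def total_vram_gb(weights: float, kv_cache: float,
                  runtime_context: float = 0.5) -> float:
    """Sum the three components the calculator reports."""
    return weights + kv_cache + runtime_context


if __name__ == "__main__":
    # Summing the page's own figures: 4.8 + 0.5 + 0.5
    print(round(total_vram_gb(4.8, 0.5, 0.5), 1))  # 5.8

    # Hypothetical 8B-parameter model at an assumed 4.8 bits per weight
    print(round(weights_gb(8e9, 4.8), 2))

    # Hypothetical architecture: 32 layers, 8 KV heads, head_dim 128, 8,192 tokens
    print(round(kv_cache_gb(32, 8, 128, 8192), 2))
```

Because the KV cache term grows linearly with context length, moving from the 2K preset to the 128K preset multiplies that term by 64 while the weights term stays fixed.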
Recommended Hardware for this Model
*Includes 1.5 GB OS buffer and overhead
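As a rough illustration of how a recommendation might follow from the estimate, the sketch below adds the 1.5 GB buffer to the estimated VRAM and picks the smallest card that fits. This assumes the footnote means the buffer is added on top of the 5.8 GB estimate. The function name and the list of card sizes are hypothetical and are not taken from this page.

```python
from typing import Optional, Sequence


def smallest_fitting_card(estimated_vram_gb: float,
                          card_sizes_gb: Sequence[int] = (8, 12, 16, 24),
                          buffer_gb: float = 1.5) -> Optional[int]:
    """Return the smallest card size (GB) that holds estimate + buffer.

    card_sizes_gb is a hypothetical list of common GPU memory sizes.
    Returns None when no listed card is large enough.
    """
    needed = estimated_vram_gb + buffer_gb
    fitting = [size for size in sorted(card_sizes_gb) if size >= needed]
    return fitting[0] if fitting else None


if __name__ == "__main__":
    # 5.8 GB estimate + 1.5 GB buffer = 7.3 GB needed
    print(smallest_fitting_card(5.8))  # 8
```

With these assumed sizes, an estimate of 7.0 GB would already need the 12 GB tier, because 7.0 + 1.5 exceeds 8 GB.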