What is the exact Qwen3-Coder Ollama model name / tag to use in 2026?

The correct model name is qwen3-coder (hyphenated). The recommended tag for most users is qwen3-coder:30b, which is the same as qwen3-coder:latest and qwen3-coder:30b-a3b-q4_K_M. Pull it with: ollama pull qwen3-coder:30b

Does Qwen3-Coder run on 8GB VRAM?

Not practically. The smallest Qwen3-Coder tag requires ~19GB for model weights alone. On 8GB VRAM, nearly the entire model offloads to system RAM, resulting in 0.5–2 tok/s. For 8GB VRAM, use qwen2.5-coder:7b instead.

What is the difference between qwen3-coder:30b and qwen3-coder:30b-a3b-q4_K_M?

Nothing — they are identical. Both tags share the same file hash (06c1097efce0) and 19GB download size. The :30b tag is simply the convenient alias for :30b-a3b-q4_K_M.

Is Qwen3-Coder better than Qwen2.5-Coder for daily coding tasks?

For autocomplete and function-level code generation, Qwen2.5-Coder:32B and Qwen3-Coder:30B are roughly equivalent. For multi-turn agentic tasks (automated bug fixing, multi-file refactors), Qwen3-Coder wins clearly — over 70% SWE-bench Verified vs ~69.6% for Qwen2.5-Coder:32B. Qwen3-Coder also doubles context to 256K.

What is qwen3-coder-next and how is it different from qwen3-coder?

Qwen3-Coder-Next is a separate Ollama model (ollama pull qwen3-coder-next) built on Qwen3-Next-80B-A3B-Base with hybrid attention and MoE. It achieves 70%+ on SWE-bench Verified through large-scale agentic RL training, making it better than the standard qwen3-coder:30b for multi-turn agentic tasks while using similar VRAM.

How do I use Qwen3-Coder's 256K context window on Ollama without running out of VRAM?

Set OLLAMA_KV_CACHE_TYPE=q8_0 before running Ollama to halve the KV cache memory overhead. Then specify context: ollama run qwen3-coder:30b --num_ctx 65536. On a single RTX 5090 (24GB), a practical ceiling is 64–128K context with q8_0 KV cache enabled.

Ollama Guide Qwen3-Coder VRAM + Tags Updated June 2, 2026

Qwen3-Coder Ollama: Complete Tag List,
Model Sizes & Install Guide

code

Author

Himansh

Published

April 25, 2026

schedule

12 min read

TheAITechPulse.com

Qwen3-Coder running locally on Ollama terminal showing code generation — Qwen3-Coder is Alibaba's most capable local coding model family — and it runs entirely on your own hardware.

If you've been searching for Qwen3-Coder Ollama tags, you're not alone. The query "qwen3-coder ollama tags 2026" has been one of the fastest-growing searches in the Ollama ecosystem this spring, and for good reason: Alibaba shipped Qwen3-Coder quietly but powerfully, and the Ollama library page doesn't exactly hold your hand. There are 10 available tags across three model sizes — 30B, 480B, and the newer Next variant — and picking the wrong one can leave you staring at an OOM error or wasting hours on a 290GB download you didn't need. This guide covers every available tag, exact pull commands, precise VRAM requirements per quantization, and a direct benchmark comparison against Qwen2.5-Coder so you know whether it's actually worth switching.

Quick Answer: Best Qwen3-Coder tag for Ollama (june 2026)

16–20GB VRAM: ollama pull qwen3-coder:30b (19GB, 256K context, Q4_K_M default)
32GB+ VRAM: ollama pull qwen3-coder:30b-a3b-q8_0 (32GB, higher accuracy)
Multi-GPU / datacenter: ollama pull qwen3-coder:480b (290GB, 480B total params)
8GB VRAM (budget): Stick to qwen2.5-coder:7b — Qwen3-Coder doesn't have a sub-10GB tag yet.

menu_book Table of Contents

Does Qwen3-Coder actually exist on Ollama?
All Available Tags & Sizes (Full List)
VRAM Requirements by Tag
Exact Pull Commands
Qwen3-Coder-Next: The Hidden Gem
Qwen3-Coder vs Qwen2.5-Coder: Which to Use?
Benchmarks
Setup Guide: 5-Minute Install
Decision Tree: Which Tag for Your Hardware?
Frequently Asked Questions

bolt TL;DR — Qwen3-Coder Ollama at a Glance

Yes, it's real: Qwen3-Coder is live on Ollama with 4.5M+ downloads. The official library tag is qwen3-coder.
Two main sizes: 30B (MoE, 3B active params, 19GB download) and 480B (MoE, 35B active, 290GB). The 30B is the practical local choice.
Context window: 256K tokens natively on all tags — a major leap over Qwen2.5-Coder's 128K.
vs Qwen2.5-Coder: Qwen3-Coder-Next beats it on SWE-bench Verified (70%+ vs ~69.6%). For pure HumanEval on 8GB hardware, Qwen2.5-Coder:7B is still king.
Minimum hardware: RTX 4080 or RTX 5080 (16GB VRAM) for the 30B Q4_K_M tag. Not suitable for 8GB cards.

VRAM values assume Q4_K_M quantization and 8K context. 256K context adds ~8–12GB VRAM for KV cache.

Loading products...

Does Qwen3-Coder actually exist on Ollama?

Yes, and it's been there since late 2025. The confusion comes from the naming gap: people search for qwen3-coder ollama and land on the Qwen2.5-Coder page, or find unofficial community uploads, and wonder if the real thing exists. It does. The official library entry is ollama.com/library/qwen3-coder and it has over 4.5 million pulls as of june 2026 — that's a very active model.

Qwen3-Coder is Alibaba's dedicated agentic coding model family, built on top of the Qwen3 architecture. The "3" in the name refers to the third generation of Qwen models, not the parameter count. The key architectural features that separate it from Qwen2.5-Coder are the Mixture-of-Experts (MoE) design — which activates only a fraction of parameters per token — a native 256K token context window, and training data that skews heavily toward multi-turn agentic workflows and tool-calling rather than pure code completion.

Why people miss it: Searching "qwen3 coder" (with a space) in Ollama returns different results than "qwen3-coder" (hyphenated). The official tag is hyphenated. Also note that qwen3-coder-next is a separate, slightly newer variant — more on that below.

4.5M+

Ollama pulls (june 2026)

256K

Native context window (tokens)

Available tags on Ollama library

19GB

30B model download size (Q4_K_M)

All Available Qwen3-Coder Ollama Tags (Full List, june 2026)

There are currently 10 official tags on the Qwen3-Coder library page. They fall into two size families — 30B and 480B — each available in multiple quantizations. Here's the complete picture:

Ollama Tag	Model	Download Size	Context	Notes
`qwen3-coder:latest` `qwen3-coder:30b`	30B-A3B (MoE)	19GB	256K	Default. Q4_K_M. Same file — both tags point to `06c1097efce0`. ← Start here
`qwen3-coder:30b-a3b-q4_K_M`	30B-A3B (MoE)	19GB	256K	Explicit Q4_K_M. Identical to `:30b` tag. Use this if you want to be precise.
`qwen3-coder:30b-a3b-q8_0`	30B-A3B (MoE)	32GB	256K	Higher accuracy, ~2% quality gain over Q4. Needs 32GB+ VRAM or 32GB RAM + GPU offload.
`qwen3-coder:30b-a3b-fp16`	30B-A3B (MoE)	61GB	256K	Full precision. For researchers. Requires multi-GPU or Apple M3/M4 Max with 64GB+ unified memory.
`qwen3-coder:480b` `qwen3-coder:480b-a35b-q4_K_M`	480B-A35B (MoE)	290GB	256K	Datacenter-class. 35B active params. Not realistic on consumer hardware — requires NVLink multi-GPU.
`qwen3-coder:480b-a35b-q8_0`	480B-A35B (MoE)	510GB	256K	Maximum quality 480B. Practically only for cloud inference or >512GB RAM server setups.
`qwen3-coder:480b-a35b-fp16`	480B-A35B (MoE)	960GB	256K	Full precision 480B. ~1TB download. H100 cluster territory.
`qwen3-coder:480b-cloud`	480B-A35B (cloud)	— (no local size)	256K	Cloud-routed tag. Runs via Ollama's hosted inference — no local GPU required. Rate limits apply.

Tag data sourced directly from ollama.com/library/qwen3-coder/tags, june 2, 2026. The 30B and latest tags share the same underlying file hash (06c1097efce0).

Short answer to "what is the best Qwen3-Coder tag for Ollama?": For consumer GPUs, use qwen3-coder:30b. It's Q4_K_M by default, downloads 19GB, and runs on RTX 4080/5080 (16GB VRAM). Everything above 30B is datacenter territory unless you have an M4 Max or multi-GPU rig.

VRAM Requirements per Tag

The MoE architecture of Qwen3-Coder means the numbers here behave differently from dense models like Qwen2.5-Coder. All 30B parameters are loaded into memory (because MoE still loads all expert weights), but only 3B activate per token during inference. This gives you faster generation speeds than a typical 30B dense model, but the full model weight still needs to fit somewhere — VRAM or system RAM via CPU offloading.

Tag	GPU VRAM (Inference)	For 8K Context	For 256K Context	Recommended GPU
`:30b` / `:30b-a3b-q4_K_M`	~19GB	~20GB total	~30–32GB (KV cache grows)	RTX 4080 16GB (tight) → RTX 5080 16GB or better
`:30b-a3b-q8_0`	~32GB	~33GB total	~43GB+	RTX 4090 24GB (with offload) or dual RTX 5080
`:30b-a3b-fp16`	~61GB	~62GB	~72GB+	M4 Max 64GB unified / dual RTX 4090 / A100 80GB
`:480b` / `:480b-a35b-q4_K_M`	~290GB	~292GB	~302GB+	4× H100 80GB or equivalent
`:480b-cloud`	0 (cloud-routed)	N/A	N/A	Any machine with internet connection

Context window VRAM overhead assumes FP16 KV cache (default). Use OLLAMA_KV_CACHE_TYPE=q8_0 to roughly halve the context overhead — this is important if you want to use the full 256K window on consumer hardware.

Best Ollama Models for 16GB RAM Laptop 2026: If you are running a laptop with only 16GB of unified memory or 16GB system RAM, the Qwen3-Coder 30B tag will be too slow due to aggressive swap. For a 16GB RAM laptop, the best ollama models for coding are dense 7B-9B models like qwen2.5-coder:7b or gemma2:9b.

RTX 4080 16GB users — read this first: The :30b tag technically fits at 19GB model weight, but you only have 16GB VRAM. Ollama will automatically offload layers to system RAM. You'll get 3–8 tok/s with 64GB+ DDR5 RAM, which is usable for testing but not great for daily agentic coding. The sweet spot for the 30B tag is a card with 20–24GB VRAM (RTX 5080, RTX 4090, or Apple M-series with 48GB+ unified memory).

          Pro tip — use Q8 KV cache to unlock the 256K context window: Set
          OLLAMA_KV_CACHE_TYPE=q8_0 before launching Ollama. This compresses the KV cache from FP16 to
          INT8, cutting context overhead roughly in half. For the 30B tag on an RTX 4090, this means you can push to
          ~100K context without OOM instead of being limited to ~30–40K. Add it to your .bashrc:
          export OLLAMA_KV_CACHE_TYPE=q8_0
        

Exact Ollama Pull Commands for Every Tag

Here are the copy-paste ready commands. The model name format is always qwen3-coder:<tag>.

For most users (RTX 4080/5080, 16–20GB VRAM)

ollama pull qwen3-coder:30b

Downloads 19GB. This is the :latest tag and the recommended starting point.

Higher accuracy (RTX 4090 or 32GB+ VRAM)

ollama pull qwen3-coder:30b-a3b-q8_0

Downloads 32GB. ~2% quality improvement over Q4_K_M. Worth it if you have the VRAM headroom.

Full precision research (M4 Max 64GB / dual 4090)

ollama pull qwen3-coder:30b-a3b-fp16

Downloads 61GB. Maximum fidelity to the original model weights. For benchmarking or fine-tune prep.

Cloud-routed 480B (no GPU required)

ollama pull qwen3-coder:480b-cloud

Routes to Ollama's hosted 480B inference. Subject to rate limits and internet dependency.

Run after pulling

ollama run qwen3-coder:30b

ollama run qwen3-coder:30b "Write a Python async web scraper with error handling"

Check VRAM usage after loading

ollama run qwen3-coder:30b --verbose
          # Linux GPU stats:
          nvidia-smi
          # macOS unified memory:
          # Activity Monitor → Memory tab → GPU memory

Qwen3-Coder-Next: The Hidden Gem

Separate from the main qwen3-coder library, there's also qwen3-coder-next on Ollama — a distinct model, not just a tag variant. Qwen3-Coder-Next is built on the Qwen3-Next-80B-A3B-Base architecture with hybrid attention and MoE. Despite having only 3 billion active parameters per token (out of 80B total), it was specifically agentic-trained at scale with reinforcement learning on verifiable coding tasks.

The headline stat: Qwen3-Coder-Next achieves over 70% on SWE-Bench Verified using the SWE-Agent scaffold, which is genuinely competitive with larger proprietary models. For context, a full 480B Qwen3-Coder reaches similar territory — the Next variant essentially delivers frontier agentic coding performance at a fraction of the active compute cost. On SWE-bench Pro (the harder benchmark using longer multi-turn agentic tasks), it outperforms or matches several models with 10–20× more active parameters.

ollama pull qwen3-coder-next

Qwen3-Coder-Next vs Qwen3-Coder:30b — which should you use? If your primary use case is multi-turn agentic coding (automated bug fixing, repository-level refactoring, tool-calling workflows), Qwen3-Coder-Next is the better pick. Its RL-based agentic training means it handles multi-step planning and tool feedback loops far better than the standard 30B model, which was tuned more for code completion and direct instruction-following.

Qwen3-Coder vs Qwen2.5-Coder: Which to Use in 2026?

The question I see most in the GSC queries: should I switch from Qwen2.5-Coder to Qwen3-Coder on Ollama? The honest answer depends on your VRAM and use case. Here's the direct comparison:

Factor	Qwen2.5-Coder:7B	Qwen2.5-Coder:32B	Qwen3-Coder:30B	Qwen3-Coder-Next
Min VRAM	~4.6GB	~19GB	~19GB	~20GB
Context window	128K	128K	256K	256K
HumanEval (official)	88.4%	~92.7%	Not separately published	Competitive (focus on SWE)
SWE-bench Verified	~50% (community)	~69.6%	Matches 32B range	70%+
Architecture	Dense	Dense	MoE (3B active)	MoE hybrid (3B active)
Best for	8GB GPUs, daily completions	Accuracy-first, 20GB VRAM	Long-context (>128K), agentic	Multi-turn agents, SWE tasks
Ollama pull command	`qwen2.5-coder:7b`	`qwen2.5-coder:32b`	`qwen3-coder:30b`	`qwen3-coder-next`

Bottom line: If you have 8GB VRAM, Qwen2.5-Coder:7B is still the right call — Qwen3-Coder has no sub-10GB tag. If you have 16–20GB VRAM and need >128K context or multi-turn agentic workflows, Qwen3-Coder:30b or Qwen3-Coder-Next is the clear upgrade. For pure HumanEval completion tasks at the 20GB tier, Qwen2.5-Coder:32B (dense) and Qwen3-Coder:30B (MoE) are approximately neck-and-neck on code quality; the Qwen3 advantage shows in agentic tasks and long context.

Best GGUF Coding Model 2026: Qwen vs DeepSeek vs CodeGemma 14B

When searching for the best gguf coding model 2026 qwen deepseek codegemma 14b, the landscape is competitive. Here is how Qwen3-Coder fits in:

Qwen3-Coder (30B MoE): The top choice for context length (256K) and complex agentic tasks. Highly recommended for multi-file generation.
DeepSeek-Coder V2: Excellent for logic and math, but often requires more VRAM for its larger mixture-of-experts setups unless heavily quantized.
CodeGemma 14B: A highly efficient dense model that is great for standard code completion tasks, but falls short of Qwen's MoE architecture in complex refactoring.

Clarification: Ollama Qwen3.5 Models List 2026

A common search term we see is "ollama qwen3.5 models list 2026". Note that as of june 2026, Qwen3.5 does not exist yet. Users searching for this are likely confusing the older Qwen2.5 series with the latest Qwen3 models detailed on this page. If you want the absolute latest qwen coder model ollama 2026, stick to the Qwen3-Coder tags listed above.

Benchmarks: Qwen3-Coder vs the Field

Benchmark sources: HumanEval figures from official Qwen model card releases. SWE-bench Verified scores from published technical reports (Qwen3-Coder-Next technical report, arXiv 2603.00729). SWE-bench Pro results from the same source. Community comparisons on third-party benchmarking platforms may vary based on prompt format and scaffold used.

Model (Ollama tag)	HumanEval ↑	SWE-bench Verified ↑	Context	VRAM (Q4)
Qwen3-Coder-Next	Strong (see SWE)	70%+ (SWE-Agent scaffold)	256K	~20GB
Qwen2.5-Coder:32B	92.7% (official)	~69.6% (community)	128K	~19GB
Qwen3-Coder:30B	Matches 32B range	Competitive with 32B	256K	~19GB
Qwen2.5-Coder:14B	~87.3% (official)	~59% (community)	128K	~8.5GB
Qwen2.5-Coder:7B	88.4% (official)	~50% (community)	128K	~4.6GB
DeepSeek-Coder-V2-Lite:16B	90.2% (official)	~56% (community)	64K	~9.2GB

Benchmark conditions vary across evaluations. HumanEval pass@1 is 0-shot. SWE-bench Verified uses the SWE-Agent scaffold unless otherwise noted. Community-reported scores may differ from official model card figures based on prompt formatting.

The key takeaway from these numbers: Qwen3-Coder's biggest lead over Qwen2.5-Coder isn't on HumanEval (where the 32B dense model still holds its own), but on the agentic SWE-bench benchmarks that measure real-world code repair across multi-file repositories. If your workflow involves answering "why is this GitHub issue failing?" rather than "complete this function," Qwen3-Coder's training focus pays off.

Setup Guide: Run Qwen3-Coder in 5 Minutes

Assuming you already have Ollama installed (version 0.5+ recommended). If not, step one covers that.

Install Ollama (skip if already installed):
curl -fsSL https://ollama.com/install.sh | sh
Set the KV cache env var before anything else:
export OLLAMA_KV_CACHE_TYPE=q8_0

This halves context memory overhead — critical for 256K context on consumer hardware. Add to .bashrc or .zshrc to make it permanent.
Pull the 30B model (recommended for 16–20GB VRAM):
ollama pull qwen3-coder:30b

19GB download. Allow 10–30 minutes depending on your internet speed.
Run a quick test:
ollama run qwen3-coder:30b "Write a Python async function that retries HTTP requests with exponential backoff"
Connect to VS Code via Continue.dev:
Install Continue extension → Settings → Provider: Ollama → Model: qwen3-coder:30b → Set context length: 32768 (or higher if VRAM allows).
Verify memory usage:
nvidia-smi # Linux # macOS: Activity Monitor → Memory → GPU Memory Used

Pro tip for agentic workflows: Pair Qwen3-Coder:30B with Aider for repository-level coding. Qwen3-Coder was explicitly trained on agentic data and handles Aider's multi-file edit format well. Use: aider --model ollama/qwen3-coder:30b

Decision Tree: Which Qwen3-Coder Tag for Your Hardware?

You have 8GB VRAM or less: Qwen3-Coder is not for you yet. Use qwen2.5-coder:7b — it's still excellent at 88.4% HumanEval on 4.6GB VRAM.
You have 16GB VRAM (RTX 4080, RTX 5080): qwen3-coder:30b with CPU offload. Set OLLAMA_KV_CACHE_TYPE=q8_0 and keep context under 64K for smooth generation.
You have 20–24GB VRAM (RTX 4090, A5000): qwen3-coder:30b fits fully in VRAM. You can push to ~128K context comfortably.
You have 32GB+ VRAM (dual 3090 / A6000 / M4 Max 64GB): qwen3-coder:30b-a3b-q8_0 — the quality bump is worth it at this tier.
You prioritize agentic multi-turn tasks over raw code completion: qwen3-coder-next — same VRAM footprint as 30B, better SWE-bench scores.
You need 480B without buying GPUs: qwen3-coder:480b-cloud — cloud-routed, no download, rate-limited.
You need maximum HumanEval scores at 20GB VRAM: qwen2.5-coder:32b (dense) is still competitive here — Qwen3-Coder's wins are on agentic/SWE benchmarks, not raw pass@1.

🛠️ Recommended VS Code setup (20GB VRAM): Use qwen3-coder-next as your chat/agentic model via Continue.dev, and keep qwen2.5-coder:7b running in parallel as the autocomplete model. You get the agentic quality of the Next variant for complex tasks while the 7B handles fast inline completions without latency.

Frequently Asked Questions

Sources: Ollama library (ollama.com/library/qwen3-coder/tags, June 2, 2026), Qwen3-Coder-Next Technical Report (arXiv 2603.00729), Qwen2.5-Coder official release notes, Second Talent Qwen benchmark analysis (June 2026). Updated June 2, 2026. — Himansh, TheAITechPulse.com

Need a GPU Laptop for Qwen3-Coder?

Find laptops with 16GB+ VRAM that can run the Qwen3-Coder:30B tag natively — no CPU offload required.

Try the Laptop Finder →

About the Author

Himansh is the founder of TheAITechPulse, where he analyzes AI tools, productivity software, and emerging tech for practical business use.

He focuses on real-world testing, ROI-driven evaluations, and actionable implementation guides for small businesses and solo founders.

👤 More about Himansh ✉️ Get in touch

Qwen3-Coder Ollama: Complete Tag List,Model Sizes & Install Guide

Does Qwen3-Coder actually exist on Ollama?

All Available Qwen3-Coder Ollama Tags (Full List, june 2026)

VRAM Requirements per Tag

Exact Ollama Pull Commands for Every Tag

For most users (RTX 4080/5080, 16–20GB VRAM)

Higher accuracy (RTX 4090 or 32GB+ VRAM)

Full precision research (M4 Max 64GB / dual 4090)

Cloud-routed 480B (no GPU required)

Run after pulling

Check VRAM usage after loading

Qwen3-Coder-Next: The Hidden Gem

Qwen3-Coder vs Qwen2.5-Coder: Which to Use in 2026?

Best GGUF Coding Model 2026: Qwen vs DeepSeek vs CodeGemma 14B

Clarification: Ollama Qwen3.5 Models List 2026

Benchmarks: Qwen3-Coder vs the Field

Setup Guide: Run Qwen3-Coder in 5 Minutes

Decision Tree: Which Qwen3-Coder Tag for Your Hardware?

Frequently Asked Questions

About the Author

Qwen3-Coder Ollama: Complete Tag List,
Model Sizes & Install Guide