Can I really code on my PC? Gemma4 26B A4B vs Qwen3.6 35B A3B Coding benchmark

Both models are Mixture of Experts (MoE) architectures. Qwen3.6 35B A3B significantly outperforms Gemma4 26B A4B on coding and agent tasks, while Gemma4 holds the advantage in multimodal capability and file size.

Model Specifications

| Specification | Gemma4 26B A4B | Qwen3.6 35B A3B |
|---|---|---|
| Architecture | MoE (128 experts) | MoE (8 experts) |
| Total Parameters | ~26B | 35B |
| Active Parameters | 3.8-4B | 3B |
| File Size (Q6) | 23.3GB | 31.8GB |
| Context Length | 256K | 256K-1M (KV compression) |
| Multimodal | Yes (text, image, video) | Yes |
| License | Apache 2.0 | Apache 2.0 |
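The total-versus-active split in the table is just MoE routing arithmetic: each token touches the shared weights (attention, embeddings) plus only the top-k routed experts. A sketch with entirely hypothetical hyperparameters (neither model's real expert size or routing k is given here), chosen only so the numbers land near the ~26B/~4B split:

```python
def active_params(shared: float, top_k: int, expert_size: float) -> float:
    """Parameters touched per token in a top-k MoE:
    shared weights plus the k routed experts."""
    return shared + top_k * expert_size

# Hypothetical config: 2.5B shared, 128 experts of 0.18B each, top-8 routing.
total = 2.5e9 + 128 * 0.18e9                  # ~25.5B total
active = active_params(2.5e9, 8, 0.18e9)      # ~3.9B active per token
print(f"total={total / 1e9:.1f}B active={active / 1e9:.1f}B")
```

This is why a 26B MoE can decode at roughly the speed of a ~4B dense model while still needing the full file in memory.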

Benchmark Results

| Benchmark | Qwen3.6-35B | Gemma4-26B | Delta |
|---|---|---|---|
| SWE-Bench Verified | 73.4 | 17.4 | +56.0 |
| Terminal-Bench 2.0 | 51.5 | 42.9 (31B) | +8.6 |
| MCP Tool Use | 37.0 | 18.1 | +18.9 (~2x) |
| AIME 2026 | 88.3% | N/A | - |
| LiveCodeBench | 80.0% (31B) | N/A | - |
| Arena ELO | 1452 (31B) | #6 rank | - |

Sources: @namcios, @AIHeadlineJP

Real-World User Testing

Coding/Vibe Coding Tests

@hosiken's game logic test:

  • Gemma4 26B A4B: fixed bugs in ~4 iterations, produced working code
  • Qwen3.6 35B: hallucinated identifiers, broke after error fixing

@stevibe's vibe coding challenge:

  • Same stack: Unsloth Q6_K_XL + llama.cpp
  • Both models tested side by side
  • Results: "Gemma 4 fixed the bugs in ~4 iterations; Qwen hallucinated identifiers"

@taziku_co's comparison:

  • 31.8GB Qwen3.6 vs 23.3GB Gemma4
  • Same vibe coding test
  • Note: "Benchmarks are less important than real-world tests for production adoption"

Speed Performance

| Hardware | Qwen3.6-35B | Gemma4-26B |
|---|---|---|
| M3 Ultra (90K ctx) | 21.7 tok/s | - |
| M3 Max (DFlash) | 47→70 tok/s | - |
| Mac Mini 128GB | 100 tok/s | - |
| M4 Pro 48GB | 81.6 tok/s | 73.2 tok/s |
| RTX 4090 (Q4) | 5-10 tok/s | 5-10 tok/s |
| DGX Spark | 50+ tok/s | 80 tok/s |

Sources: @Zimo41650079726, @superoo7, @ainopara
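Figures like these are just generated-token counts over wall-clock decode time, the same metric tools like llama-bench report. A trivial helper for checking your own setup (the example numbers mirror the M4 Pro row above, they are not a new measurement):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

# 512 generated tokens in 7.0 s of decoding -> ~73.1 tok/s,
# in the range of the M4 Pro / Gemma4 row.
print(f"{tokens_per_second(512, 7.0):.1f} tok/s")
```

When comparing numbers across sources, check whether they include prompt processing (prefill) or decode only; mixing the two is the most common way these tables disagree.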

Key Strengths

Qwen3.6 35B A3B

  • Superior coding/agent performance (SWE-Bench +56 points)
  • Better MCP tool integration (2x score)
  • Lower active parameters (3B vs 3.8-4B)
  • Native Ollama support for Claude Code/OpenCode
  • Runs on 6GB VRAM with quantization
  • 1M context with KV compression (10.7GB→6.9GB)
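The 10.7GB→6.9GB KV figure can be sanity-checked against the standard cache-size formula. The layer and head counts below are hypothetical stand-ins (the real architecture isn't specified here), chosen only so the uncompressed number lands near the quoted one:

```python
def kv_cache_bytes(ctx_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Uncompressed KV cache size: 2 tensors (K and V) per layer,
    each ctx_len x n_kv_heads x head_dim elements."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical config: 40 layers, 2 KV heads (aggressive GQA), head_dim 128, fp16.
full = kv_cache_bytes(256_000, 40, 2, 128)
print(f"{full / 1e9:.1f} GB at 256K context")   # ~10.5 GB, near the quoted 10.7GB
# A compression scheme holding K/V at ~2/3 of that footprint would land
# around the quoted 6.9GB.
```

The takeaway: at long contexts the KV cache, not the weights, becomes the dominant memory cost, which is why KV compression is what unlocks the 1M setting.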

Gemma4 26B A4B

  • Smaller file size (23.3GB vs 31.8GB)
  • True multimodal (text + image + video)
  • Better for chat/creative tasks
  • Japanese language quality praised
  • Easier to run on limited VRAM (16GB)
  • Works on edge devices (smartphone)

Known Issues

Gemma4 26B A4B

  • Tool-call format needs JSON sanitization in vLLM/Ollama/llama.cpp
  • Multiturn generation issues reported
  • Hallucination more frequent than dense models
  • Context compression can cause "rewind" behavior
  • Some users report it thinks current date is 2023/2024
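The tool-call format problem is typically worked around by stripping markdown fences and surrounding prose before JSON parsing. A minimal sketch of that kind of sanitizer (`sanitize_tool_call` is a hypothetical helper, not the actual vLLM/Ollama/llama.cpp fix):

```python
import json
import re

def sanitize_tool_call(raw: str) -> dict:
    """Extract the first JSON object from a model reply that may wrap it
    in ```json fences or surround it with prose."""
    # Drop markdown code fences if present.
    text = re.sub(r"```(?:json)?", "", raw)
    # Parse from the first opening brace; raw_decode tolerates trailing text.
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in model output")
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    return obj

reply = 'Here is the call:\n```json\n{"name": "ls", "arguments": {"path": "."}}\n```'
call = sanitize_tool_call(reply)   # call["name"] == "ls"
```

Using `raw_decode` rather than `json.loads` is the key detail: it stops at the end of the first valid object, so trailing chatter after the tool call doesn't break parsing.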

Qwen3.6 35B A3B

  • Hallucinates identifiers in complex coding tasks
  • High RAM usage (needs 32GB+ for smooth use)
  • Gets hot on MacBooks

Community Sentiment

r/LocalLLaMA:

  • "Qwen3.6 crushes Gemma 4 on my tests" (2.1k upvotes)
  • "Local model finally reaches Claude-like coding quality" (1.4k upvotes)

Real user quotes:

  • @word_and_number: "Qwen3.5 couldn't reauth my Google accounts, but Gemma4 26B did. Are Google OAuth docs in the Gemma training set?"
  • @yamamori_yamori: "Qwen3.6 isn't as good as people say, Gemini 4 31B is actually pretty good too"
  • @VibeBloxDev: "Qwen3.6 response is good but Qwen3.5 27B answer quality seems better"

Verdict

For Coding/Agent Tasks: Qwen3.6 35B A3B wins

Significantly better on SWE-Bench (+56 pts), MCP tool use (~2x), and agent workflows.

For Multimodal/Edge Use: Gemma4 26B A4B wins

True multimodal support, smaller file, works on phones/edge devices.

For Limited VRAM (16GB): Gemma4 26B A4B wins

23.3GB file vs 31.8GB, easier to fit with quantization.

Recommendations

  1. Coding agents: Use Qwen3.6-35B-A3B with OpenCode/Claude Code
  2. Multimedia projects: Use Gemma4-26B-A4B for image/video understanding
  3. Limited hardware: Gemma4-26B-A4B (Q4 fits in ~16GB VRAM)
  4. Maximum context: Qwen3.6 supports 1M tokens with KV compression
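The "~16GB VRAM at Q4" figure in recommendation 3 follows from a simple bits-per-weight estimate (the ~4.5 bits/weight for a Q4_K_M-class quant is a llama.cpp rule of thumb; exact quant overheads vary):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized file size: parameters x bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# 26B model at ~4.5 bits/weight (Q4_K_M-class) -> ~14.6GB,
# which fits a 16GB GPU with a little headroom for context.
print(f"{gguf_size_gb(26e9, 4.5):.1f} GB")
# Higher-precision quants (~7 bits/weight) land near the 23.3GB Q6 file
# size quoted in the spec table.
```

The same arithmetic explains the Qwen3.6 rows: a 35B model simply cannot squeeze under 16GB at any useful bit width, so it needs the 32GB+ RAM noted under Known Issues.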