Can I really code on my PC? Gemma4 26B A4B vs Qwen3.6 35B A3B Coding benchmark
Both models are Mixture-of-Experts (MoE) architectures. Qwen3.6 35B A3B significantly outperforms Gemma4 26B A4B on coding and agent tasks, while Gemma4 holds the edge in multimodal capabilities and file size.
Model Specifications
| Specification | Gemma4 26B A4B | Qwen3.6 35B A3B |
|---|---|---|
| Architecture | MoE (128 experts) | MoE (8 experts) |
| Total Parameters | ~26B | 35B |
| Active Parameters | 3.8-4B | 3B |
| File Size (Q6) | 23.3GB | 31.8GB |
| Context Length | 256K | 256K-1M (KV compression) |
| Multimodal | Yes (text, image, video) | Yes |
| License | Apache 2.0 | Apache 2.0 |
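The file sizes in the table follow roughly from parameter count times quantization bit-width. A back-of-the-envelope estimator, assuming Q6_K averages about 6.56 bits per weight (an approximation; real GGUF files land a few GB larger because some tensors stay at higher precision and metadata adds overhead):

```python
def gguf_size_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB: weights only, no metadata or mixed-precision tensors."""
    # (params * 1e9) * bits / 8 bytes, then / 1e9 to get GB -- the 1e9s cancel
    return total_params_billion * bits_per_weight / 8

# Q6_K averages ~6.56 bits/weight; compare with the quoted 23.3GB and 31.8GB files.
print(round(gguf_size_gb(26, 6.56), 1))  # Gemma4 26B estimate
print(round(gguf_size_gb(35, 6.56), 1))  # Qwen3.6 35B estimate
```

The estimates come in a couple of GB under the quoted file sizes, which is the expected direction: embedding tables and output heads are usually kept at higher precision than the average.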
Benchmark Results
| Benchmark | Qwen3.6-35B | Gemma4-26B | Delta |
|---|---|---|---|
| SWE-Bench Verified | 73.4 | 17.4 | +56.0 |
| Terminal-Bench 2.0 | 51.5 | 42.9 (31B) | +8.6 |
| MCP Tool Use | 37.0 | 18.1 | +18.9 (~2x) |
| AIME 2026 | 88.3% | N/A | - |
| LiveCodeBench | 80.0% (31B) | N/A | - |
| Arena ELO | 1452 (31B) | #6 rank | - |
Sources: @namcios, @AIHeadlineJP
Real-World User Testing
Coding/Vibe Coding Tests
@hosiken's game logic test:
- Gemma4 26B A4B: Fixed bugs in ~4 iterations, produced working code
- Qwen3.6 35B: Hallucinated identifiers, and the code broke further during its attempted fixes
@stevibe's vibe coding challenge:
- Same stack: Unsloth Q6_K_XL + llama.cpp
- Both models tested side-by-side
- Results: "Gemma 4 fixed the bugs in ~4 iterations; Qwen hallucinated identifiers"
@taziku_co's comparison:
- 31.8GB Qwen3.6 vs 23.3GB Gemma4
- Same vibe coding test
- Note: "Benchmarks are less important than real-world tests for production adoption"
Speed Performance
| Hardware | Qwen3.6-35B | Gemma4-26B |
|---|---|---|
| M3 Ultra (90K ctx) | 21.7 tok/s | - |
| M3 Max (DFlash) | 47→70 tok/s | - |
| Mac Mini 128GB | 100 tok/s | - |
| M4 Pro 48GB | 81.6 tok/s | 73.2 tok/s |
| RTX 4090 (Q4) | 5-10 tok/s | 5-10 tok/s |
| DGX Spark | 50+ tok/s | 80 tok/s |
Sources: @Zimo41650079726, @superoo7, @ainopara
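Decode throughput translates directly into wall-clock latency: tokens divided by tok/s. A quick sanity check on the table above (the 1,000-token reply length is an arbitrary example; prefill time is ignored):

```python
def response_seconds(n_tokens: int, tok_per_s: float) -> float:
    """Decode-only wall time; ignores prompt-processing (prefill) time."""
    return n_tokens / tok_per_s

# A 1,000-token reply at the measured decode speeds:
for hw, speed in [("M3 Ultra / Qwen", 21.7),
                  ("M4 Pro / Qwen", 81.6),
                  ("M4 Pro / Gemma", 73.2)]:
    print(f"{hw}: {response_seconds(1000, speed):.0f}s")
```

So the spread in the table is the difference between a reply arriving in about 12 seconds and in about 46 seconds.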
Key Strengths
Qwen3.6 35B A3B
- Superior coding/agent performance (SWE-Bench +56 points)
- Better MCP tool integration (2x score)
- Lower active parameters (3B vs 3.8-4B)
- Native Ollama support for Claude Code/OpenCode
- Runs on 6GB VRAM with quantization
- 1M context with KV compression (10.7GB→6.9GB)
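The quoted KV-cache drop (10.7GB to 6.9GB) is plausible given how cache size scales: 2 (K and V) x layers x KV heads x head dim x context length x bytes per element. A generic estimator; the layer/head numbers in the example are illustrative, not Qwen3.6's actual config:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """KV-cache size in GB: K and V tensors across all layers at the given context."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1e9

# Illustrative GQA config at 256K context: fp16 cache vs. an 8-bit cache
fp16 = kv_cache_gb(48, 4, 128, 262_144, 2.0)
q8 = kv_cache_gb(48, 4, 128, 262_144, 1.0)
print(f"fp16: {fp16:.1f}GB, 8-bit: {q8:.1f}GB")
```

Halving bytes-per-element halves the cache, which is the mechanism behind compression gains like the one quoted; the exact ratio depends on what the compression scheme keeps at full precision.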
Gemma4 26B A4B
- Smaller file size (23.3GB vs 31.8GB)
- True multimodal (text + image + video)
- Better for chat/creative tasks
- Japanese language quality praised
- Easier to run on limited VRAM (16GB)
- Works on edge devices (smartphone)
Known Issues
Gemma4 26B A4B
- Tool-call format needs JSON sanitization in vLLM/Ollama/llama.cpp
- Multiturn generation issues reported
- Hallucinates more often than comparable dense models
- Context compression can cause "rewind" behavior
- Some users report it thinks current date is 2023/2024
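A minimal sketch of the kind of sanitization the first bullet describes. `sanitize_tool_call` is a hypothetical helper, and the cleanup rules (stripping code fences, surrounding chatter, trailing commas) are assumptions about common failure modes, not Gemma4's documented output format:

```python
import json
import re

def sanitize_tool_call(raw: str):
    """Best-effort recovery of a JSON tool call from messy model output.
    Returns the parsed dict, or None if no valid JSON object can be salvaged."""
    raw = re.sub(r"```(?:json)?", "", raw)       # drop markdown code fences
    start, end = raw.find("{"), raw.rfind("}")   # trim chatter around the object
    if start == -1 or end == -1:
        return None
    candidate = raw[start:end + 1]
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)  # remove trailing commas
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None

# Example: fenced output with a trailing comma still parses after cleanup
call = sanitize_tool_call('```json\n{"name": "search", "arguments": {"q": "x",}}\n```')
```

In practice this sort of shim sits between the inference server and the agent framework until the chat template or grammar-constrained decoding is fixed upstream.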
Qwen3.6 35B A3B
- Hallucinates identifiers in complex coding tasks
- High RAM usage (needs 32GB+ for smooth use)
- Gets hot on MacBooks
Community Sentiment
r/LocalLLaMA:
- "Qwen3.6 crushes Gemma 4 on my tests" (2.1k upvotes)
- "Local model finally reaches Claude-like coding quality" (1.4k upvotes)
Real user quotes:
- @word_and_number: "Qwen3.5 couldn't reauth my Google accounts, but Gemma4 26B did. Are Google OAuth docs in the Gemma training set?"
- @yamamori_yamori: "Qwen3.6 isn't as good as people say, Gemini 4 31B is actually pretty good too"
- @VibeBloxDev: "Qwen3.6 response is good but Qwen3.5 27B answer quality seems better"
Verdict
For Coding/Agent Tasks: Qwen3.6 35B A3B wins
Significantly better SWE-Bench (+56 pts), MCP tool use (2x), and agent workflows.
For Multimodal/Edge Use: Gemma4 26B A4B wins
True multimodal support, smaller file, works on phones/edge devices.
For Limited VRAM (16GB): Gemma4 26B A4B wins
23.3GB (Q6) file vs 31.8GB; at Q4 the 26B weights shrink to roughly 15GB, small enough to fit in 16GB VRAM.
Recommendations
- Coding agents: Use Qwen3.6-35B-A3B with OpenCode/Claude Code
- Multimedia projects: Use Gemma4-26B-A4B for image/video understanding
- Limited hardware: Gemma4-26B-A4B (Q4 fits in ~16GB VRAM)
- Maximum context: Qwen3.6 supports 1M tokens with KV compression
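For the coding-agent route, most local setups talk to an OpenAI-compatible endpoint (Ollama serves one at `/v1` by default). A sketch of wiring up a request; the base URL and model tag are assumptions that depend on your install, so check `ollama list` for the actual tag:

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint (assumed default)
MODEL = "qwen3.6-35b-a3b"               # hypothetical tag; substitute whatever `ollama list` shows

def chat_payload(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Build an OpenAI-style chat completion request (low temperature suits code edits)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def send(payload: dict) -> dict:
    """POST the payload to the local server and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Uncomment with a server running locally:
# reply = send(chat_payload(MODEL, "Refactor this function to remove the global state."))
# print(reply["choices"][0]["message"]["content"])
```

Agent frontends like OpenCode typically only need the base URL and model name from this sketch; the request shape is handled for you.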