Can I really code on my PC? Gemma4 26B A4B vs Qwen3.6 35B A3B Coding benchmark

Both models are Mixture of Experts (MoE) architectures. Qwen3.6 35B A3B significantly outperforms Gemma4 26B A4B on coding and agent tasks, while Gemma4 holds the advantage in multimodal capability and file size.

Model Specifications

| Specification | Gemma4 26B A4B | Qwen3.6 35B A3B |
|---|---|---|
| Architecture | MoE (128 experts) | MoE (8 experts) |
| Total Parameters | ~26B | 35B |
| Active Parameters | 3.8-4B | 3B |
| File Size (Q6) | 23.3GB | 31.8GB |
| Context Length | 256K | 256K-1M (KV compression) |
| Multimodal | Yes (text, image, video) | Yes |
| License | Apache 2.0 | Apache 2.0 |
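The total-versus-active split in the table is just MoE routing arithmetic: each token touches the shared weights (attention, embeddings) plus only the top-k routed experts. A sketch with entirely hypothetical hyperparameters (neither model's real expert size or routing k is given here), chosen only so the numbers land near the ~26B/~4B split:

```python
def active_params(shared: float, top_k: int, expert_size: float) -> float:
    """Parameters touched per token in a top-k MoE:
    shared weights plus the k routed experts."""
    return shared + top_k * expert_size

# Hypothetical config: 2.5B shared, 128 experts of 0.18B each, top-8 routing.
total = 2.5e9 + 128 * 0.18e9                  # ~25.5B total
active = active_params(2.5e9, 8, 0.18e9)      # ~3.9B active per token
print(f"total={total / 1e9:.1f}B active={active / 1e9:.1f}B")
```

This is why a 26B MoE can decode at roughly the speed of a ~4B dense model while still needing the full file in memory.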

Benchmark Results

| Benchmark | Qwen3.6-35B | Gemma4-26B | Delta |
|---|---|---|---|
| SWE-Bench Verified | 73.4 | 17.4 | +56.0 |
| Terminal-Bench 2.0 | 51.5 | 42.9 (31B) | +8.6 |
| MCP Tool Use | 37.0 | 18.1 | +18.9 (~2x) |
| AIME 2026 | 88.3% | N/A | - |
| LiveCodeBench | 80.0% (31B) | N/A | - |
| Arena ELO | 1452 (31B) | #6 rank | - |

Sources: @namcios, @AIHeadlineJP

Real-World User Testing

Coding/Vibe Coding Tests

@hosiken's game logic test:

  • Gemma4 26B A4B: fixed bugs in ~4 iterations, produced working code
  • Qwen3.6 35B: hallucinated identifiers, broke after error fixing

@stevibe's vibe coding challenge:

  • Same stack: Unsloth Q6_K_XL + llama.cpp
  • Both models tested side by side
  • Results: "Gemma 4 fixed the bugs in ~4 iterations; Qwen hallucinated identifiers"

@taziku_co's comparison:

  • 31.8GB Qwen3.6 vs 23.3GB Gemma4
  • Same vibe coding test
  • Note: "Benchmarks are less important than real-world tests for production adoption"

Speed Performance

| Hardware | Qwen3.6-35B | Gemma4-26B |
|---|---|---|
| M3 Ultra (90K ctx) | 21.7 tok/s | - |
| M3 Max (DFlash) | 47→70 tok/s | - |
| Mac Mini 128GB | 100 tok/s | - |
| M4 Pro 48GB | 81.6 tok/s | 73.2 tok/s |
| RTX 4090 (Q4) | 5-10 tok/s | 5-10 tok/s |
| DGX Spark | 50+ tok/s | 80 tok/s |

Sources: @Zimo41650079726, @superoo7, @ainopara
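Figures like these are just generated-token counts over wall-clock decode time, the same metric tools like llama-bench report. A trivial helper for checking your own setup (the example numbers mirror the M4 Pro row above, they are not a new measurement):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

# 512 generated tokens in 7.0 s of decoding -> ~73.1 tok/s,
# in the range of the M4 Pro / Gemma4 row.
print(f"{tokens_per_second(512, 7.0):.1f} tok/s")
```

When comparing numbers across sources, check whether they include prompt processing (prefill) or decode only; mixing the two is the most common way these tables disagree.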

Key Strengths

Qwen3.6 35B A3B

  • Superior coding/agent performance (SWE-Bench +56 points)
  • Better MCP tool integration (2x score)
  • Lower active parameters (3B vs 3.8-4B)
  • Native Ollama support for Claude Code/OpenCode
  • Runs on 6GB VRAM with quantization
  • 1M context with KV compression (10.7GB→6.9GB)
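The 10.7GB→6.9GB KV figure can be sanity-checked against the standard cache-size formula. The layer and head counts below are hypothetical stand-ins (the real architecture isn't specified here), chosen only so the uncompressed number lands near the quoted one:

```python
def kv_cache_bytes(ctx_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Uncompressed KV cache size: 2 tensors (K and V) per layer,
    each ctx_len x n_kv_heads x head_dim elements."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical config: 40 layers, 2 KV heads (aggressive GQA), head_dim 128, fp16.
full = kv_cache_bytes(256_000, 40, 2, 128)
print(f"{full / 1e9:.1f} GB at 256K context")   # ~10.5 GB, near the quoted 10.7GB
# A compression scheme holding K/V at ~2/3 of that footprint would land
# around the quoted 6.9GB.
```

The takeaway: at long contexts the KV cache, not the weights, becomes the dominant memory cost, which is why KV compression is what unlocks the 1M setting.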

Gemma4 26B A4B

  • Smaller file size (23.3GB vs 31.8GB)
  • True multimodal (text + image + video)
  • Better for chat/creative tasks
  • Japanese language quality praised
  • Easier to run on limited VRAM (16GB)
  • Works on edge devices (smartphone)

Known Issues

Gemma4 26B A4B

  • Tool-call format needs JSON sanitization in vLLM/Ollama/llama.cpp
  • Multiturn generation issues reported
  • Hallucination more frequent than dense models
  • Context compression can cause "rewind" behavior
  • Some users report it thinks current date is 2023/2024
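The tool-call format problem is typically worked around by stripping markdown fences and surrounding prose before JSON parsing. A minimal sketch of that kind of sanitizer (`sanitize_tool_call` is a hypothetical helper, not the actual vLLM/Ollama/llama.cpp fix):

```python
import json
import re

def sanitize_tool_call(raw: str) -> dict:
    """Extract the first JSON object from a model reply that may wrap it
    in ```json fences or surround it with prose."""
    # Drop markdown code fences if present.
    text = re.sub(r"```(?:json)?", "", raw)
    # Parse from the first opening brace; raw_decode tolerates trailing text.
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in model output")
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    return obj

reply = 'Here is the call:\n```json\n{"name": "ls", "arguments": {"path": "."}}\n```'
call = sanitize_tool_call(reply)   # call["name"] == "ls"
```

Using `raw_decode` rather than `json.loads` is the key detail: it stops at the end of the first valid object, so trailing chatter after the tool call doesn't break parsing.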

Qwen3.6 35B A3B

  • Hallucinates identifiers in complex coding tasks
  • High RAM usage (needs 32GB+ for smooth use)
  • Gets hot on MacBooks

Community Sentiment

r/LocalLLaMA:

  • "Qwen3.6 crushes Gemma 4 on my tests" (2.1k upvotes)
  • "Local model finally reaches Claude-like coding quality" (1.4k upvotes)

Real user quotes:

  • @word_and_number: "Qwen3.5 couldn't reauth my Google accounts, but Gemma4 26B did. Are Google OAuth docs in the Gemma training set?"
  • @yamamori_yamori: "Qwen3.6 isn't as good as people say, Gemini 4 31B is actually pretty good too"
  • @VibeBloxDev: "Qwen3.6 response is good but Qwen3.5 27B answer quality seems better"

Verdict

For Coding/Agent Tasks: Qwen3.6 35B A3B wins

Significantly better on SWE-Bench (+56 pts), MCP tool use (~2x), and agent workflows.

For Multimodal/Edge Use: Gemma4 26B A4B wins

True multimodal support, smaller file, works on phones/edge devices.

For Limited VRAM (16GB): Gemma4 26B A4B wins

23.3GB file vs 31.8GB, easier to fit with quantization.

Recommendations

  1. Coding agents: Use Qwen3.6-35B-A3B with OpenCode/Claude Code
  2. Multimedia projects: Use Gemma4-26B-A4B for image/video understanding
  3. Limited hardware: Gemma4-26B-A4B (Q4 fits in ~16GB VRAM)
  4. Maximum context: Qwen3.6 supports 1M tokens with KV compression
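The "~16GB VRAM at Q4" figure in recommendation 3 follows from a simple bits-per-weight estimate (the ~4.5 bits/weight for a Q4_K_M-class quant is a llama.cpp rule of thumb; exact quant overheads vary):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized file size: parameters x bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# 26B model at ~4.5 bits/weight (Q4_K_M-class) -> ~14.6GB,
# which fits a 16GB GPU with a little headroom for context.
print(f"{gguf_size_gb(26e9, 4.5):.1f} GB")
# Higher-precision quants (~7 bits/weight) land near the 23.3GB Q6 file
# size quoted in the spec table.
```

The same arithmetic explains the Qwen3.6 rows: a 35B model simply cannot squeeze under 16GB at any useful bit width, so it needs the 32GB+ RAM noted under Known Issues.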