llama.cpp benchmarks on AMD Ryzen 7 7700
Inspired by this reddit post, here are my results. Note the `ggml_cuda_init` failure below: ROCm never detected a device, so despite the `ROCm` label in the backend column these numbers appear to be CPU-only runs on the Ryzen 7 7700.
```shell
# HIP_VISIBLE_DEVICES=1 llama-bench --model /models/Qwen3-4B-IQ4_NL.gguf -ngl 99 --flash-attn --no-mmap
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
```
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | ROCm | 99 | pp512 | 101.36 ± 0.92 |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | ROCm | 99 | tg128 | 14.50 ± 0.02 |
| qwen3 8B Q4_K - Medium | 4.68 GiB | 8.19 B | ROCm | 99 | pp512 | 77.23 ± 0.73 |
| qwen3 8B Q4_K - Medium | 4.68 GiB | 8.19 B | ROCm | 99 | tg128 | 9.23 ± 0.11 |
| qwen3 8B IQ3_XXS - 3.0625 bpw | 3.25 GiB | 8.19 B | ROCm | 99 | pp512 | 16.86 ± 0.02 |
| qwen3 8B IQ3_XXS - 3.0625 bpw | 3.25 GiB | 8.19 B | ROCm | 99 | tg128 | 12.61 ± 0.05 |
| qwen3 14B Q4_K - Medium | 8.38 GiB | 14.77 B | ROCm | 99 | pp512 | 39.89 ± 0.27 |
| qwen3 14B Q4_K - Medium | 8.38 GiB | 14.77 B | ROCm | 99 | tg128 | 4.55 ± 0.19 |
| qwen3 14B IQ4_NL - 4.5 bpw | 7.95 GiB | 14.77 B | ROCm | 99 | pp512 | 30.66 ± 0.13 |
| qwen3 14B IQ4_NL - 4.5 bpw | 7.95 GiB | 14.77 B | ROCm | 99 | tg128 | 5.01 ± 0.01 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | 30.53 B | ROCm | 99 | pp512 | 117.95 ± 1.76 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | 30.53 B | ROCm | 99 | tg128 | 19.97 ± 0.14 |
| glm4 9B Q4_K - Medium | 5.73 GiB | 9.40 B | ROCm | 99 | pp512 | 66.16 ± 0.46 |
| glm4 9B Q4_K - Medium | 5.73 GiB | 9.40 B | ROCm | 99 | tg128 | 8.21 ± 0.02 |
| gemma3 12B Q4_K - Medium | 6.79 GiB | 11.77 B | ROCm | 99 | pp512 | 48.50 ± 0.26 |
| gemma3 12B Q4_K - Medium | 6.79 GiB | 11.77 B | ROCm | 99 | tg128 | 5.74 ± 0.05 |
| gemma3n E4B Q4_K - Medium | 3.94 GiB | 6.87 B | ROCm | 99 | pp512 | 108.83 ± 1.36 |
| gemma3n E4B Q4_K - Medium | 3.94 GiB | 6.87 B | ROCm | 99 | tg128 | 13.56 ± 0.08 |
| gemma3 27B Q3_K - Medium | 12.51 GiB | 27.01 B | ROCm | 99 | pp512 | 13.79 ± 0.05 |
| gemma3 27B Q3_K - Medium | 12.51 GiB | 27.01 B | ROCm | 99 | tg128 | 3.08 ± 0.04 |
| gemma3 27B Q4_K - Medium | 15.40 GiB | 27.01 B | ROCm | 99 | pp512 | 20.54 ± 0.16 |
| gemma3 27B Q4_K - Medium | 15.40 GiB | 27.01 B | ROCm | 99 | tg128 | 2.70 ± 0.01 |
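For context on the columns: `pp512` is prompt-processing throughput over a 512-token prompt and `tg128` is token-generation throughput over 128 generated tokens, both in tokens/s with the ± giving the spread across repetitions. A quick sketch of what those numbers imply in wall-clock terms, using the Qwen3 4B row above:

```python
# Estimate end-to-end wall-clock time for one request (512-token prompt,
# 128 generated tokens) from the Qwen3 4B IQ4_NL row of the table.
PP512_TPS = 101.36  # prompt processing, tokens/s
TG128_TPS = 14.50   # token generation, tokens/s

prompt_time = 512 / PP512_TPS  # time to ingest the prompt
gen_time = 128 / TG128_TPS     # time to generate 128 tokens
total = prompt_time + gen_time

print(f"prompt: {prompt_time:.1f}s, generation: {gen_time:.1f}s, total: {total:.1f}s")
```

Even though prompt processing runs roughly 7x faster than generation here, the 128 generated tokens still dominate the total time.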
```shell
llama-bench --model /models/Mistral-Small-3.2-24B-Instruct-2506-IQ4_NL.gguf -ngl 99 --flash-attn --no-mmap
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
```
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| llama 13B IQ4_NL - 4.5 bpw | 12.54 GiB | 23.57 B | ROCm | 99 | pp512 | 19.22 ± 0.07 |
| llama 13B IQ4_NL - 4.5 bpw | 12.54 GiB | 23.57 B | ROCm | 99 | tg128 | 3.03 ± 0.00 |
build: f667f1e (1)
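Running each model one at a time gets tedious; the single `llama-bench` invocation above can be looped over a list of GGUF files. A minimal sketch (the model paths in `MODELS` are assumptions, adjust to your layout; the `echo` makes it a safe dry-run — remove it to actually benchmark):

```shell
#!/bin/sh
# Sketch: run llama-bench with the same flags as above over several models.
# Paths below are placeholders, not an assertion about your filesystem.
MODELS="/models/Qwen3-4B-IQ4_NL.gguf /models/Qwen3-8B-Q4_K_M.gguf"

for m in $MODELS; do
    # dry-run: print the command instead of executing it
    echo llama-bench --model "$m" -ngl 99 --flash-attn --no-mmap
done
```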