Opencode Local LLM test with Nemotron-3-Nano-30B-A3B vs Qwen3-Coder-30B-A3B vs gpt-oss-20b-mxfp4
Tested the new NVIDIA model, Nemotron 3 Nano 30B A3B, focusing on its performance in local coding and interaction tasks. The model was evaluated in a quantized version, with all tests run in CPU mode due to hardware limitations. The user compared its performance against other models such as Qwen3 30B A3B and GPT OSS 20B, using both benchmarks and real-world interactions.
Key Findings
The Nemotron 3 Nano 30B A3B model demonstrated solid performance in basic interaction tasks, particularly when using tools like bash, read, write, and edit. These tools let the model interact with the system effectively, though it is not powerful enough to handle large codebases. The model was tested in a variety of scenarios, including date formatting, system information retrieval, and code conversion tasks.
In one test, the model correctly formatted the current date in Y-M-D format, using a combination of tools to fetch the date and write it to a file. Another test involved retrieving the Linux version, which the model handled successfully with its available tools (though during the demo the answer got buried in the model's thinking output).
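The two tasks above boil down to a couple of shell commands. A minimal sketch of what the model's tool calls amount to (the exact commands it emitted aren't shown in the demo, and the filename `today.txt` is an assumption):

```shell
# Write the current date in Y-M-D format to a file
# (filename today.txt is hypothetical)
date +%Y-%m-%d > today.txt
cat today.txt   # e.g. 2025-01-30

# Retrieve the Linux (kernel) version, as in the second test
uname -r
```

Getting this right mostly tests whether the model picks the correct `date` format specifiers and writes the output to a file rather than just printing it.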
Benchmark Results
The user conducted several benchmark tests using LLM eval simple, a tool for evaluating LLMs against specific prompts. The results showed Nemotron 3 Nano 30B A3B performing on par with GPT OSS 20B: both models reached a similar number of correct answers, with Nemotron taking slightly more time.
In coding-related tasks, the model showed competence in filling in code functions and converting code from JavaScript to Rust, though it was slower than Qwen3 30B A3B and GPT OSS 20B. In a sentiment analysis test, the model also altered a user comment even though the prompt never asked for any modification, pointing to a prompt-adherence issue.
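To give a sense of what the JavaScript-to-Rust conversion task looks like, here is a hypothetical example of the pattern involved; the actual snippet used in the test is not shown, so both the JS original (in the comment) and the Rust version are purely illustrative:

```rust
// JavaScript original (hypothetical example, not the actual test snippet):
//   function sumEven(nums) {
//     return nums.filter(n => n % 2 === 0).reduce((a, b) => a + b, 0);
//   }

// Idiomatic Rust translation: keep the filter/reduce shape as an
// iterator chain, with sum() replacing the reduce.
fn sum_even(nums: &[i64]) -> i64 {
    nums.iter().filter(|n| *n % 2 == 0).sum()
}

fn main() {
    println!("{}", sum_even(&[1, 2, 3, 4])); // prints 6
}
```

Conversions like this exercise exactly the mapping the test probes: translating dynamic JS array methods into typed Rust iterator chains without changing behavior.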
Performance Comparison
When compared to other models, Nemotron 3 Nano 30B A3B was on par with GPT OSS 20B in accuracy, while Qwen3 30B A3B and its 4B variant were faster, especially at code generation and processing. The user concluded that while Nemotron 3 Nano 30B A3B is a capable model, it is not a strong alternative for serious coding tasks; it is best suited to local interactions that do not require heavy computation or extensive code handling.
Conclusion
The Nemotron 3 Nano 30B A3B model is a viable option for users seeking local, internet-independent models for basic interaction tasks. While it may not be the fastest or most powerful, it offers a good balance of functionality and accessibility. For those who need more advanced coding capabilities, other models such as Qwen3 or GPT OSS are recommended. Overall, the model is a solid choice for lightweight, local use cases where simplicity and offline functionality are prioritized.