Nvidia has introduced a new LLM, Llama-3.1-Nemotron-70B: a high-quality fine-tune of Llama 3.1 aimed at reasoning.
The fine-tune is based on RLHF (specifically the REINFORCE algorithm) and shows good results on reasoning and logic tasks.
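For intuition, here is a minimal sketch of what a REINFORCE-style policy-gradient loss looks like for an LLM response. It is an illustration only, with assumed names and a single-sample setup; the real training pipeline (reward model, baselines, regularization) is far more involved and is not shown here.

```python
# Illustrative REINFORCE-style policy-gradient loss for one sampled LLM response.
# Conceptual sketch only; names and the single-sample setup are assumptions.
import torch

def reinforce_loss(response_logprobs: torch.Tensor, reward: float, baseline: float = 0.0) -> torch.Tensor:
    """response_logprobs: per-token log-probs of a sampled response under the policy.
    reward: scalar score for that response from a reward model.
    baseline: e.g. a running average of rewards, used to reduce gradient variance."""
    advantage = reward - baseline
    # REINFORCE maximizes E[advantage * log pi(response)], so we minimize the negative.
    return -advantage * response_logprobs.sum()
```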
The new model ranks high on the Arena Hard benchmark, which consists of 500 challenging user prompts, mostly logic, riddles, reasoning, and math tasks. On these tasks it outperforms the 405B Llama-3.1 and the May 13, 2024 version of gpt-4o.
At the same time, the model was not trained to write code, so on coding tasks it scores about 3.7% lower than the plain Llama-3.1-70B.
The context window is the same as in Llama 3.1: 128k tokens.
Model card: https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF (a minimal loading sketch is shown below the links)
GGUF files: https://huggingface.co/bartowski/Llama-3.1-Nemotron-70B-Instruct-HF-GGUF
Online demo:
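For the full-precision checkpoint from the model card above, a minimal loading sketch with the transformers library might look like the following. It assumes hardware that can actually hold a 70B model (e.g. several GPUs); the prompt is just an example.

```python
# Minimal transformers loading sketch (assumes enough GPU memory for a 70B model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```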
How to run models locally on a regular PC (without a GPU, just on the CPU; on GPUs with 8 GB of memory; and on AMD GPUs).
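As one concrete option, here is a minimal CPU-only sketch using llama-cpp-python with one of the GGUF quantizations linked above. The file name, context size, and prompt are assumptions: pick a quantization that fits your RAM, and raise n_gpu_layers to offload part of the model to a GPU (llama.cpp builds also exist for AMD GPUs).

```python
# Minimal CPU-only sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF file name below is an example; download a quantization that fits
# your RAM from the GGUF repo linked above.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-Nemotron-70B-Instruct-HF-Q4_K_M.gguf",  # assumed local path
    n_ctx=8192,       # working context; the model itself supports up to 128k tokens
    n_gpu_layers=0,   # 0 = pure CPU; increase to offload some layers to a GPU
)

result = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "A farmer has 17 sheep; all but 9 run away. How many are left?"}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```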