NVIDIA: Llama 3.1 Nemotron 70B Instruct

NVIDIAID: nvidia/llama-3.1-nemotron-70b-instruct

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Pricing per 1M Tokens

Input (Prompt)$1.20
Output (Completion)$1.20
Cache ReadFree
Cache WriteFree
ImageN/A

Specifications

Context Length131K
Max Output Tokens16K
Input ModalitiesText
Output ModalitiesText
TokenizerLlama3
Instruct Typellama3
Top Provider Context131K
Top Provider Max Output16K
ModeratedNo

More from NVIDIA

Last updated: March 23, 2026

First tracked: March 23, 2026