Nemotron 3 Nano Omni

NVIDIA · Open Source · New

NVIDIA's 30B open multimodal model running vision + audio + text in a single stack. Tops 6 specialty leaderboards. April 2026.

Context Window

256K

tokens

Max Output

16K

tokens

Input Price

Self-host

open weights

Output Price

Self-host

infra cost only

Speed

158

tokens/sec

Released

Apr 2026

2026-04-11

Blended Cost

self-host

Value Score

n/a (self-host)

Capabilities

Chat · Vision · Audio · Code Generation

Benchmarks

Quality Index
76
MMLU Pro
81.8
HumanEval (Coding)
80.6
MATH
75.4
Arena ELO
1361

Open Source — Licensed under NVIDIA Open Model License

About Nemotron 3 Nano Omni

Nemotron 3 Nano Omni is an open-source AI model by NVIDIA, released on April 11, 2026. It supports a 256K-token context window and can generate up to 16K output tokens.

Nemotron 3 Nano Omni is published as open weights (NVIDIA Open Model License) for self-hosting — there is no per-token API price. Cost depends on your inference infrastructure: GPU rental rates, throughput per GPU, and operating overhead. On commodity 24-48 GB GPUs the effective cost typically lands in the $0.10-$0.50 per million output tokens range, well below the cheapest hosted alternatives.
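The back-of-the-envelope math above can be sketched as a small helper. The GPU rental price and batched throughput below are illustrative assumptions, not measured figures for this model:

```python
def cost_per_million_output_tokens(gpu_hourly_usd: float,
                                   tokens_per_second: float) -> float:
    """Effective $ per 1M output tokens for a self-hosted deployment.

    tokens_per_second should be the *aggregate* throughput of the GPU
    (all concurrent requests combined), since batching is what drives
    self-host costs down.
    """
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Illustrative only: a $0.60/hr 48 GB GPU serving ~1,500 tok/s batched.
cost = cost_per_million_output_tokens(gpu_hourly_usd=0.60,
                                      tokens_per_second=1500)
print(f"${cost:.2f} per 1M output tokens")  # ≈ $0.11
```

Note that single-stream speed (158 tokens/sec on the spec card) would put the figure closer to $1/1M; the low end of the quoted range assumes heavy request batching.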

Nemotron 3 Nano Omni is available as an open-source model under the NVIDIA Open Model License, meaning you can self-host it for predictable costs or use it through API providers like Swfte Connect.

Using Nemotron 3 Nano Omni with Swfte

Access Nemotron 3 Nano Omni through Swfte Connect, our unified LLM gateway. Connect gives you a single API for 50+ models, with automatic routing, cost optimization, and fallback handling. You can also try Nemotron 3 Nano Omni in our AI Playground before integrating.
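As a rough sketch of what a call through a unified gateway looks like, the snippet below builds a chat-completion request payload. The endpoint URL and model id are placeholder assumptions, not documented Swfte Connect values — check the Connect docs for the real identifiers:

```python
import json

# Placeholder endpoint — substitute the real Swfte Connect URL.
API_URL = "https://connect.swfte.example/v1/chat/completions"


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat payload targeting Nemotron 3 Nano Omni.

    The model id below is an assumed name for illustration.
    """
    return {
        "model": "nemotron-3-nano-omni",   # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,          # model caps output at 16K tokens
    }


payload = build_request("Summarize this audio transcript.", max_tokens=2048)
print(json.dumps(payload, indent=2))
```

The same payload shape works for any model behind the gateway — routing and fallback are handled server-side, so switching models is a one-line change to the `model` field.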