Inception: Mercury

Model ID: inception/mercury

Mercury is the first diffusion large language model (dLLM). Using a discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models such as GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. This speed lets developers build responsive user experiences such as voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury).

Pricing per 1M Tokens

Input (Prompt)	$0.25
Output (Completion)	$0.75
Cache Read	$0.02
Cache Write	Free
Image	N/A
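The prices above are quoted per million tokens, so the cost of a single request scales linearly with its token counts. The sketch below shows that arithmetic under stated assumptions: "1M" is taken to mean 1,000,000 tokens, and cached prompt tokens are assumed to be a subset of the input tokens billed at the cache-read rate instead of the input rate (this page does not spell out the billing rule). The function name `estimate_cost` is hypothetical, not part of any provider API.

```python
# Per-1M-token prices (USD) for inception/mercury, copied from the table above.
PRICE_PER_MILLION = {
    "input": 0.25,       # prompt tokens
    "output": 0.75,      # completion tokens
    "cache_read": 0.02,  # prompt tokens served from cache
    "cache_write": 0.0,  # listed as free
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is a subset of input_tokens and is billed at
    the cache-read rate instead of the regular input rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    # Prices are per 1,000,000 tokens.
    return total / 1_000_000
```

For example, a request with 10,000 prompt tokens and 2,000 completion tokens comes to (10,000 × 0.25 + 2,000 × 0.75) / 1,000,000 = $0.004.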

Specifications

Context Length	128K
Max Output Tokens	32K
Input Modalities	Text
Output Modalities	Text
Tokenizer	Other
Instruct Type	N/A
Top Provider Context	128K
Top Provider Max Output	32K
Moderated	No
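The context length and maximum output limits together bound how many completion tokens a request can ask for. The sketch below clamps a requested output budget to both limits, assuming that "128K" and "32K" mean 128,000 and 32,000 tokens (some providers use 131,072 and 32,768 instead) and that prompt and completion tokens share the same context window. Because the tokenizer is listed only as "Other", the prompt token count passed in would have to come from the provider's reported usage or an estimate. The function name `max_completion_budget` is hypothetical.

```python
# Limits from the specifications table above.
# Assumption: "128K" = 128,000 and "32K" = 32,000 tokens.
CONTEXT_LENGTH = 128_000
MAX_OUTPUT_TOKENS = 32_000


def max_completion_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return the largest output budget that respects both limits (hypothetical helper).

    Assumption: prompt and completion tokens share the context window.
    Raises ValueError if the prompt leaves no room for any output.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt fills or exceeds the context window")
    # The budget is capped by the request, the output limit, and the space left.
    return min(requested_output, MAX_OUTPUT_TOKENS, remaining)
```

With a short prompt, the 32K output limit is the binding constraint; with a 120,000-token prompt, only 8,000 tokens of context remain for the completion.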


More from Inception

Inception: Mercury 2
Input	$0.25
Context	128K

Inception: Mercury Coder
Input	$0.25
Context	128K

Last updated: March 23, 2026

First tracked: March 23, 2026