Inception: Mercury

InceptionID: inception/mercury

Mercury is the first diffusion large language model (dLLM). Using a breakthrough discrete diffusion approach, it runs 5-10x faster than even speed-optimized models such as GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed lets developers build responsive user experiences, including voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury).

Pricing per 1M Tokens

| Type | Price |
| --- | --- |
| Input (Prompt) | $0.25 |
| Output (Completion) | $0.75 |
| Cache Read | $0.02 |
| Cache Write | Free |
| Image | N/A |
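The per-token rates above can be turned into a quick cost estimate. The sketch below is illustrative, not an official billing formula: the function name and the assumption that cached prompt tokens are billed at the cache-read rate in place of the input rate are mine; rates are the USD-per-1M-token figures from the table.

```python
# Hypothetical cost estimator based on the pricing table above.
# USD per 1M tokens; "cache_write" is free per the table.
PRICES = {
    "input": 0.25,       # prompt tokens
    "output": 0.75,      # completion tokens
    "cache_read": 0.02,  # cached prompt tokens
}

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumes cached_tokens (a subset of input_tokens) are billed at the
    cache-read rate instead of the full input rate.
    """
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * PRICES["input"]
        + cached_tokens * PRICES["cache_read"]
        + output_tokens * PRICES["output"]
    ) / 1_000_000
    return round(cost, 6)

# Example: 10K prompt tokens (8K of them cached) and 2K completion tokens.
print(request_cost(10_000, 2_000, cached_tokens=8_000))  # → 0.00216
```

Note how cache reads dominate the savings here: the 8K cached tokens cost $0.00016 instead of the $0.002 they would at the full input rate.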

Specifications

| Specification | Value |
| --- | --- |
| Context Length | 128K |
| Max Output Tokens | 32K |
| Input Modalities | Text |
| Output Modalities | Text |
| Tokenizer | Other |
| Instruct Type | N/A |
| Top Provider Context | 128K |
| Top Provider Max Output | 32K |
| Moderated | No |


Last updated: March 23, 2026

First tracked: March 23, 2026