xAI's Grok 4.20 has officially exited its beta phase, bringing with it a slew of impressive features, including a massive 2M-token context window and multi-agent configurations. The model's most notable achievement, however, is its record-breaking 78% non-hallucination rate on the Omniscience test, a benchmark that measures a model's ability to provide accurate and truthful responses. This milestone is a significant step forward for the industry, as it demonstrates a commitment to honesty and transparency in AI development.
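To make the headline metric concrete: a non-hallucination rate is simply the fraction of graded answers judged accurate rather than fabricated. The sketch below is illustrative only; the Omniscience test's actual grading methodology is not described in this article, so the function, labels, and sample data here are assumptions.

```python
def non_hallucination_rate(gradings):
    """Fraction of answers graded 'accurate' (i.e., not hallucinated).

    `gradings` is a hypothetical list of per-answer labels such as
    "accurate" or "hallucinated" -- the real benchmark's grading
    scheme may differ.
    """
    if not gradings:
        raise ValueError("no gradings provided")
    return sum(1 for g in gradings if g == "accurate") / len(gradings)

# Hypothetical example: 78 accurate answers out of 100 yields the
# 78% figure reported for Grok 4.20.
sample = ["accurate"] * 78 + ["hallucinated"] * 22
print(f"{non_hallucination_rate(sample):.0%}")
```

Under this reading, a higher rate means the model declined to fabricate answers more often, which is a separate axis from the general-capability scores discussed next.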
Despite this achievement, Grok 4.20's overall intelligence ranking is less impressive: 8th place with 48 points on the Intelligence Index, behind competitors such as Gemini 3.1 Pro and GPT-5.4, which scored 57 points. That gap is less a disappointment than a reminder that honesty and intelligence are distinct qualities, and strength in one does not guarantee strength in the other. For AI developers and users, the split underscores that evaluating models is complex: no single benchmark captures a model's capabilities and limitations. As the industry evolves, both honesty and intelligence will need to be priorities, so that models are not only accurate but also trustworthy and reliable.