GPT-5.4 Pro scored 50% on FrontierMath Tiers 1–3 and 38% on Tier 4, setting new records in mathematical reasoning.
It tied Gemini 3.1 Pro at 57 points on the Artificial Analysis Intelligence Index, marking the first time two models have shared the #1 position.
Meanwhile, Claude Opus 4.6 leads SWE-Bench Verified at 80.8%, just 0.2 points ahead of Gemini 3.1 Pro at 80.6%. This is the tightest three-way race yet at the frontier, with each model leading a different benchmark category.
See the full leaderboard for rankings across all benchmark categories.