Z.ai: GLM 4.6V

Z AiID: z-ai/glm-4.6v

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Pricing per 1M Tokens

Input (Prompt)	$0.30
Output (Completion)	$0.90
Cache Read	Free
Cache Write	Free
Image	N/A

Specifications

Context Length	131K
Max Output Tokens	131K
Input Modalities	Image + Text + Video
Output Modalities	Text
Tokenizer	Other
Instruct Type	N/A
Top Provider Context	131K
Top Provider Max Output	131K
Moderated	No