Z.ai: GLM 4.6V

Z Ai
ID: z-ai/glm-4.6v

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports a context window of up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also supports interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Pricing per 1M Tokens

Input (Prompt): $0.30
Output (Completion): $0.90
Cache Read: Free
Cache Write: Free
Image: N/A
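
As a rough illustration of how the per-1M-token prices above translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost` and the example token counts are hypothetical. The sketch assumes billing is linear in token count and leaves out image inputs, since no image price is listed.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 0.90


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear per-token billing. Cache reads and writes are
    listed as free, so they do not contribute to the total.
    """
    input_cost = prompt_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = completion_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example: a 100,000-token prompt that produces a 2,000-token reply.
cost = estimate_cost(prompt_tokens=100_000, completion_tokens=2_000)
print(f"${cost:.4f}")  # prints "$0.0318"
```

At these rates the prompt side dominates the cost of long-context requests, even though output tokens are three times as expensive per token.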

Specifications

Context Length: 131K
Max Output Tokens: 131K
Input Modalities: Image + Text + Video
Output Modalities: Text
Tokenizer: Other
Instruct Type: N/A
Top Provider Context: 131K
Top Provider Max Output: 131K
Moderated: No

Last updated: March 23, 2026

First tracked: March 23, 2026