[GH-ISSUE #11564] Support for zai-org/GLM-4.5-Air (Thinking & Non-Thinking Modes + Tool Use) #69691

Closed
opened 2026-05-04 18:51:16 -05:00 by GiteaMirror · 2 comments

Originally created by @zytoh0 on GitHub (Jul 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11564

Hi team,
Please consider adding support for the zai-org/GLM-4.5-Air model and its FP8 variant zai-org/GLM-4.5-Air-FP8. These models are designed for hybrid reasoning and support:

  • Thinking mode: for complex reasoning and tool usage
  • Non-thinking mode: for quick, direct responses

The Air variant is optimized for lightweight deployment, and the FP8 version further enhances efficiency. Supporting both modes and tool use functionality would make these models highly versatile for a wide range of environments.

Appreciate your consideration!

GiteaMirror added the model label 2026-05-04 18:51:16 -05:00

@zytoh0 commented on GitHub (Aug 2, 2025):

@rick-github This is not a duplicate of Support for zai-org/GLM-4.5 (Thinking & Non-Thinking Modes + Tool Use) #11563, as this issue concerns the Air variant - GLM-4.5-Air - which is a distinct and significantly smaller model compared to the standard GLM-4.5 (106B vs 355B total parameters; 12B vs 32B active parameters). The two versions offer different trade-offs in performance and efficiency, similar to how both phi-4 and phi-4-mini are included separately in Ollama. Given these substantial differences, this request should be tracked independently. Kindly consider reopening this issue.


@rick-github commented on GitHub (Aug 2, 2025):

The model architecture is the same.


Reference: github-starred/ollama#69691