ollama/convert/convert_glm4moelite.go at main

mirror of https://github.com/ollama/ollama.git synced 2026-03-08 23:04:13 -05:00

Jeffrey Morgan 64737330a4 Re-apply "model: add MLA absorption for glm4moelite" with fix (#13870)

The nvidia_fp32 config for (576, 512) head sizes had nbatch_fa=32,
which caused zero-sized arrays when computing array dimensions:
  nbatch_fa / (np * warp_size) = 32 / (2 * 32) = 0

This resulted in CUDA compilation failures on CUDA 12 (Windows and
Linux arm64):
- "static assertion failed with nbatch_fa % (np*warp_size) != 0"
- "the size of an array must be greater than zero"

Fix by changing nbatch_fa from 32 to 64 for all (576, 512) configs
in the nvidia_fp32 function, matching the nvidia_fp16 and AMD configs.
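
A minimal Go sketch of the arithmetic above, assuming np = 2 and warp_size = 32 as stated in the commit message. The helpers arrayLen and validConfig are hypothetical illustrations, not functions from ollama or its CUDA kernels. They only reproduce the truncating integer division and the divisibility check that the compiler enforced.

```go
package main

import "fmt"

// arrayLen is a hypothetical helper that reproduces the dimension formula
// from the commit message: nbatch_fa / (np * warp_size).
// Go integer division truncates toward zero, matching C++/CUDA behavior
// for non-negative operands, so 32 / 64 yields 0.
func arrayLen(nbatchFA, np, warpSize int) int {
	return nbatchFA / (np * warpSize)
}

// validConfig is a hypothetical check combining both failure conditions:
// nbatch_fa must divide evenly by np*warp_size (the static assertion),
// and the resulting array size must be greater than zero.
func validConfig(nbatchFA, np, warpSize int) bool {
	chunk := np * warpSize
	return nbatchFA%chunk == 0 && nbatchFA/chunk > 0
}

func main() {
	const np, warpSize = 2, 32 // values assumed from the commit message
	for _, nbatchFA := range []int{32, 64} {
		fmt.Printf("nbatch_fa=%d -> array length %d, valid=%v\n",
			nbatchFA, arrayLen(nbatchFA, np, warpSize), validConfig(nbatchFA, np, warpSize))
	}
}
```

Running it prints an array length of 0 for nbatch_fa=32 and 1 for nbatch_fa=64, which is why raising the value to 64 satisfies both the static assertion and the non-zero array size requirement.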

2026-01-23 18:40:28 -08:00

8.3 KiB