[GH-ISSUE #14498] glm-ocr fails to load with 500 Internal Server Error #35164

Open
opened 2026-04-22 19:28:27 -05:00 by GiteaMirror · 10 comments

Originally created by @AryanKarumuri on GitHub (Feb 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14498

What is the issue?

glm-ocr fails to load with 500 Internal Server Error (Ollama 0.17.4, RTX 4090)

Environment

  • Ollama version: 0.17.4
  • GPU: NVIDIA RTX 4090 (24GB VRAM)
  • OS: Windows

Steps to Reproduce

```bash
ollama pull glm-ocr
ollama run glm-ocr
```

Expected Behavior:

The model should load successfully and start an interactive session.

Actual Behavior:

```
500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details
```

Note:

  • The model downloads successfully. The failure happens immediately when running the model.

Has anyone experienced this, or does anyone know what might be causing it?

Relevant log output

(none provided)
OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.4

GiteaMirror added the bug label 2026-04-22 19:28:27 -05:00

@rick-github commented on GitHub (Feb 27, 2026):

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.
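A hedged pointer for anyone following along: the locations below come from the linked troubleshooting docs, not from this thread, so treat them as a sketch for a default install.

```bash
# Linux (systemd service): dump the server log
journalctl -u ollama --no-pager

# Windows: the server log lives at %LOCALAPPDATA%\Ollama\server.log,
# e.g. from cmd:  notepad %LOCALAPPDATA%\Ollama\server.log
```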


@rick-github commented on GitHub (Feb 28, 2026):

This log doesn't show any errors.


@rick-github commented on GitHub (Mar 1, 2026):

@vitalii-py #14474


@guan2000910 commented on GitHub (Mar 2, 2026):

I ran into the same error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details. But I solved it.
I had imported the model locally via `ollama create` and then hit this error on `run`. I was developing on a server with the downloaded release build. After running `sudo mv ollama /usr/bin/`, killing all ollama-related processes found via `ps`, and then re-running `create` and `run`, the error was gone.
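A minimal sketch of the sequence described above, assuming a Linux server with the release binary in the current directory; the model name, Modelfile path, and the `pkill` shortcut are illustrative, not from the original comment.

```bash
# Put the release binary on the PATH (as described above)
sudo mv ollama /usr/bin/

# Stop all running ollama processes before recreating the model
pkill -f ollama

# Recreate and run the model (placeholder names)
ollama create my-model -f Modelfile
ollama run my-model
```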


@a-zb commented on GitHub (Mar 6, 2026):

Same issue.

```
Device 0: AMD Radeon RX 7800 XT, gfx1101 (0x1101), VMM: no, Wave Size: 32, ID: 0
```


@RAFOLIE commented on GitHub (Mar 15, 2026):

Same issue on Ollama 0.18.0 (Windows 11, RTX 4090)

Environment:

  • Ollama version: 0.18.0
  • GPU: NVIDIA RTX 4090 (24GB VRAM)
  • OS: Windows 11
  • Model: glm-ocr:bf16

Repro steps:

```bash
ollama pull glm-ocr:bf16
# Both /api/chat and /api/generate fail:
curl -X POST http://localhost:11434/api/generate -d '{"model":"glm-ocr:bf16","prompt":"Hello","stream":false}'
# Error: model failed to load, this may be due to resource limitations or an internal error
```

Additional findings:

  • /api/show returns correct model_info (architecture: glmocr, 1.1B params, BF16)
  • Other models (qwen3-vl:8b, nomic-embed-text) work fine on the same machine
  • Model file size: 2.07GB
  • VRAM is not the issue (tried unloading all other models first, still fails)
  • PR #14584 fixed the "empty markdown" bug, but this is a different issue (model fails to load at all)

Model info from /api/show:

```json
{
  "general.architecture": "glmocr",
  "general.file_type": 1,
  "general.parameter_count": 1107405824,
  "glmocr.vision.max_pixels": 9633792,
  "glmocr.vision.min_pixels": 12544
}
```

Issue persists even with only glm-ocr loaded (no other models in VRAM).
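For anyone reproducing the check above, a minimal sketch of querying `/api/show` directly (the endpoint and `model_info` field are standard Ollama REST API; the `jq` filter is just for readability):

```bash
# Metadata loads fine even though the model itself fails to start
curl -s http://localhost:11434/api/show -d '{"model":"glm-ocr:bf16"}' | jq '.model_info'
```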


@rick-github commented on GitHub (Mar 15, 2026):

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.


@yoshida0119 commented on GitHub (Mar 19, 2026):

Installing via `brew` resulted in an error, but reinstalling with `curl -fsSL https://ollama.com/install.sh | sh` worked correctly.


@devinguthrie commented on GitHub (Apr 6, 2026):

If anyone is still running into this, check your context length setting. Thanks to @rick-github's suggestion, I looked at the server logs. This was the error:

```
msg="server unhealthy" error="llama runner process no longer running: 1 GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed"
```

and ultimately this was the cause:

```
level=WARN msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
```

I had turned my context length in the settings all the way up when I was testing another model.

![Image](https://github.com/user-attachments/assets/7a229cd8-adf0-43d6-bfc7-86a7409f16bc)

Once I turned it back down to 4k, it worked again.
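For reference, the same cap can be applied per request rather than globally, via the standard `options.num_ctx` field of `/api/generate`; the model tag and prompt below are placeholders.

```bash
# Cap the context window for a single request so it stays within n_ctx_train
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "glm-ocr",
  "prompt": "Hello",
  "stream": false,
  "options": { "num_ctx": 4096 }
}'
```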


@dolacmeo commented on GitHub (Apr 7, 2026):

For the glm-ocr:latest model, the complete steps to set and save a 4k context length using a Modelfile on Windows are as follows:

1. Export the current configuration

Open PowerShell or Command Prompt (CMD) and run the following command to export the configuration (to your desktop or any convenient directory):

```bash
ollama show glm-ocr:latest --modelfile > Modelfile_glm
```

2. Modify the configuration file

Open the newly generated Modelfile_glm in Notepad:

```bash
notepad Modelfile_glm
```

Add the following line at the end of the file, then save and close it:

```
PARAMETER num_ctx 4096
```

3. Create a new model with a 4k context limit

Run the following command to create a new model named glm-ocr-4k based on the modified file:

```bash
ollama create glm-ocr-4k -f Modelfile_glm
```

4. Run and test

You can now run this new model, which has been capped to a 4k context (note the quotes around the multi-word prompt):

```bash
ollama run glm-ocr-4k "Text Recognition: ./image.png"
```
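As a quick sanity check (not part of the original comment), re-export the new model's Modelfile to confirm the parameter was saved:

```bash
# The output should contain: PARAMETER num_ctx 4096
ollama show glm-ocr-4k --modelfile
```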