[GH-ISSUE #13888] Using previously downloaded models in .15 no longer work. #71148

Closed
opened 2026-05-05 00:33:02 -05:00 by GiteaMirror · 3 comments
Originally created by @sammyvoncheese on GitHub (Jan 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13888

What is the issue?

Any previously downloaded model fails to load in 0.15. I get the error below when calling the Python API against the 0.15 server. This does not occur in the previous version.

Example Model: glm-4.7-flash:Q4_K_M

Redownloading the model allows it to work. I have 228 models downloaded already; redownloading terabytes of models is a non-starter.

Windows 11, binaries install only.
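For context, a minimal sketch of the kind of call that triggers the error, using the official `ollama` Python package (the reporter's exact script is not shown; the model name is taken from above, and the error text matches the log output later in this issue):

```python
# Hypothetical repro sketch, assuming `pip install ollama` and a local
# Ollama 0.15 server with a pre-0.15 copy of the model still cached.
import ollama

try:
    ollama.chat(
        model="glm-4.7-flash:Q4_K_M",
        messages=[{"role": "user", "content": "hello"}],
    )
except ollama.ResponseError as e:
    # On 0.15 with an old blob this surfaces roughly as:
    # "failed to initialize model: this model uses a weight format that
    #  is no longer supported; please re-download it" (status code: 500)
    print(e.status_code, e.error)
```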

I tried using the newly re-downloaded model glm-4.7-flash:Q4_K_M against 0.14.3 and got this:

time=2026-01-24T15:48:17.478-05:00 level=INFO source=server.go:3634 msg="http: panic serving 127.0.0.1:60938: runtime error: invalid memory address or nil pointer dereference\ngoroutine 52 [running]:\nnet/http.(*conn).serve.func1()\n\tnet/http/server.go:1947 +0xbe\npanic({0x7ff79e4e1f20?, 0x7ff79f207790?})\n\truntime/panic.go:792 +0x132\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel.func1()\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1187 +0x11a\npanic({0x7ff79e4e1f20?, 0x7ff79f207790?})\n\truntime/panic.go:792 +0x132\ngithub.com/ollama/ollama/ml/nn.(*Linear).Forward(0x0, {0x7ff79e859340, 0xc00148ac00}, {0x7ff79e867680?, 0xc00c8801e0?})\n\tgithub.com/ollama/ollama/ml/nn/linear.go:11 +0x27\ngithub.com/ollama/ollama/model/models/glm4moelite.(*Attention).Forward(0xc001076400, {0x7ff79e859340, 0xc00148ac00}, {0x7ff79e867680, 0xc00c880090}, {0x7ff79e867680, 0xc00c880060}, {0x7ff79e855658, 0xc001354100}, 0xc000525500)\n\tgithub.com/ollama/ollama/model/models/glm4moelite/model.go:81 +0x634\ngithub.com/ollama/ollama/model/models/glm4moelite.(*Layer).Forward(0xc0000472d8, {0x7ff79e859340, 0xc00148ac00}, {0x7ff79e867680, 0xc00c880078}, {0x7ff79e867680, 0xc00c880060}, {0x0, 0x0}, {0x7ff79e855658, ...}, ...)\n\tgithub.com/ollama/ollama/model/models/glm4moelite/model.go:177 +0xd9\ngithub.com/ollama/ollama/model/models/glm4moelite.(*Model).Forward(0xc0000d8b80, {0x7ff79e859340, 0xc00148ac00}, {{0x7ff79e867680, 0xc001494798}, {0x7ff79e867680, 0xc0014947b0}, {0xc0014a4800, 0x200, 0x200}, ...})\n\tgithub.com/ollama/ollama/model/models/glm4moelite/model.go:295 +0x16a\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00022b0e0, 0x1)\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1157 +0x9ad\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0xc00022b0e0, {0xc00002c060?, 0x7ff79d54af9a?}, {0x0, 0x10, {0xc0001441c0, 0x1, 0x1}, 0x1}, {0x0, ...}, 
...)\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1226 +0x391\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00022b0e0, {0x7ff79e84b060, 0xc0001640e0}, 0xc00014e280)\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1305 +0x54b\nnet/http.HandlerFunc.ServeHTTP(0xc0006a8780?, {0x7ff79e84b060?, 0xc0001640e0?}, 0xc000047b60?)\n\tnet/http/server.go:2294 +0x29\nnet/http.(*ServeMux).ServeHTTP(0x7ff79d1eb785?, {0x7ff79e84b060, 0xc0001640e0}, 0xc00014e280)\n\tnet/http/server.go:2822 +0x1c4\nnet/http.serverHandler.ServeHTTP({0x7ff79e847570?}, {0x7ff79e84b060?, 0xc0001640e0?}, 0x1?)\n\tnet/http/server.go:3301 +0x8e\nnet/http.(*conn).serve(0xc0006e2480, {0x7ff79e84d4c8, 0xc0006def60})\n\tnet/http/server.go:2102 +0x625\ncreated by net/http.(*Server).Serve in goroutine 1\n\tnet/http/server.go:3454 +0x485"

Relevant log output

Error: failed to initialize model: this model uses a weight format that is no longer supported; please re-download it
(status code: 500)

OS

Windows 11

GPU

5090

CPU

AMD Ryzen 9 9950X3D, 16 cores

Ollama version

0.15

GiteaMirror added the bug label 2026-05-05 00:33:02 -05:00

@rick-github commented on GitHub (Jan 24, 2026):

Only glm-4.7-flash needs to be refreshed for the weight format change; 0.15.0 uses a different weight format to reduce the memory footprint. The crash (`invalid memory address or nil pointer dereference`) should be unrelated to that. A full log after refreshing the model would be helpful.
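The refresh described above can be done from the CLI; a sketch using standard `ollama` subcommands (removing the local copy first, since a later comment in this thread reports that pulling over the existing model was not enough):

```shell
# Remove the locally cached model, then pull it fresh so the new
# weight format is downloaded rather than reusing old blobs.
ollama rm glm-4.7-flash:Q4_K_M
ollama pull glm-4.7-flash:Q4_K_M

# Confirm it loads under 0.15:
ollama run glm-4.7-flash:Q4_K_M "hello"
```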


@UltraRabbit commented on GitHub (Jan 25, 2026):

I tried `ollama pull glm-4.7-flash:Q4_K_M` again while the old model was still local. It did seem to download the new data, but `ollama run glm-4.7-flash:Q4_K_M` failed with the same error message. After I removed the local model and re-pulled it from the Ollama repo, it loaded and ran without the memory consumption issue, but it still had the thinking loop. However, after I downloaded Unsloth's GGUF model from Hugging Face and imported it into Ollama, the thinking loop was fixed but the memory consumption issue came back.


@sammyvoncheese commented on GitHub (Jan 25, 2026):

@rick-github the error came from using the new model downloaded with 0.15 against the 0.14.3 server. I had deleted the old one (pre-0.15; not sure when I pulled it), then downloaded the new one with 0.15.

Very cool that the new version runs at 28 GB on my 5090 with 90,000 context; I think the last one was around 100 GB on my machine.


Reference: github-starred/ollama#71148