[GH-ISSUE #11833] Cannot run gpt-oss #33613

Closed
opened 2026-04-22 16:28:31 -05:00 by GiteaMirror · 1 comment

Originally created by @sinchro1 on GitHub (Aug 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11833

What is the issue?

Cannot run gpt-oss-20b-GGUF:Q2_K; it fails with "500 Internal Server Error: unable to load model". All other models work fine.

Relevant log output

time=2025-08-09T16:03:57.387+02:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable C:\\Users\\xxx\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-08-09T16:03:57.387+02:00 level=DEBUG source=amd_windows.go:73 msg="detected hip devices" count=1
time=2025-08-09T16:03:57.387+02:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=0 name="AMD Radeon RX 9070" gfx=gfx1201
time=2025-08-09T16:03:57.761+02:00 level=DEBUG source=amd_windows.go:146 msg="amdgpu is supported" gpu=0 gpu_type=gfx1201
time=2025-08-09T16:03:57.761+02:00 level=DEBUG source=amd_windows.go:149 msg="amdgpu memory" gpu=0 total="15.9 GiB"
time=2025-08-09T16:03:57.761+02:00 level=DEBUG source=amd_windows.go:150 msg="amdgpu memory" gpu=0 available="15.8 GiB"
time=2025-08-09T16:03:57.761+02:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1201 driver=6.4 name="AMD Radeon RX 9070" total="15.9 GiB" available="15.8 GiB"
time=2025-08-09T16:03:57.761+02:00 level=INFO source=routes.go:1398 msg="entering low vram mode" "total vram"="15.9 GiB" threshold="20.0 GiB"
[GIN] 2025/08/09 - 16:04:01 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-08-09T16:04:02.045+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/08/09 - 16:04:02 | 200 |     71.8335ms |       127.0.0.1 | POST     "/api/show"
time=2025-08-09T16:04:02.095+02:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="47.9 GiB" before.free="37.6 GiB" before.free_swap="35.5 GiB" now.total="47.9 GiB" now.free="37.6 GiB" now.free_swap="35.2 GiB"
time=2025-08-09T16:04:02.501+02:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 9070" before="15.8 GiB" now="15.6 GiB"
time=2025-08-09T16:04:02.502+02:00 level=INFO source=sched.go:187 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-08-09T16:04:02.520+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-09T16:04:02.563+02:00 level=DEBUG source=sched.go:226 msg="loading first model" model=D:\OLLAMA_MODELS\blobs\sha256-129b8e3865517cbba55bbb7d4f0cc2444a014e73a619eafbac247ef6573a19ee
time=2025-08-09T16:04:02.563+02:00 level=DEBUG source=memory.go:111 msg=evaluating library=rocm gpu_count=1 available="[15.6 GiB]"
time=2025-08-09T16:04:02.563+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gpt-oss.vision.block_count default=0
time=2025-08-09T16:04:02.563+02:00 level=INFO source=sched.go:786 msg="new model will fit in available VRAM in single GPU, loading" model=D:\OLLAMA_MODELS\blobs\sha256-129b8e3865517cbba55bbb7d4f0cc2444a014e73a619eafbac247ef6573a19ee gpu=0 parallel=1 available=16778919936 required="1.8 GiB"
time=2025-08-09T16:04:02.563+02:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="47.9 GiB" before.free="37.6 GiB" before.free_swap="35.2 GiB" now.total="47.9 GiB" now.free="37.5 GiB" now.free_swap="35.0 GiB"
time=2025-08-09T16:04:02.921+02:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 9070" before="15.6 GiB" now="15.5 GiB"
time=2025-08-09T16:04:02.922+02:00 level=INFO source=server.go:135 msg="system memory" total="47.9 GiB" free="37.5 GiB" free_swap="35.0 GiB"
time=2025-08-09T16:04:02.922+02:00 level=DEBUG source=memory.go:111 msg=evaluating library=rocm gpu_count=1 available="[15.6 GiB]"
time=2025-08-09T16:04:02.923+02:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=gpt-oss.vision.block_count default=0
time=2025-08-09T16:04:02.923+02:00 level=INFO source=server.go:175 msg=offload library=rocm layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[15.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.8 GiB" memory.required.partial="1.8 GiB" memory.required.kv="192.0 MiB" memory.required.allocations="[1.8 GiB]" memory.weights.total="931.9 MiB" memory.weights.repeating="345.1 MiB" memory.weights.nonrepeating="586.8 MiB" memory.graph.full="256.0 MiB" memory.graph.partial="256.0 MiB"
time=2025-08-09T16:04:02.923+02:00 level=DEBUG source=server.go:291 msg="compatible gpu libraries" compatible=[rocm]
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from D:\OLLAMA_MODELS\blobs\sha256-129b8e3865517cbba55bbb7d4f0cc2444a014e73a619eafbac247ef6573a19ee

llama_model_load_from_file_impl: failed to load model
time=2025-08-09T16:04:02.970+02:00 level=INFO source=sched.go:453 msg="NewLlamaServer failed" model=D:\OLLAMA_MODELS\blobs\sha256-129b8e3865517cbba55bbb7d4f0cc2444a014e73a619eafbac247ef6573a19ee error="unable to load model: D:\\OLLAMA_MODELS\\blobs\\sha256-129b8e3865517cbba55bbb7d4f0cc2444a014e73a619eafbac247ef6573a19ee"
[GIN] 2025/08/09 - 16:04:02 | 500 |    922.0023ms |       127.0.0.1 | POST     "/api/generate"
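The key line in the log is `gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)`: while reading the tensor info section of the GGUF file, the loader hit a tensor type id its ggml enum does not define, so it maps it to `GGML_TYPE_NONE` and aborts the load. The sketch below is a minimal illustration of that validation step, not Ollama's actual code; the cutoff of 39 and the `KNOWN_TYPE_IDS` set are assumptions chosen to reproduce the logged message:

```python
# Hypothetical sketch of the tensor-type check that gguf_init_from_file_impl
# performs. Assumption for illustration: this build's ggml enum defines type
# ids 0..38 only, so id 39 (used by the failing gpt-oss GGUF) is unknown.
KNOWN_TYPE_IDS = set(range(39))

def check_tensor_type(name: str, type_id: int) -> str:
    """Return an error string in the style of the log, or 'ok'."""
    if type_id not in KNOWN_TYPE_IDS:
        # Unknown ids are reported as NONE, matching the log output above.
        return f"tensor '{name}' has invalid ggml type {type_id} (NONE)"
    return "ok"

print(check_tensor_type("blk.0.ffn_down_exps.weight", 39))
print(check_tensor_type("token_embd.weight", 8))  # Q8_0, a known id
```

In practice this means the third-party GGUF was quantized with a tensor type newer than what Ollama 0.11.4's bundled ggml understands; you can inspect the actual type ids in a GGUF file with the `gguf` Python package (`GGUFReader`) from llama.cpp if you want to confirm which tensor type the file uses.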

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.11.4

GiteaMirror added the bug label 2026-04-22 16:28:31 -05:00

@rick-github commented on GitHub (Aug 9, 2025):

#11714

Reference: github-starred/ollama#33613