[GH-ISSUE #14832] Buffer Allocation Error Without Manual Context Trim #56084

Closed
opened 2026-04-29 10:14:50 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @najumancheril on GitHub (Mar 13, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14832

What is the issue?

I am new to Ollama, so I apologize if this error is expected behavior due to a lack of physical memory or some other hardware constraint. However, the log does not show any obvious requirement my system fails to meet, so I am filing this bug report.

System Specs

I am running an AMD Ryzen AI Max 300 Series with 128 GB RAM. I used the AMD software to allocate 96 GB to the GPU, leaving 32 GB for system memory.

> ollama --version
ollama version is 0.17.7

> systeminfo

Host Name:                    [redacted]
OS Name:                       Microsoft Windows 11 Enterprise
OS Version:                    10.0.26100 N/A Build 26100
OS Manufacturer:               Microsoft Corporation
OS Configuration:              Member Workstation
OS Build Type:                 Multiprocessor Free
Registered Owner:              [redacted]
Registered Organization:       [redacted]
Product ID:                    00329-00000-00003-AA783
Original Install Date:         3/11/2026, 4:22:10 PM
System Boot Time:              3/12/2026, 5:03:32 PM
System Manufacturer:           Framework
System Model:                  Desktop (AMD Ryzen AI Max 300 Series)
System Type:                   x64-based PC
Processor(s):                  1 Processor(s) Installed.
                               [01]: AMD64 Family 26 Model 112 Stepping 0 AuthenticAMD ~3000 Mhz
BIOS Version:                  INSYDE Corp. 03.03, 9/16/2025
Windows Directory:             C:\WINDOWS
System Directory:              C:\WINDOWS\system32
Boot Device:                   \Device\HarddiskVolume1
System Locale:                 en-us;English (United States)
Input Locale:                  en-us;English (United States)
Time Zone:                     (UTC-05:00) Eastern Time (US & Canada)
Total Physical Memory:         32,554 MB
Available Physical Memory:     24,366 MB
Virtual Memory: Max Size:      37,674 MB
Virtual Memory: Available:     25,066 MB
Virtual Memory: In Use:        12,608 MB
Page File Location(s):         C:\pagefile.sys
...

I encounter the error when I try to load the gpt-oss:120b model.

> ollama run gpt-oss:120b "Reply with exactly: ok"
Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details

However, I am able to run this if I explicitly set the context length to 64k or less.

> export OLLAMA_CONTEXT_LENGTH=65536
> ollama run gpt-oss:120b "Reply with exactly: ok"
Thinking...
The user asks: "Reply with exactly: ok". So the response should be exactly "ok". No extra whitespace, no punctuation, no extra characters. Should be just "ok". Ensure nothing else.
...done thinking.

ok

Comparing the success and failure logs, the failing load requests 66.0 GiB of total memory, while the successful one requests only 63.5 GiB. I don't see why that difference would matter, since the server detects total_vram of 96.0 GiB.

Failure log snippet:

time=2026-03-13T17:50:48.454-04:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:60481/load\": read tcp 127.0.0.1:60488->127.0.0.1:60481: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-13T17:50:48.455-04:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:60481/load\": dial tcp 127.0.0.1:60481: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="59.8 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="1.1 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="4.7 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="443.1 MiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.6 MiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:272 msg="total memory" size="66.0 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\naju\.ollama\models\blobs\sha256-6be6d66a3f546d8c19b130dc41dc24b2fc159f84ffbc76a0ee0676205083cf5a error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"

Success log snippet:

time=2026-03-13T18:24:29.271-04:00 level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="59.8 GiB"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="1.1 GiB"
time=2026-03-13T18:24:29.270-04:00 level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="2.4 GiB"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="251.1 MiB"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.6 MiB"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=device.go:272 msg="total memory" size="63.5 GiB"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=sched.go:565 msg="loaded runners" count=1
time=2026-03-13T18:24:29.271-04:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-03-13T18:24:29.271-04:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-13T18:24:54.874-04:00 level=INFO source=server.go:1388 msg="llama runner started in 31.77 seconds"
[GIN] 2026/03/13 - 18:24:56 | 200 |   34.5499393s |       127.0.0.1 | POST     "/api/generate"

Another odd thing is that I am fairly certain this same setup (without an explicit context length restriction) was working earlier this week. At some point I restarted the machine, and it then started failing with this 500 error (sorry, I don't have logs from the first failure). Since then, I have re-installed Ollama, re-fetched the models, and validated that the SHA-256 hash of the model file matches the checksum in the model filename.

Relevant log output

time=2026-03-13T17:50:35.954-04:00 level=INFO source=routes.go:1658 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\naju\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:true OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-13T17:50:35.956-04:00 level=INFO source=routes.go:1660 msg="Ollama cloud disabled: true"
time=2026-03-13T17:50:35.958-04:00 level=INFO source=images.go:477 msg="total blobs: 11"
time=2026-03-13T17:50:35.959-04:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-13T17:50:35.961-04:00 level=INFO source=routes.go:1713 msg="Listening on [::]:11434 (version 0.17.7)"
time=2026-03-13T17:50:35.962-04:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-13T17:50:36.006-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\naju\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60444"
time=2026-03-13T17:50:36.141-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\naju\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60451"
time=2026-03-13T17:50:36.533-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\naju\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60459"
time=2026-03-13T17:50:37.066-04:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-13T17:50:37.068-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\naju\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60466"
time=2026-03-13T17:50:38.003-04:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon(TM) Graphics" libdirs=ollama,rocm driver=60241.51 pci_id=0000:c3:00.0 type=iGPU total="96.0 GiB" available="94.7 GiB"
time=2026-03-13T17:50:38.003-04:00 level=INFO source=routes.go:1763 msg="vram-based default context" total_vram="96.0 GiB" default_num_ctx=262144
[GIN] 2026/03/13 - 17:50:38 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2026/03/13 - 17:50:38 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/03/13 - 17:50:38 | 200 |    128.1289ms |       127.0.0.1 | POST     "/api/show"
time=2026-03-13T17:50:38.275-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\naju\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60474"
time=2026-03-13T17:50:38.743-04:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-13T17:50:38.743-04:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-03-13T17:50:38.824-04:00 level=WARN source=server.go:168 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-03-13T17:50:38.824-04:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-13T17:50:38.824-04:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\naju\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\naju\\.ollama\\models\\blobs\\sha256-6be6d66a3f546d8c19b130dc41dc24b2fc159f84ffbc76a0ee0676205083cf5a --port 60481"
time=2026-03-13T17:50:38.836-04:00 level=INFO source=sched.go:489 msg="system memory" total="31.8 GiB" free="23.7 GiB" free_swap="24.4 GiB"
time=2026-03-13T17:50:38.836-04:00 level=INFO source=sched.go:496 msg="gpu memory" id=0 library=ROCm available="94.2 GiB" free="94.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-13T17:50:38.836-04:00 level=INFO source=server.go:757 msg="loading model" "model layers"=37 requested=-1
time=2026-03-13T17:50:38.887-04:00 level=INFO source=runner.go:1429 msg="starting ollama engine"
time=2026-03-13T17:50:38.890-04:00 level=INFO source=runner.go:1464 msg="Server listening on 127.0.0.1:60481"
time=2026-03-13T17:50:38.901-04:00 level=INFO source=runner.go:1302 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:131072 KvCacheType: NumThreads:16 GPULayers:37[ID:0 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-13T17:50:38.931-04:00 level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=687 num_key_values=32
load_backend: loaded CPU backend from C:\Users\naju\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon(TM) Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\naju\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-13T17:50:39.010-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-13T17:50:39.507-04:00 level=INFO source=runner.go:1302 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:131072 KvCacheType: NumThreads:16 GPULayers:37[ID:0 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Exception 0xc0000005 0x1 0x10 0x7fff73d69176
PC=0x7fff73d69176
signal arrived during external code execution

runtime.cgocall(0x7ff7d2ebda40, 0xc00004ec08)
	runtime/cgocall.go:167 +0x3e fp=0xc00004ebe0 sp=0xc00004eb78 pc=0x7ff7d1fd243e
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_buft_alloc_buffer(0x7fff3cb823e0, 0x8000000)
	_cgo_gotypes.go:657 +0x51 fp=0xc00004ec08 sp=0xc00004ebe0 pc=0x7ff7d24d4b31
github.com/ollama/ollama/ml/backend/ggml.(*Context).newTensor.func5(...)
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:912
github.com/ollama/ollama/ml/backend/ggml.(*Context).newTensor(0xc0000c7dc0, 0x61?, {0xc0003c75a8, 0x3, 0x18?})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:912 +0x3cf fp=0xc00004ed90 sp=0xc00004ec08 pc=0x7ff7d24e654f
github.com/ollama/ollama/ml/backend/ggml.(*Context).Zeros(0xc0000c7dc0, 0xc001963d10?, {0xc0003c75a8?, 0xc00004ee18?, 0x7ff7d1fd5b5f?})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:931 +0x1d fp=0xc00004edd8 sp=0xc00004ed90 pc=0x7ff7d24e693d
github.com/ollama/ollama/kvcache.(*Causal).Put(0xc000057500, {0x7ff7d375c5e0, 0xc013401600}, {0x7ff7d376c3a8, 0xc0133c3170}, {0x7ff7d376c3a8, 0xc0133c30b0})
	github.com/ollama/ollama/kvcache/causal.go:471 +0x4ee fp=0xc00004eed8 sp=0xc00004edd8 pc=0x7ff7d24c8d6e
github.com/ollama/ollama/kvcache.(*WrapperCache).Put(0xc013401600?, {0x7ff7d375c5e0?, 0xc013401600?}, {0x7ff7d376c3a8?, 0xc0133c3170?}, {0x7ff7d376c3a8?, 0xc0133c30b0?})
	github.com/ollama/ollama/kvcache/wrapper.go:81 +0x4f fp=0xc00004ef20 sp=0xc00004eed8 pc=0x7ff7d24d284f
github.com/ollama/ollama/ml/nn.AttentionWithVMLA({0x7ff7d375c5e0, 0xc013401600}, {0x7ff7d376c3a8, 0xc0133c3110}, {0x7ff7d376c3a8, 0xc0133c3170}, {0x7ff7d376c3a8, 0xc0133c30b0}, {0x7ff7d376c3a8, 0xc0133f8d80}, ...)
	github.com/ollama/ollama/ml/nn/attention.go:49 +0x2bc fp=0xc00004f028 sp=0xc00004ef20 pc=0x7ff7d2544c3c
github.com/ollama/ollama/ml/nn.AttentionWithSinks(...)
	github.com/ollama/ollama/ml/nn/attention.go:29
github.com/ollama/ollama/model/models/gptoss.(*AttentionBlock).Forward(0xc0133f1480, {0x7ff7d375c5e0, 0xc013401600}, {0x7ff7d376c3a8, 0xc0133c2f78}, {0x7ff7d376c3a8, 0xc013404048}, {0x7ff7d3758520, 0xc0002d7a60}, 0xc0000ff920)
	github.com/ollama/ollama/model/models/gptoss/model.go:141 +0x94b fp=0xc00004f218 sp=0xc00004f028 pc=0x7ff7d25b304b
github.com/ollama/ollama/model/models/gptoss.(*TransformerBlock).Forward(0xc00004f320, {0x7ff7d375c5e0, 0xc013401600}, {0x7ff7d376c3a8?, 0xc0133c2f78?}, {0x7ff7d376c3a8?, 0xc013404048?}, {0x0, 0x0}, {0x7ff7d3758520, ...}, ...)
	github.com/ollama/ollama/model/models/gptoss/model.go:93 +0x72 fp=0xc00004f278 sp=0xc00004f218 pc=0x7ff7d25b2632
github.com/ollama/ollama/model/models/gptoss.(*Transformer).Forward(0xc0000ff8c0, {0x7ff7d375c5e0, 0xc013401600}, {{0x7ff7d376c3a8, 0xc013405d28}, {0x7ff7d376c3a8, 0xc013405d40}, {0xc000371000, 0x200, 0x200}, ...})
	github.com/ollama/ollama/model/models/gptoss/model.go:47 +0x17c fp=0xc00004f350 sp=0xc00004f278 pc=0x7ff7d25b1f7c
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc000642780, 0x1)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1181 +0x9ad fp=0xc00004f680 sp=0xc00004f350 pc=0x7ff7d26154cd
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0xc000642780, {0xc00003c230?, 0x7ff7d22db21a?}, {0x1, 0x10, {0xc0019ab6c0, 0x1, 0x1}, 0x1}, {0x0, ...}, ...)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1250 +0x391 fp=0xc00004f730 sp=0xc00004f680 pc=0x7ff7d2615e11
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000642780, {0x7ff7d374d760, 0xc000354000}, 0xc000312640)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1329 +0x54b fp=0xc00004fac0 sp=0xc00004f730 pc=0x7ff7d261686b
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0x7ff7d374d760?, 0xc000354000?}, 0xc00004fb40?)
	<autogenerated>:1 +0x36 fp=0xc00004faf0 sp=0xc00004fac0 pc=0x7ff7d2618bb6
net/http.HandlerFunc.ServeHTTP(0xc000625b00?, {0x7ff7d374d760?, 0xc000354000?}, 0xc00004fb60?)
	net/http/server.go:2294 +0x29 fp=0xc00004fb18 sp=0xc00004faf0 pc=0x7ff7d22e5ee9
net/http.(*ServeMux).ServeHTTP(0x7ff7d1f7b785?, {0x7ff7d374d760, 0xc000354000}, 0xc000312640)
	net/http/server.go:2822 +0x1c4 fp=0xc00004fb68 sp=0xc00004fb18 pc=0x7ff7d22e7de4
net/http.serverHandler.ServeHTTP({0x7ff7d3749990?}, {0x7ff7d374d760?, 0xc000354000?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc00004fb98 sp=0xc00004fb68 pc=0x7ff7d230586e
net/http.(*conn).serve(0xc0002e0480, {0x7ff7d374fe98, 0xc0002dd2f0})
	net/http/server.go:2102 +0x625 fp=0xc00004ffb8 sp=0xc00004fb98 pc=0x7ff7d22e43e5
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc00004ffe0 sp=0xc00004ffb8 pc=0x7ff7d22e9ca8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00004ffe8 sp=0xc00004ffe0 pc=0x7ff7d1fdd9a1
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff7d1fdf1a0?, 0x7ff7d42142e0?, 0x20?, 0xf4?, 0xc00020f4cc?)
	runtime/proc.go:435 +0xce fp=0xc0003ab630 sp=0xc0003ab610 pc=0x7ff7d1fd598e
runtime.netpollblock(0x240?, 0xd1f70406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0003ab668 sp=0xc0003ab630 pc=0x7ff7d1f9bdf7
internal/poll.runtime_pollWait(0x2947e792170, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0003ab688 sp=0xc0003ab668 pc=0x7ff7d1fd4b25
internal/poll.(*pollDesc).wait(0x7ff7d206a953?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003ab6b0 sp=0xc0003ab688 pc=0x7ff7d206bf47
internal/poll.execIO(0xc00020f420, 0xc000587758)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc0003ab728 sp=0xc0003ab6b0 pc=0x7ff7d206d3a5
internal/poll.(*FD).acceptOne(0xc00020f408, 0x234, {0xc00003a1e0?, 0xc0005877b8?, 0x7ff7d2075065?}, 0xc0005877ec?)
	internal/poll/fd_windows.go:946 +0x65 fp=0xc0003ab788 sp=0xc0003ab728 pc=0x7ff7d2071925
internal/poll.(*FD).Accept(0xc00020f408, 0xc0003ab938)
	internal/poll/fd_windows.go:980 +0x1b6 fp=0xc0003ab840 sp=0xc0003ab788 pc=0x7ff7d2071c56
net.(*netFD).accept(0xc00020f408)
	net/fd_windows.go:182 +0x4b fp=0xc0003ab958 sp=0xc0003ab840 pc=0x7ff7d20e358b
net.(*TCPListener).accept(0xc000609880)
	net/tcpsock_posix.go:159 +0x1b fp=0xc0003ab9a8 sp=0xc0003ab958 pc=0x7ff7d20f9b3b
net.(*TCPListener).Accept(0xc000609880)
	net/tcpsock.go:380 +0x30 fp=0xc0003ab9d8 sp=0xc0003ab9a8 pc=0x7ff7d20f88f0
net/http.(*onceCloseListener).Accept(0xc0002e0480?)
	<autogenerated>:1 +0x24 fp=0xc0003ab9f0 sp=0xc0003ab9d8 pc=0x7ff7d2311fe4
net/http.(*Server).Serve(0xc000057c00, {0x7ff7d374d5b0, 0xc000609880})
	net/http/server.go:3424 +0x30c fp=0xc0003abb20 sp=0xc0003ab9f0 pc=0x7ff7d22e98ac
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000e0030, 0x4, 0x5})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1465 +0x94e fp=0xc0003abcf0 sp=0xc0003abb20 pc=0x7ff7d261854e
github.com/ollama/ollama/runner.Execute({0xc0000e0010?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:18 +0x12b fp=0xc0003abd30 sp=0xc0003abcf0 pc=0x7ff7d2622dab
github.com/ollama/ollama/cmd.NewCLI.func3(0xc000057900?, {0x7ff7d35247ef?, 0x4?, 0x7ff7d35247f3?})
	github.com/ollama/ollama/cmd/cmd.go:2271 +0x45 fp=0xc0003abd58 sp=0xc0003abd30 pc=0x7ff7d2e4ee65
github.com/spf13/cobra.(*Command).execute(0xc0002e5b08, {0xc00037f720, 0x5, 0x5})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0003abe78 sp=0xc0003abd58 pc=0x7ff7d215e75c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000627508)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0003abf30 sp=0xc0003abe78 pc=0x7ff7d215efa5
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0003abf50 sp=0xc0003abf30 pc=0x7ff7d2e5130d
runtime.main()
	runtime/proc.go:283 +0x27d fp=0xc0003abfe0 sp=0xc0003abf50 pc=0x7ff7d1fa4ddd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0003abfe8 sp=0xc0003abfe0 pc=0x7ff7d1fdd9a1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0000adfa8 sp=0xc0000adf88 pc=0x7ff7d1fd598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc0000adfe0 sp=0xc0000adfa8 pc=0x7ff7d1fa50f8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000adfe8 sp=0xc0000adfe0 pc=0x7ff7d1fdd9a1
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0000aff80 sp=0xc0000aff60 pc=0x7ff7d1fd598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc0000bc000)
	runtime/mgcsweep.go:316 +0xdf fp=0xc0000affc8 sp=0xc0000aff80 pc=0x7ff7d1f8debf
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000affe0 sp=0xc0000affc8 pc=0x7ff7d1f82285
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000affe8 sp=0xc0000affe0 pc=0x7ff7d1fdd9a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff7d3737ac0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0000c3f78 sp=0xc0000c3f58 pc=0x7ff7d1fd598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff7d423e080)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc0000c3fa8 sp=0xc0000c3f78 pc=0x7ff7d1f8b909
runtime.bgscavenge(0xc0000bc000)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc0000c3fc8 sp=0xc0000c3fa8 pc=0x7ff7d1f8be99
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc0000c3fe0 sp=0xc0000c3fc8 pc=0x7ff7d1f82225
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000c3fe8 sp=0xc0000c3fe0 pc=0x7ff7d1fdd9a1
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0000c5e30 sp=0xc0000c5e10 pc=0x7ff7d1fd598e
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc0000c5fe0 sp=0xc0000c5e30 pc=0x7ff7d1f81207
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000c5fe8 sp=0xc0000c5fe0 pc=0x7ff7d1fdd9a1
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc0001b7540?, 0xc013405dd0?, 0x60?, 0x1f?, 0x7ff7d20cc1a8?)
	runtime/proc.go:435 +0xce fp=0xc0000b1f18 sp=0xc0000b1ef8 pc=0x7ff7d1fd598e
runtime.chanrecv(0xc00003c540, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc0000b1f90 sp=0xc0000b1f18 pc=0x7ff7d1f72d45
runtime.chanrecv1(0x7ff7d1fa4f40?, 0xc0000b1f76?)
	runtime/chan.go:506 +0x12 fp=0xc0000b1fb8 sp=0xc0000b1f90 pc=0x7ff7d1f728d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc0000b1fe0 sp=0xc0000b1fb8 pc=0x7ff7d1f854af
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000b1fe8 sp=0xc0000b1fe0 pc=0x7ff7d1fdd9a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc00041e1c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0000bff38 sp=0xc0000bff18 pc=0x7ff7d1fd598e
runtime.gcBgMarkWorker(0xc00003d960)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000bffc8 sp=0xc0000bff38 pc=0x7ff7d1f847a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000bffe0 sp=0xc0000bffc8 pc=0x7ff7d1f84685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000bffe8 sp=0xc0000bffe0 pc=0x7ff7d1fdd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

*** SEVERAL OTHER GOROUTINE CALL STACKS LIKE GOROUTINE 7 ABOVE ***

goroutine 43 gp=0xc000484fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff7d42910e0?, 0x1?, 0xd0?, 0x7?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000495f38 sp=0xc000495f18 pc=0x7ff7d1fd598e
runtime.gcBgMarkWorker(0xc00003d960)
	runtime/mgc.go:1423 +0xe9 fp=0xc000495fc8 sp=0xc000495f38 pc=0x7ff7d1f847a9
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000495fe0 sp=0xc000495fc8 pc=0x7ff7d1f84685
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000495fe8 sp=0xc000495fe0 pc=0x7ff7d1fdd9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc000485a40 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x60?, 0xfe?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0002b3a90 sp=0xc0002b3a70 pc=0x7ff7d1fd598e
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.semacquire1(0xc000642838, 0x0, 0x1, 0x0, 0x18)
	runtime/sema.go:188 +0x22f fp=0xc0002b3af8 sp=0xc0002b3a90 pc=0x7ff7d1fb750f
sync.runtime_SemacquireWaitGroup(0x0?)
	runtime/sema.go:110 +0x25 fp=0xc0002b3b30 sp=0xc0002b3af8 pc=0x7ff7d1fd6f85
sync.(*WaitGroup).Wait(0x0?)
	sync/waitgroup.go:118 +0x48 fp=0xc0002b3b58 sp=0xc0002b3b30 pc=0x7ff7d1feb988
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000642780, {0x7ff7d374fed0, 0xc00037f7c0})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:442 +0x45 fp=0xc0002b3fb8 sp=0xc0002b3b58 pc=0x7ff7d260eb65
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1442 +0x28 fp=0xc0002b3fe0 sp=0xc0002b3fb8 pc=0x7ff7d26187c8
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002b3fe8 sp=0xc0002b3fe0 pc=0x7ff7d1fdd9a1
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1442 +0x4c9

goroutine 53 gp=0xc000289880 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc00020f6a0?, 0x48?, 0xf7?, 0xc00020f74c?)
	runtime/proc.go:435 +0xce fp=0xc0003add58 sp=0xc0003add38 pc=0x7ff7d1fd598e
runtime.netpollblock(0x248?, 0xd1f70406?, 0xf7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0003add90 sp=0xc0003add58 pc=0x7ff7d1f9bdf7
internal/poll.runtime_pollWait(0x2947e792058, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0003addb0 sp=0xc0003add90 pc=0x7ff7d1fd4b25
internal/poll.(*pollDesc).wait(0x248?, 0x72?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003addd8 sp=0xc0003addb0 pc=0x7ff7d206bf47
internal/poll.execIO(0xc00020f6a0, 0x7ff7d35a9828)
	internal/poll/fd_windows.go:177 +0x105 fp=0xc0003ade50 sp=0xc0003addd8 pc=0x7ff7d206d3a5
internal/poll.(*FD).Read(0xc00020f688, {0xc00030e041, 0x1, 0x1})
	internal/poll/fd_windows.go:438 +0x29b fp=0xc0003adef0 sp=0xc0003ade50 pc=0x7ff7d206e07b
net.(*netFD).Read(0xc00020f688, {0xc00030e041?, 0xc0000c6118?, 0xc0003adf70?})
	net/fd_posix.go:55 +0x25 fp=0xc0003adf38 sp=0xc0003adef0 pc=0x7ff7d20e1465
net.(*conn).Read(0xc0000b4958, {0xc00030e041?, 0x3?, 0x7ff7d346b380?})
	net/net.go:194 +0x45 fp=0xc0003adf80 sp=0xc0003adf38 pc=0x7ff7d20f0b85
net/http.(*connReader).backgroundRead(0xc00030e030)
	net/http/server.go:690 +0x37 fp=0xc0003adfc8 sp=0xc0003adf80 pc=0x7ff7d22de2b7
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc0003adfe0 sp=0xc0003adfc8 pc=0x7ff7d22de1e5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0003adfe8 sp=0xc0003adfe0 pc=0x7ff7d1fdd9a1
created by net/http.(*connReader).startBackgroundRead in goroutine 52
	net/http/server.go:686 +0xb6
rax     0x0
rbx     0x2947f4f2db8
rcx     0x2947f4f2db8
rdx     0x2947f4f2db8
rdi     0x2947f4f2d88
rsi     0x0
rbp     0x2947f4f2db8
rsp     0xf0f85fe730
r8      0xfffffffd00000000
r9      0x2947f4f2d68
r10     0x2947f4f2db8
r11     0x0
r12     0x0
r13     0x0
r14     0x2947f4f2d88
r15     0xf0f85fe920
rip     0x7fff73d69176
rflags  0x10246
cs      0x33
fs      0x53
gs      0x2b
time=2026-03-13T17:50:48.454-04:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:60481/load\": read tcp 127.0.0.1:60488->127.0.0.1:60481: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-13T17:50:48.455-04:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:60481/load\": dial tcp 127.0.0.1:60481: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="59.8 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="1.1 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="4.7 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="443.1 MiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.6 MiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=device.go:272 msg="total memory" size="66.0 GiB"
time=2026-03-13T17:50:48.455-04:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\naju\.ollama\models\blobs\sha256-6be6d66a3f546d8c19b130dc41dc24b2fc159f84ffbc76a0ee0676205083cf5a error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
[GIN] 2026/03/13 - 17:50:48 | 500 |    10.321251s |       127.0.0.1 | POST     "/api/generate"

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.17.7

GiteaMirror added the bug label 2026-04-29 10:14:50 -05:00

@rick-github commented on GitHub (Mar 14, 2026):

```
signal arrived during external code execution
```

A process being killed by a signal usually means the OS killed it. On a Linux system I would look at the output of `dmesg`; unfortunately, I don't know how Windows logs kernel events. Is there a kernel log that can be checked for possible causes of process termination?


@najumancheril commented on GitHub (Mar 14, 2026):

Thank you for the pointer! This crash has been fixed on my machine by updating to the latest Framework drivers package. Here is what I installed:

- BIOS Release Date: 2025-12-09
- Driver Bundle Release Date: 2025-11-27

https://knowledgebase.frame.work/en_us/framework-desktop-bios-and-driver-releases-amd-ryzen-ai-max-300-series-BJHcn1Y4gg
