[GH-ISSUE #3769] An existing connection was forcibly closed by the remote host. Could you help me? #48838

Closed
opened 2026-04-28 09:41:53 -05:00 by GiteaMirror · 37 comments

Originally created by @risingnew on GitHub (Apr 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3769

What is the issue?

PS C:\Users\Administrator\AppData\Local\Ollama> ollama run llama3
pulling manifest
Error: pull model manifest: Get "https://ollama.com/token?nonce=1AKxIvoajv-NPGYukzWJcA&scope=repository%!A(MISSING)library%!F(MISSING)llama3%!A(MISSING)pull&service=ollama.com&ts=1713578711": read tcp 192.168.124.11:53463->34.120.132.20:443: wsarecv: An existing connection was forcibly closed by the remote host.
![100614](https://github.com/ollama/ollama/assets/128674607/c25a50e8-0f0d-4291-b9bb-a5c40e7a604d)
Server log: https://github.com/ollama/ollama/files/15046570/server.log

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

No response

GiteaMirror added the bug label 2026-04-28 09:41:53 -05:00

@igorschlum commented on GitHub (Apr 20, 2024):

Hi @risingnew, can you tell us what version of Ollama you are using?


@risingnew commented on GitHub (Apr 20, 2024):

The latest version, 0.1.32.


@risingnew commented on GitHub (Apr 20, 2024):

Thank you for helping.


@risingnew commented on GitHub (Apr 20, 2024):

> Hi @risingnew can you tell us What version of Ollama you are using?

Latest version 0.1.32, thank you.


@igorschlum commented on GitHub (Apr 20, 2024):

@risingnew I'm not on Windows, so I cannot reproduce Windows issues. Did you try to load another model? Do you see the same issue?
Which country are you in? Are you on a home internet connection or a company one?


@risingnew commented on GitHub (Apr 20, 2024):

> @risingnew I'm not on Windows, so I cannot reproduce Windows Issues. Did you try to load another model? Do you see same Issue?
> In which country are you? Are at a home internet access or at a company internet access?

Thank you. Do you mean a network problem is the cause? The same error occurred when I loaded the other models.


@risingnew commented on GitHub (Apr 20, 2024):

@igorschlum Is there any way to manually download the llama3 model and start it manually?


@igorschlum commented on GitHub (Apr 22, 2024):

You can ask someone who successfully downloaded it to copy it for you.

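For reference, a sketch of what that copy involves: a pulled model is a manifest plus content-addressed blobs under the models directory (on Windows, `%USERPROFILE%\.ollama\models` by default, or wherever `OLLAMA_MODELS` points). The temp directories below stand in for the donor machine and your own; nothing here touches a real install:

```shell
# Sketch only: temp dirs stand in for the two machines' model stores.
SRC="$(mktemp -d)"   # donor machine's models directory
DST="$(mktemp -d)"   # your local models directory

# A pulled model is a manifest plus the blobs it references; fake both here.
mkdir -p "$SRC/manifests/registry.ollama.ai/library/llama3" "$SRC/blobs"
echo 'demo manifest' > "$SRC/manifests/registry.ollama.ai/library/llama3/latest"
echo 'demo blob'     > "$SRC/blobs/sha256-demo"

# The actual transfer: copy BOTH subtrees, or the manifest will reference
# blobs that are not there.
cp -r "$SRC/manifests" "$SRC/blobs" "$DST/"
ls "$DST/blobs"
```

After copying into the real models directory and restarting the server, `ollama list` should show the model without any network access.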

@Miameister commented on GitHub (Apr 22, 2024):

I have the same problem. Have you solved it? Thank you for helping.


@Saydoudou commented on GitHub (Apr 25, 2024):

Same problem here (+1). Have you solved it, @risingnew? Thank you for helping.


@igorschlum commented on GitHub (Apr 25, 2024):

There is another interesting topic where someone asks to limit the number of LLMs that can be downloaded directly from Ollama, to prevent the use of non-free LLMs.
I think Ollama should offer manual LLM downloads and allow the app to load LLMs from a file. Let's see what the team does with this.


@Miameister commented on GitHub (Apr 25, 2024):

> same problem +1,have you solved it? @risingnew Thankyou for helping

I've already solved it. At first I had my VPN on and tried many times; it didn't work. But when I turned the VPN off and repeated the command several times, the download started. Maybe you can change your network and retry a few more times.

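Since interrupted pulls resume where they left off, a bounded retry loop automates the "repeat the command many times" approach. A sketch (the `retry` and `flaky` helpers are illustrative names; the real invocation would be `retry ollama pull llama3`, which is not assumed to be installed here):

```shell
# Bounded retry helper; real use would be:  retry ollama pull llama3
retry() {
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge 5 ] && return 1   # give up after 5 attempts
    sleep 1                      # brief pause before retrying
  done
}

# Demo with a stand-in command that fails twice, then succeeds.
counter="$(mktemp)"
echo 0 > "$counter"
flaky() {
  c=$(($(cat "$counter") + 1))
  echo "$c" > "$counter"
  [ "$c" -ge 3 ]                 # fails on attempts 1 and 2
}
retry flaky && echo "succeeded on attempt $(cat "$counter")"
# prints: succeeded on attempt 3
```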

@Saydoudou commented on GitHub (Apr 26, 2024):

The problem has been solved: in Clash, turn on TUN mode, rule mode, and the system proxy.


@igorschlum commented on GitHub (Apr 26, 2024):

@risingnew do you use a VPN? Do you still have the issue?


@Saydoudou commented on GitHub (Apr 28, 2024):

ref: https://github.com/microsoft/WSL/releases/tag/2.0.0


@Saydoudou commented on GitHub (Apr 28, 2024):

Reference: https://github.com/microsoft/WSL/releases/tag/2.0.0
It may be caused by "wsl: localhost proxy configuration detected, but not mirrored to WSL. WSL in NAT mode does not support localhost proxy".


@ALiftTime commented on GitHub (Apr 28, 2024):

> Reference: https://github.com/microsoft/WSL/releases/tag/2.0.0
> It may be caused by "wsl: localhost proxy configuration detected, but not mirrored to WSL. WSL in NAT mode does not support localhost proxy"

I used the above method to install WSL, but I still cannot use it.


@wtrendong commented on GitHub (Apr 29, 2024):

The same problem troubled me for several days, too. Inspired by Saydoudou's reply, I turned on my network accelerator's "global proxy" (全局代理) mode, also called network-card mode (网卡模式), so that the terminal's traffic is fully proxied, and the model downloaded quickly. Thank you Saydoudou, thank you all.

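A note on the proxy approaches above: the Ollama server process honors the standard proxy environment variables, so the variable must be visible to the server itself, not just to the shell running `ollama run`. A minimal sketch (the address is a placeholder for whatever your proxy exposes locally):

```shell
# Placeholder address; substitute your proxy's actual host:port.
export HTTPS_PROXY="http://127.0.0.1:7890"
echo "proxy set to $HTTPS_PROXY"
# prints: proxy set to http://127.0.0.1:7890

# The server only picks the variable up on start, so restart it afterwards:
#   ollama serve
```

On Windows, set the variable in the System environment (or before launching the tray app) so the background server inherits it.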

@Saydoudou commented on GitHub (Apr 30, 2024):

> Reference: https://github.com/microsoft/WSL/releases/tag/2.0.0
> It may be caused by "wsl: localhost proxy configuration detected, but not mirrored to WSL. WSL in NAT mode does not support localhost proxy"
>
> I followed the method above to install WSL, but it still doesn't work.

It is recommended to switch to Windows 10. My computer at the company runs Windows 11, and I still get errors frequently. The computer at home runs Windows 10 Pro, and no errors have appeared there.
![微信图片_20240430172219](https://github.com/ollama/ollama/assets/167050763/03c029dc-e3b1-4cac-93c2-78873e920aeb)


@dhiltgen commented on GitHub (May 2, 2024):

This looks like a dup of #3504


@vobear commented on GitHub (Jul 25, 2024):

In Clash, turn on TUN mode; you may need to wait 1–2 minutes, then retry `ollama run qwen2`.


@mgks commented on GitHub (Jul 29, 2024):

> This looks like a dup of #3504

Not the same.


@Panican-Whyasker commented on GitHub (Jan 30, 2025):

Hello all,
Just got a similar error while the model (deepseek-r1:671b) was in the middle of answering my question.

- Error: an error was encountered while running the model: read tcp 127.0.0.1:58122->127.0.0.1:52358: wsarecv: An existing connection was forcibly closed by the remote host.

Windows Server 2016 here on a NUMA machine with 4 Xeons.


@Inakito commented on GitHub (Jan 31, 2025):

Same here when loading deepseek-r1 on Windows 11.


@Set27 commented on GitHub (Feb 11, 2025):

+1


@Inakito commented on GitHub (Feb 11, 2025):

If you keep trying, it eventually finishes loading (I tried four times at different moments: morning, evening, night...).


@Panican-Whyasker commented on GitHub (Feb 11, 2025):

> if you keep insisting it eventually finishes the load (i tried four times at different moments: morning, evening, night..)

Well, it eventually worked without error, once. When I asked my next question, its output was cut off by the same error.


@Panican-Whyasker commented on GitHub (Feb 14, 2025):

Okay, it seems that I have found the cause for this error. See here:

https://github.com/ollama/ollama/issues/8074#issuecomment-2658648380

By default, Ollama sets the parameter num_ctx to just 2048 (the context window size, which bounds the prompt plus the model's output tokens).

If your LLM outputs more than 2048 tokens, it will crash.

I successfully ran DeepSeek-R1 (671B) a few times, but the output was always below 2048 tokens.

WARNING: They say that a model sized at 404 GB would require ~600 GB of free system RAM in order to run with num_ctx set to 4096.

```
>>> /set parameter num_ctx 4096
Set parameter 'num_ctx' to '4096'
```

With Ollama 0.5.1, the model takes slightly more RAM than usual (~460 GB).

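Note that `/set parameter` only lasts for the interactive session. The setting can be persisted with a Modelfile; a sketch, assuming the base model is already pulled (the derived model name `deepseek-r1-4k` is just an example tag):

```shell
cd "$(mktemp -d)"              # scratch dir for the example

# A Modelfile bakes the parameter into a derived model.
cat > Modelfile <<'EOF'
FROM deepseek-r1:671b
PARAMETER num_ctx 4096
EOF

# Then (requires the base model to be present; not run here):
#   ollama create deepseek-r1-4k -f Modelfile
grep num_ctx Modelfile
# prints: PARAMETER num_ctx 4096
```

After `ollama create`, running `ollama run deepseek-r1-4k` uses the larger context window every time, with no `/set` needed.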

@Panican-Whyasker commented on GitHub (Feb 20, 2025):

This time, deepseek-r1:671b crashed with an error almost immediately, without being able to yield any output.

Right after setting 'num_thread' to '36' and 'num_ctx' to '16384' and entering my prompt, I got:

Error: llama runner process has terminated: exit status 2

Ollama was updated to version 0.5.11 before that and I re-launched the Windows PowerShell.

Things are getting worse. :(


@mcDandy commented on GitHub (Mar 16, 2025):

I have the same issue with gemma3 and ollama 0.6.1


@chenbridge commented on GitHub (Apr 6, 2025):

I have the same issue with ollama 0.6.4

```
C:\Users\Bridge>ollama -v
ollama version is 0.6.4

C:\Users\Bridge>ollama run gemma3:4b hi
Error: POST predict: Post "http://127.0.0.1:54956/completion": read tcp 127.0.0.1:54959->127.0.0.1:54956: wsarecv: An existing connection was forcibly closed by the remote host.
```


@igorschlum commented on GitHub (Apr 6, 2025):

deepseek-r1:671b needs 671 GB of VRAM, a big, big, big computer. Do you have that? If not, choose a smaller DeepSeek LLM.


@zhiyang12345 commented on GitHub (Apr 27, 2025):

> Hello all, just got a similar error while the model (deepseek-r1:671b) was in the middle of answering my question.
>
> - Error: an error was encountered while running the model: read tcp 127.0.0.1:58122->127.0.0.1:52358: wsarecv: An existing connection was forcibly closed by the remote host.
>
> Windows Server 2016 here on a NUMA machine with 4 Xeons.

Did you resolve this bug? I am hitting it as well.


@redsun1988 commented on GitHub (Apr 30, 2025):

The issue is still here.
Any help is greatly appreciated.

My ollama version is:

```cmd
ollama -v
ollama version is 0.6.6
```

I run it on Windows 11 Pro.
Here is the error message:

```
Error: POST predict: Post "http://127.0.0.1:51470/completion": read tcp 127.0.0.1:51473->127.0.0.1:51470: wsarecv: An existing connection was forcibly closed by the remote host.
```

No issues downloading models. I get this error practically everywhere: working with the tiny deepseek-r1:1.5b, or requesting embeddings for paraphrase-multilingual. Yesterday morning it worked just fine, then it suddenly started throwing this error.

I tried reinstalling ollama and re-downloading the models.

Here is the relevant part of my server.txt:

```
time=2025-04-30T11:10:00.205+03:00 level=INFO source=server.go:619 msg="llama runner started in 0.75 seconds"
[GIN] 2025/04/30 - 11:10:00 | 200 |    1.5762114s |       127.0.0.1 | POST     "/api/generate"
time=2025-04-30T11:10:05.275+03:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/04/30 - 11:10:05 | 200 |     62.4184ms |       127.0.0.1 | POST     "/api/chat"
time=2025-04-30T11:10:05.386+03:00 level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/04/30 - 11:16:55 | 200 |      6.2555ms |       127.0.0.1 | GET      "/api/version"
```

@arkimium commented on GitHub (May 2, 2025):

The error now also appears on Windows 21H2 with an AMD Radeon RX 5500, on version 0.6.7.

The error is the same as above:

```
Error: POST predict: Post "http://127.0.0.1:61115/completion": read tcp 127.0.0.1:61117->127.0.0.1:61115: wsarecv: An existing connection was forcibly closed by the remote host.
```

My full server log:

2025/05/02 20:10:48 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Administrator\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-05-02T20:10:48.853+08:00 level=INFO source=images.go:458 msg="total blobs: 5"
time=2025-05-02T20:10:48.853+08:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-05-02T20:10:48.854+08:00 level=INFO source=routes.go:1300 msg="Listening on 127.0.0.1:11434 (version 0.6.7)"
time=2025-05-02T20:10:48.854+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-02T20:10:48.854+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-05-02T20:10:48.854+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=4 efficiency=0 threads=8
time=2025-05-02T20:10:48.866+08:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-05-02T20:10:48.866+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="16.0 GiB" available="11.2 GiB"
[GIN] 2025/05/02 - 20:10:49 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:10:49.244+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.292+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:10:49 | 200 |    101.7662ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:10:49.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.402+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.449+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.453+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.2 GiB" free_swap="9.4 GiB"
time=2025-05-02T20:10:49.454+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:10:49.534+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.539+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:10:49.561+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62929"
time=2025-05-02T20:10:49.565+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:10:49.565+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:10:49.566+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:10:49.593+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:10:49.609+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62929"
time=2025-05-02T20:10:49.689+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.693+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:10:49.693+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:10:49.693+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:10:49.711+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:10:49.716+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:10:49.825+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:10:50.725+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:10:50.832+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
[GIN] 2025/05/02 - 20:10:50 | 200 |    1.5180319s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:10:51.549+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:10:51 | 200 |    259.0518ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:10:52.043+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:11:05 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:11:05.056+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.101+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:11:05 | 200 |     94.8025ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:11:05.166+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.259+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.264+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.1 GiB" free_swap="9.3 GiB"
time=2025-05-02T20:11:05.265+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:11:05.335+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.339+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:11:05.355+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62964"
time=2025-05-02T20:11:05.359+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:11:05.359+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:11:05.360+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:11:05.389+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:11:05.405+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62964"
time=2025-05-02T20:11:05.483+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:11:05.487+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:11:05.505+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:11:05.510+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:11:05.616+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:11:06.504+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:06.509+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:11:06.632+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
[GIN] 2025/05/02 - 20:11:06 | 200 |    1.5093505s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:11:07.814+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:11:08 | 200 |    269.4994ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:11:08.297+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:11:09 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:11:09.191+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.235+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:11:09 | 200 |     98.6684ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:11:09.299+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.390+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.394+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.1 GiB" free_swap="9.3 GiB"
time=2025-05-02T20:11:09.396+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:11:09.467+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.472+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:11:09.478+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62977"
time=2025-05-02T20:11:09.482+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:11:09.482+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:11:09.483+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:11:09.512+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:11:09.529+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62977"
time=2025-05-02T20:11:09.604+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:11:09.608+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:11:09.625+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:11:09.631+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:11:09.736+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:11:10.652+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
[GIN] 2025/05/02 - 20:11:10 | 200 |    1.4949226s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:11:10.751+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
time=2025-05-02T20:11:14.476+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:11:14 | 200 |    222.3765ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:11:14.904+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:13:09 | 200 |            0s |       127.0.0.1 | GET      "/"
[GIN] 2025/05/02 - 20:13:56 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:13:56.689+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.734+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:13:56 | 200 |     96.0357ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:13:56.797+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.841+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.887+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.891+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.0 GiB"
time=2025-05-02T20:13:56.892+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:13:56.965+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.970+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:13:56.986+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63140"
time=2025-05-02T20:13:56.991+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:13:57.020+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:13:57.035+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63140"
time=2025-05-02T20:13:57.112+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:13:57.116+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:13:57.133+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:13:57.139+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:13:57.247+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:13:58.121+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:13:58.258+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
[GIN] 2025/05/02 - 20:13:58 | 200 |    1.5034494s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:13:58.941+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:13:59 | 200 |    331.2297ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:13:59.555+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:17:11 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:17:11.992+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.037+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:17:12 | 200 |     93.5256ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:17:12.103+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.148+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.195+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.200+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.1 GiB"
time=2025-05-02T20:17:12.202+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:17:12.270+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.274+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:17:12.290+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63257"
time=2025-05-02T20:17:12.300+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:17:12.300+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:17:12.301+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:17:12.328+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:17:12.345+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63257"
time=2025-05-02T20:17:12.420+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.424+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:17:12.424+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:17:12.424+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:17:12.441+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:17:12.445+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:17:12.553+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:17:13.414+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:17:13.562+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds"
[GIN] 2025/05/02 - 20:17:13 | 200 |    1.5043563s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:17:14.507+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:17:14 | 200 |    300.4559ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:17:15.092+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:20:52 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:20:52.933+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:52.976+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:20:52 | 200 |     89.5658ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:20:53.038+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.083+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.130+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.133+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.6 GiB" free_swap="8.9 GiB"
time=2025-05-02T20:20:53.135+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:20:53.204+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.208+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:20:53.225+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 60447"
time=2025-05-02T20:20:53.234+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:20:53.261+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:20:53.276+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:60447"
time=2025-05-02T20:20:53.351+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:20:53.356+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:20:53.373+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:20:53.377+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:20:53.488+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:20:54.305+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:20:54.494+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds"
[GIN] 2025/05/02 - 20:20:54 | 200 |    1.4984859s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:21:07.835+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:21:08 | 200 |    289.4455ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:21:08.342+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"

Pulling gemma3:4b completes without errors, but I just can't run or interact with the model afterwards. What do all these "key not found" messages mean?

msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:11:05.355+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62964" time=2025-05-02T20:11:05.359+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:11:05.359+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:11:05.360+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:11:05.389+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:11:05.405+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62964" time=2025-05-02T20:11:05.483+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=general.description default="" time=2025-05-02T20:11:05.487+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:11:05.505+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:11:05.510+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:11:05.616+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:11:06.504+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:06.509+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:11:06.632+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" [GIN] 2025/05/02 - 20:11:06 | 200 | 1.5093505s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:11:07.814+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:11:08 | 200 | 269.4994ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:11:08.297+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:11:09 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:11:09.191+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.235+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:11:09 | 200 | 98.6684ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:11:09.299+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.390+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.394+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.1 GiB" free_swap="9.3 GiB" time=2025-05-02T20:11:09.396+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:11:09.467+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.472+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:11:09.478+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62977" time=2025-05-02T20:11:09.482+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:11:09.482+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:11:09.483+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:11:09.512+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:11:09.529+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62977" time=2025-05-02T20:11:09.604+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=general.description default="" time=2025-05-02T20:11:09.608+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:11:09.625+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:11:09.631+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:11:09.736+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:11:10.652+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 [GIN] 2025/05/02 - 20:11:10 | 200 | 1.4949226s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:11:10.751+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" time=2025-05-02T20:11:14.476+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:11:14 | 200 | 222.3765ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:11:14.904+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:13:09 | 200 | 0s | 127.0.0.1 | GET "/" [GIN] 2025/05/02 - 20:13:56 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:13:56.689+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.734+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:13:56 | 200 | 96.0357ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:13:56.797+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.841+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.887+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.891+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.0 GiB" time=2025-05-02T20:13:56.892+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:13:56.965+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
time=2025-05-02T20:13:56.970+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:13:56.986+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63140" time=2025-05-02T20:13:56.991+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:13:57.020+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:13:57.035+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63140" time=2025-05-02T20:13:57.112+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" 
time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-02T20:13:57.116+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:13:57.133+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:13:57.139+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:13:57.247+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:13:58.121+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:13:58.258+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" [GIN] 2025/05/02 - 20:13:58 | 200 | 1.5034494s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:13:58.941+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment 
default=32 D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:13:59 | 200 | 331.2297ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:13:59.555+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:17:11 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:17:11.992+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.037+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:17:12 | 200 | 93.5256ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:17:12.103+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.148+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.195+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.200+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.1 GiB" time=2025-05-02T20:17:12.202+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:17:12.270+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.274+08:00 level=WARN 
source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:17:12.290+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63257" time=2025-05-02T20:17:12.300+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:17:12.300+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:17:12.301+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:17:12.328+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:17:12.345+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63257" time=2025-05-02T20:17:12.420+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.424+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:17:12.424+08:00 level=WARN 
source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-02T20:17:12.424+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:17:12.441+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:17:12.445+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:17:12.553+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:17:13.414+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:17:13.562+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds" [GIN] 2025/05/02 - 20:17:13 | 200 | 1.5043563s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:17:14.507+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:17:14 | 200 | 300.4559ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:17:15.092+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:20:52 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:20:52.933+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:52.976+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:20:52 | 200 | 89.5658ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:20:53.038+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.083+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.130+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.133+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.6 GiB" free_swap="8.9 GiB" time=2025-05-02T20:20:53.135+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:20:53.204+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.208+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:20:53.225+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 60447" time=2025-05-02T20:20:53.234+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:20:53.261+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:20:53.276+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:60447" time=2025-05-02T20:20:53.351+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=general.description default="" time=2025-05-02T20:20:53.356+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:20:53.373+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:20:53.377+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:20:53.488+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:20:54.305+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:20:54.494+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds" [GIN] 2025/05/02 - 20:20:54 | 200 | 1.4984859s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:21:07.835+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:21:08 | 200 | 289.4455ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:21:08.342+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
```
The pull of `gemma3:4b` completes fine, but I just cannot run and interact with the model. Also, what do the `key not found` messages mean?
Author
Owner

@anton-karlovskiy commented on GitHub (Sep 16, 2025):

@risingnew

👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues:

  • An existing connection was forcibly closed by the remote host
  • Error: max retries exceeded

What I’ve tried so far:

  • https://github.com/ollama/ollama/issues/3769#issuecomment-2076767384: closed VPN → no change
  • https://github.com/ollama/ollama/issues/6211#issuecomment-2626567037: changed DNS → made some progress (got to ~37%) but stuck again

(screenshot: pull stalled partway through the download)

From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious.

To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly:

Windows
Open a new Notepad window and paste this code:

```bat
@echo off
:loop
echo Running ollama run llama3.2...
ollama run llama3.2

if %errorlevel% neq 0 (
    echo The command failed. Retrying... Press Ctrl+C to exit.
    goto loop
)
echo The command completed successfully!
```

Next, go to Save As…, pick "All files" as the file type, and name the file with a `.bat` extension. Then you can run it. To use a different model, just change `llama3.2` to any other model name.

Linux
I didn’t test this on Linux, but it should hopefully work:

```bash
#!/bin/bash

# Function to handle Ctrl+C
trap 'echo "Script terminated by user. Exiting..."; exit' SIGINT

while true; do
    echo "Running ollama run llama3.2..."
    ollama run llama3.2
    if [ $? -eq 0 ]; then
        echo "The command completed successfully!"
        break
    else
        echo "The command failed. Retrying... Press Ctrl+C to exit."
    fi
done
```

Save this as `run_loop.sh`, then run:

```bash
chmod +x run_loop.sh
./run_loop.sh
```
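If you are scripting retries anyway, `ollama pull` may be a better command to loop over than `ollama run`: it only downloads (there is no interactive session to exit afterwards), and interrupted pulls appear to resume from the partial download, so each retry is cheap. A bounded variant of the loop above, as a sketch (the model name, attempt budget, and backoff delay are placeholders to adjust):

```bash
#!/bin/bash
# Bounded retry with a fixed backoff: re-runs a command until it
# succeeds or the attempt budget is exhausted.
retry() {
    local max=$1; shift
    local n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "Giving up after $n attempts." >&2
            return 1
        fi
        echo "Attempt $n failed; retrying in 2s (Ctrl+C to stop)..." >&2
        sleep 2
        n=$((n + 1))
    done
}

# Example usage (assumed model name):
# retry 20 ollama pull llama3.2
```

Unlike the unconditional `while true` loop, this one stops on its own after a fixed number of failures instead of hammering the server forever.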

Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e

Good luck 🤞

<!-- gh-comment-id:3295893136 --> @anton-karlovskiy commented on GitHub (Sep 16, 2025): @risingnew 👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues: - An existing connection was forcibly closed by the remote host - Error: max retries exceeded What I’ve tried so far: - https://github.com/ollama/ollama/issues/3769#issuecomment-2076767384: closed VPN → no change - https://github.com/ollama/ollama/issues/6211#issuecomment-2626567037: changed DNS → made some progress (got to ~37%) but stuck again <img width="1086" height="124" alt="Image" src="https://github.com/user-attachments/assets/774acce1-b0af-4b66-b2ae-415baf6f0fe0" /> From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious. To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly: **Windows** Open new notepad and put this code. ```bash @echo off :loop echo Running ollama run llama3.2... ollama run llama3.2 if %errorlevel% neq 0 ( echo The command failed. Retrying... Press Ctrl+C to exit. goto loop ) echo The command completed successfully! ``` Next go to save as… and pick All files as file type and name the file with `.bat` extension. And finally you can run it. To edit the code above, just change model name from `llama3.2` to any other model you like. **Linux** I didn’t test the code for Linux but this should hopefully work. #!/bin/bash ```bash # Function to handle Ctrl+C trap 'echo "Script terminated by user. Exiting..."; exit' SIGINT while true; do echo "Running ollama run llama3.2..." ollama run llama3.2 if [ $? -eq 0 ]; then echo "The command completed successfully!" break else echo "The command failed. Retrying... Press Ctrl+C to exit." fi done ``` Save this as `run_loop.sh` then run these commands. 
```bash chmod +x run_loop.sh ``` ```bash ./run_loop.sh ``` Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e Good luck 🤞
Author
Owner

@sudo-Ram commented on GitHub (Oct 19, 2025):

Modify the hosts file.

<!-- gh-comment-id:3419269263 --> @sudo-Ram commented on GitHub (Oct 19, 2025): 改host文件
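For readers unfamiliar with this workaround: the suggestion above ("改host文件" / "modify the hosts file") usually means pinning the download hosts to IP addresses that are reachable from your network, bypassing a broken or blocked DNS resolver. The general shape of the entries is below; the addresses themselves are deliberately left as placeholders, since they must be looked up from a resolver that works for you (`registry.ollama.ai` is the host the CLI pulls model layers from):

```
# C:\Windows\System32\drivers\etc\hosts (Windows)  or  /etc/hosts (Linux/macOS)
# <reachable-ip>   ollama.com
# <reachable-ip>   registry.ollama.ai
```

Note that pinned addresses go stale when the service moves, so remove these entries if downloads start failing again later.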
Reference: github-starred/ollama#48838