[GH-ISSUE #3769] An existing connection was forcibly closed by the remote host. Could you help me? #48838

Closed
opened 2026-04-28 09:41:53 -05:00 by GiteaMirror · 37 comments

Originally created by @risingnew on GitHub (Apr 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3769

What is the issue?

PS C:\Users\Administrator\AppData\Local\Ollama> ollama run llama3
pulling manifest
Error: pull model manifest: Get "https://ollama.com/token?nonce=1AKxIvoajv-NPGYukzWJcA&scope=repository%!A(MISSING)library%!F(MISSING)llama3%!A(MISSING)pull&service=ollama.com&ts=1713578711": read tcp 192.168.124.11:53463->34.120.132.20:443: wsarecv: An existing connection was forcibly closed by the remote host.
![100614](https://github.com/ollama/ollama/assets/128674607/c25a50e8-0f0d-4291-b9bb-a5c40e7a604d)
Server log: https://github.com/ollama/ollama/files/15046570/server.log

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

No response

GiteaMirror added the bug label 2026-04-28 09:41:53 -05:00

@igorschlum commented on GitHub (Apr 20, 2024):

Hi @risingnew, can you tell us what version of Ollama you are using?


@risingnew commented on GitHub (Apr 20, 2024):

The latest version, 0.1.32.


@risingnew commented on GitHub (Apr 20, 2024):

Thank you for helping.


@risingnew commented on GitHub (Apr 20, 2024):

> Hi @risingnew can you tell us What version of Ollama you are using?

Latest version 0.1.32, thank you.


@igorschlum commented on GitHub (Apr 20, 2024):

@risingnew I'm not on Windows, so I cannot reproduce Windows issues. Did you try to load another model? Do you see the same issue?
Which country are you in? Are you on a home internet connection or a company one?


@risingnew commented on GitHub (Apr 20, 2024):

> @risingnew I'm not on Windows, so I cannot reproduce Windows Issues. Did you try to load another model? Do you see same Issue?
> In which country are you? Are at a home internet access or at a company internet access?

Thank you. Do you mean a network problem is the cause? The same error occurred when I loaded the other models.


@risingnew commented on GitHub (Apr 20, 2024):

@igorschlum Is there any way to manually download the llama3 model and start it manually?


@igorschlum commented on GitHub (Apr 22, 2024):

You can ask someone who successfully downloaded it to copy it for you.

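For reference, a sketch of what that copy involves: a pulled model is a manifest plus content-addressed blobs under the models directory (on Windows, `%USERPROFILE%\.ollama\models` by default, or wherever `OLLAMA_MODELS` points). The temp directories below stand in for the donor machine and your own; nothing here touches a real install:

```shell
# Sketch only: temp dirs stand in for the two machines' model stores.
SRC="$(mktemp -d)"   # donor machine's models directory
DST="$(mktemp -d)"   # your local models directory

# A pulled model is a manifest plus the blobs it references; fake both here.
mkdir -p "$SRC/manifests/registry.ollama.ai/library/llama3" "$SRC/blobs"
echo 'demo manifest' > "$SRC/manifests/registry.ollama.ai/library/llama3/latest"
echo 'demo blob'     > "$SRC/blobs/sha256-demo"

# The actual transfer: copy BOTH subtrees, or the manifest will reference
# blobs that are not there.
cp -r "$SRC/manifests" "$SRC/blobs" "$DST/"
ls "$DST/blobs"
```

After copying into the real models directory and restarting the server, `ollama list` should show the model without any network access.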

@Miameister commented on GitHub (Apr 22, 2024):

I have the same problem. Have you solved it? Thank you for helping.


@Saydoudou commented on GitHub (Apr 25, 2024):

Same problem here (+1). Have you solved it, @risingnew? Thank you for helping.


@igorschlum commented on GitHub (Apr 25, 2024):

There is another interesting topic where someone asks to limit the number of LLMs that can be downloaded directly from Ollama, to prevent the use of non-free LLMs.
I think Ollama should offer manual LLM downloads and allow the app to load LLMs from a file. Let's see what the team does with this.


@Miameister commented on GitHub (Apr 25, 2024):

> same problem +1,have you solved it? @risingnew Thankyou for helping

I've already solved it. At first I had my VPN on and tried many times; it didn't work. But when I turned the VPN off and repeated the command several times, the download started. Maybe you can change your network and retry a few more times.

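Since interrupted pulls resume where they left off, a bounded retry loop automates the "repeat the command many times" approach. A sketch (the `retry` and `flaky` helpers are illustrative names; the real invocation would be `retry ollama pull llama3`, which is not assumed to be installed here):

```shell
# Bounded retry helper; real use would be:  retry ollama pull llama3
retry() {
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge 5 ] && return 1   # give up after 5 attempts
    sleep 1                      # brief pause before retrying
  done
}

# Demo with a stand-in command that fails twice, then succeeds.
counter="$(mktemp)"
echo 0 > "$counter"
flaky() {
  c=$(($(cat "$counter") + 1))
  echo "$c" > "$counter"
  [ "$c" -ge 3 ]                 # fails on attempts 1 and 2
}
retry flaky && echo "succeeded on attempt $(cat "$counter")"
# prints: succeeded on attempt 3
```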

@Saydoudou commented on GitHub (Apr 26, 2024):

The problem has been solved: in Clash, turn on TUN mode, rule mode, and the system proxy.


@igorschlum commented on GitHub (Apr 26, 2024):

@risingnew do you use a VPN? Do you still have the issue?


@Saydoudou commented on GitHub (Apr 28, 2024):

ref: https://github.com/microsoft/WSL/releases/tag/2.0.0


@Saydoudou commented on GitHub (Apr 28, 2024):

Reference: https://github.com/microsoft/WSL/releases/tag/2.0.0
It may be caused by "wsl: localhost proxy configuration detected, but not mirrored to WSL. WSL in NAT mode does not support localhost proxy".


@ALiftTime commented on GitHub (Apr 28, 2024):

> Reference: https://github.com/microsoft/WSL/releases/tag/2.0.0
> It may be caused by "wsl: localhost proxy configuration detected, but not mirrored to WSL. WSL in NAT mode does not support localhost proxy"

I used the above method to install WSL, but I still cannot use it.


@wtrendong commented on GitHub (Apr 29, 2024):

The same problem troubled me for several days, too. Inspired by Saydoudou's reply, I turned on my network accelerator's "global proxy" (全局代理) mode, also called network-card mode (网卡模式), so that the terminal's traffic is fully proxied, and the model downloaded quickly. Thank you Saydoudou, thank you all.

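A note on the proxy approaches above: the Ollama server process honors the standard proxy environment variables, so the variable must be visible to the server itself, not just to the shell running `ollama run`. A minimal sketch (the address is a placeholder for whatever your proxy exposes locally):

```shell
# Placeholder address; substitute your proxy's actual host:port.
export HTTPS_PROXY="http://127.0.0.1:7890"
echo "proxy set to $HTTPS_PROXY"
# prints: proxy set to http://127.0.0.1:7890

# The server only picks the variable up on start, so restart it afterwards:
#   ollama serve
```

On Windows, set the variable in the System environment (or before launching the tray app) so the background server inherits it.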

@Saydoudou commented on GitHub (Apr 30, 2024):

> Reference: https://github.com/microsoft/WSL/releases/tag/2.0.0
> It may be caused by "wsl: localhost proxy configuration detected, but not mirrored to WSL. WSL in NAT mode does not support localhost proxy"
>
> I followed the method above to install WSL, but it still doesn't work.

It is recommended to switch to Windows 10. My computer at the company runs Windows 11, and I still get errors frequently. The computer at home runs Windows 10 Pro, and no errors have appeared there.
![微信图片_20240430172219](https://github.com/ollama/ollama/assets/167050763/03c029dc-e3b1-4cac-93c2-78873e920aeb)


@dhiltgen commented on GitHub (May 2, 2024):

This looks like a dup of #3504


@vobear commented on GitHub (Jul 25, 2024):

In Clash, turn on TUN mode; you may need to wait 1–2 minutes, then retry `ollama run qwen2`.


@mgks commented on GitHub (Jul 29, 2024):

> This looks like a dup of #3504

Not the same.


@Panican-Whyasker commented on GitHub (Jan 30, 2025):

Hello all,
Just got a similar error while the model (deepseek-r1:671b) was in the middle of answering my question.

- Error: an error was encountered while running the model: read tcp 127.0.0.1:58122->127.0.0.1:52358: wsarecv: An existing connection was forcibly closed by the remote host.

Windows Server 2016 here on a NUMA machine with 4 Xeons.


@Inakito commented on GitHub (Jan 31, 2025):

Same here when loading deepseek-r1 on Windows 11.


@Set27 commented on GitHub (Feb 11, 2025):

+1


@Inakito commented on GitHub (Feb 11, 2025):

If you keep trying, it eventually finishes loading (I tried four times at different moments: morning, evening, night...).


@Panican-Whyasker commented on GitHub (Feb 11, 2025):

> if you keep insisting it eventually finishes the load (i tried four times at different moments: morning, evening, night..)

Well, it eventually worked without error, once. When I asked my next question, its output was cut off by the same error.


@Panican-Whyasker commented on GitHub (Feb 14, 2025):

Okay, it seems that I have found the cause for this error. See here:

https://github.com/ollama/ollama/issues/8074#issuecomment-2658648380

By default, Ollama sets the parameter num_ctx to just 2048 (the context window size, which bounds the prompt plus the model's output tokens).

If your LLM outputs more than 2048 tokens, it will crash.

I successfully ran DeepSeek-R1 (671B) a few times, but the output was always below 2048 tokens.

WARNING: They say that a model sized at 404 GB would require ~600 GB of free system RAM in order to run with num_ctx set to 4096.

```
>>> /set parameter num_ctx 4096
Set parameter 'num_ctx' to '4096'
```

With Ollama 0.5.1, the model takes slightly more RAM than usual (~460 GB).

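Note that `/set parameter` only lasts for the interactive session. The setting can be persisted with a Modelfile; a sketch, assuming the base model is already pulled (the derived model name `deepseek-r1-4k` is just an example tag):

```shell
cd "$(mktemp -d)"              # scratch dir for the example

# A Modelfile bakes the parameter into a derived model.
cat > Modelfile <<'EOF'
FROM deepseek-r1:671b
PARAMETER num_ctx 4096
EOF

# Then (requires the base model to be present; not run here):
#   ollama create deepseek-r1-4k -f Modelfile
grep num_ctx Modelfile
# prints: PARAMETER num_ctx 4096
```

After `ollama create`, running `ollama run deepseek-r1-4k` uses the larger context window every time, with no `/set` needed.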

@Panican-Whyasker commented on GitHub (Feb 20, 2025):

This time, deepseek-r1:671b crashed with an error almost immediately, without being able to yield any output.

Right after setting 'num_thread' to '36' and 'num_ctx' to '16384' and entering my prompt, I got:

Error: llama runner process has terminated: exit status 2

Ollama was updated to version 0.5.11 before that and I re-launched the Windows PowerShell.

Things are getting worse. :(


@mcDandy commented on GitHub (Mar 16, 2025):

I have the same issue with gemma3 and ollama 0.6.1


@chenbridge commented on GitHub (Apr 6, 2025):

I have the same issue with ollama 0.6.4

```
C:\Users\Bridge>ollama -v
ollama version is 0.6.4

C:\Users\Bridge>ollama run gemma3:4b hi
Error: POST predict: Post "http://127.0.0.1:54956/completion": read tcp 127.0.0.1:54959->127.0.0.1:54956: wsarecv: An existing connection was forcibly closed by the remote host.
```


@igorschlum commented on GitHub (Apr 6, 2025):

deepseek-r1:671b needs 671 GB of VRAM, a big, big, big computer. Do you have that? If not, choose a smaller DeepSeek LLM.


@zhiyang12345 commented on GitHub (Apr 27, 2025):

> Hello all, just got a similar error while the model (deepseek-r1:671b) was in the middle of answering my question.
>
> - Error: an error was encountered while running the model: read tcp 127.0.0.1:58122->127.0.0.1:52358: wsarecv: An existing connection was forcibly closed by the remote host.
>
> Windows Server 2016 here on a NUMA machine with 4 Xeons.

Did you resolve this bug? I am hitting it as well.


@redsun1988 commented on GitHub (Apr 30, 2025):

The issue is still here.
Any help is greatly appreciated.

My ollama version is:

```cmd
ollama -v
ollama version is 0.6.6
```

I run it on Windows 11 Pro.
Here is the error message:

```
Error: POST predict: Post "http://127.0.0.1:51470/completion": read tcp 127.0.0.1:51473->127.0.0.1:51470: wsarecv: An existing connection was forcibly closed by the remote host.
```

No issues downloading models. I get this error practically everywhere: working with the tiny deepseek-r1:1.5b, or requesting embeddings for paraphrase-multilingual. Yesterday morning it worked just fine, then it suddenly started throwing this error.

I tried reinstalling ollama and re-downloading the models.

Here is the relevant part of my server.txt:

```
time=2025-04-30T11:10:00.205+03:00 level=INFO source=server.go:619 msg="llama runner started in 0.75 seconds"
[GIN] 2025/04/30 - 11:10:00 | 200 |    1.5762114s |       127.0.0.1 | POST     "/api/generate"
time=2025-04-30T11:10:05.275+03:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/04/30 - 11:10:05 | 200 |     62.4184ms |       127.0.0.1 | POST     "/api/chat"
time=2025-04-30T11:10:05.386+03:00 level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/04/30 - 11:16:55 | 200 |      6.2555ms |       127.0.0.1 | GET      "/api/version"
```

@arkimium commented on GitHub (May 2, 2025):

The error now also appears on Windows 21H2 with an AMD Radeon RX 5500, on version 0.6.7.

The error is the same as above:

```
Error: POST predict: Post "http://127.0.0.1:61115/completion": read tcp 127.0.0.1:61117->127.0.0.1:61115: wsarecv: An existing connection was forcibly closed by the remote host.
```

My full server log:

2025/05/02 20:10:48 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Administrator\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-05-02T20:10:48.853+08:00 level=INFO source=images.go:458 msg="total blobs: 5"
time=2025-05-02T20:10:48.853+08:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-05-02T20:10:48.854+08:00 level=INFO source=routes.go:1300 msg="Listening on 127.0.0.1:11434 (version 0.6.7)"
time=2025-05-02T20:10:48.854+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-02T20:10:48.854+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-05-02T20:10:48.854+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=4 efficiency=0 threads=8
time=2025-05-02T20:10:48.866+08:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-05-02T20:10:48.866+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="16.0 GiB" available="11.2 GiB"
[GIN] 2025/05/02 - 20:10:49 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:10:49.244+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.292+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:10:49 | 200 |    101.7662ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:10:49.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.402+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.449+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.453+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.2 GiB" free_swap="9.4 GiB"
time=2025-05-02T20:10:49.454+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:10:49.534+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.539+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:10:49.544+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:10:49.561+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62929"
time=2025-05-02T20:10:49.565+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:10:49.565+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:10:49.566+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:10:49.593+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:10:49.609+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62929"
time=2025-05-02T20:10:49.689+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:10:49.693+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:10:49.693+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:10:49.693+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:10:49.711+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:10:49.716+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:10:49.825+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:10:50.725+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:10:50.731+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:10:50.832+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
[GIN] 2025/05/02 - 20:10:50 | 200 |    1.5180319s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:10:51.549+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:10:51 | 200 |    259.0518ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:10:52.043+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:11:05 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:11:05.056+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.101+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:11:05 | 200 |     94.8025ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:11:05.166+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.259+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.264+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.1 GiB" free_swap="9.3 GiB"
time=2025-05-02T20:11:05.265+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:11:05.335+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.339+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:11:05.355+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62964"
time=2025-05-02T20:11:05.359+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:11:05.359+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:11:05.360+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:11:05.389+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:11:05.405+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62964"
time=2025-05-02T20:11:05.483+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:11:05.487+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:11:05.505+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:11:05.510+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:11:05.616+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:11:06.504+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:06.509+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:11:06.632+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
[GIN] 2025/05/02 - 20:11:06 | 200 |    1.5093505s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:11:07.814+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:11:08 | 200 |    269.4994ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:11:08.297+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:11:09 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:11:09.191+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.235+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:11:09 | 200 |     98.6684ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:11:09.299+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.390+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.394+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.1 GiB" free_swap="9.3 GiB"
time=2025-05-02T20:11:09.396+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:11:09.467+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.472+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:11:09.478+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62977"
time=2025-05-02T20:11:09.482+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:11:09.482+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:11:09.483+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:11:09.512+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:11:09.529+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62977"
time=2025-05-02T20:11:09.604+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:11:09.608+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:11:09.625+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:11:09.631+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:11:09.736+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:11:10.652+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
[GIN] 2025/05/02 - 20:11:10 | 200 |    1.4949226s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:11:10.751+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
time=2025-05-02T20:11:14.476+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:11:14 | 200 |    222.3765ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:11:14.904+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:13:09 | 200 |            0s |       127.0.0.1 | GET      "/"
[GIN] 2025/05/02 - 20:13:56 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:13:56.689+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.734+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:13:56 | 200 |     96.0357ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:13:56.797+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.841+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.887+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.891+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.0 GiB"
time=2025-05-02T20:13:56.892+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:13:56.965+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:56.970+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:13:56.986+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63140"
time=2025-05-02T20:13:56.991+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:13:57.020+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:13:57.035+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63140"
time=2025-05-02T20:13:57.112+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:13:57.116+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:13:57.133+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:13:57.139+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:13:57.247+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:13:58.121+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:13:58.258+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
[GIN] 2025/05/02 - 20:13:58 | 200 |    1.5034494s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:13:58.941+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:13:59 | 200 |    331.2297ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:13:59.555+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:17:11 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:17:11.992+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.037+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:17:12 | 200 |     93.5256ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:17:12.103+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.148+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.195+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.200+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.1 GiB"
time=2025-05-02T20:17:12.202+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:17:12.270+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.274+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:17:12.290+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63257"
time=2025-05-02T20:17:12.300+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:17:12.300+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:17:12.301+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:17:12.328+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:17:12.345+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63257"
time=2025-05-02T20:17:12.420+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:17:12.424+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:17:12.424+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:17:12.424+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:17:12.441+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:17:12.445+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:17:12.553+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:17:13.414+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:17:13.562+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds"
[GIN] 2025/05/02 - 20:17:13 | 200 |    1.5043563s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:17:14.507+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:17:14 | 200 |    300.4559ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:17:15.092+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
[GIN] 2025/05/02 - 20:20:52 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-05-02T20:20:52.933+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:52.976+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/02 - 20:20:52 | 200 |     89.5658ms |       127.0.0.1 | POST     "/api/show"
time=2025-05-02T20:20:53.038+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.083+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.130+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.133+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.6 GiB" free_swap="8.9 GiB"
time=2025-05-02T20:20:53.135+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-02T20:20:53.204+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.208+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:20:53.225+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 60447"
time=2025-05-02T20:20:53.234+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T20:20:53.261+08:00 level=INFO source=runner.go:861 msg="starting ollama engine"
time=2025-05-02T20:20:53.276+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:60447"
time=2025-05-02T20:20:53.351+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-02T20:20:53.356+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-02T20:20:53.373+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-05-02T20:20:53.377+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB"
time=2025-05-02T20:20:53.488+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-02T20:20:54.305+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-02T20:20:54.494+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds"
[GIN] 2025/05/02 - 20:20:54 | 200 |    1.4984859s |       127.0.0.1 | POST     "/api/generate"
time=2025-05-02T20:21:07.835+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed
[GIN] 2025/05/02 - 20:21:08 | 200 |    289.4455ms |       127.0.0.1 | POST     "/api/chat"
time=2025-05-02T20:21:08.342+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"

Pulling gemma3:4b completes without errors, but I just can't run or interact with the model afterwards. What do all these "key not found" messages mean?

msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:05.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:11:05.355+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62964" time=2025-05-02T20:11:05.359+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:11:05.359+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:11:05.360+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:11:05.389+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:11:05.405+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62964" time=2025-05-02T20:11:05.483+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:11:05.487+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=general.description default="" time=2025-05-02T20:11:05.487+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:11:05.505+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:11:05.510+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:11:05.616+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:11:06.504+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:06.509+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:06.510+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:11:06.632+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" [GIN] 2025/05/02 - 20:11:06 | 200 | 1.5093505s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:11:07.814+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:11:08 | 200 | 269.4994ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:11:08.297+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:11:09 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:11:09.191+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.235+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:11:09 | 200 | 98.6684ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:11:09.299+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.343+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.390+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.394+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="11.1 GiB" free_swap="9.3 GiB" time=2025-05-02T20:11:09.396+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:11:09.467+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.472+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:09.477+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:11:09.478+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 62977" time=2025-05-02T20:11:09.482+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:11:09.482+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:11:09.483+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:11:09.512+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:11:09.529+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:62977" time=2025-05-02T20:11:09.604+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:11:09.608+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=general.description default="" time=2025-05-02T20:11:09.608+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:11:09.625+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:11:09.631+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:11:09.736+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:11:10.652+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:11:10.657+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 [GIN] 2025/05/02 - 20:11:10 | 200 | 1.4949226s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:11:10.751+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" time=2025-05-02T20:11:14.476+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:11:14 | 200 | 222.3765ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:11:14.904+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:13:09 | 200 | 0s | 127.0.0.1 | GET "/" [GIN] 2025/05/02 - 20:13:56 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:13:56.689+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.734+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:13:56 | 200 | 96.0357ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:13:56.797+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.841+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.887+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:56.891+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.0 GiB" time=2025-05-02T20:13:56.892+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:13:56.965+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
time=2025-05-02T20:13:56.970+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:13:56.975+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:13:56.986+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63140" time=2025-05-02T20:13:56.991+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:13:56.991+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:13:57.020+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:13:57.035+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63140" time=2025-05-02T20:13:57.112+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" 
time=2025-05-02T20:13:57.116+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-02T20:13:57.116+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:13:57.133+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:13:57.139+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:13:57.247+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:13:58.121+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:13:58.127+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:13:58.258+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" [GIN] 2025/05/02 - 20:13:58 | 200 | 1.5034494s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:13:58.941+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment 
default=32 D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:13:59 | 200 | 331.2297ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:13:59.555+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:17:11 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:17:11.992+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.037+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:17:12 | 200 | 93.5256ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:17:12.103+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.148+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.195+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.200+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.8 GiB" free_swap="9.1 GiB" time=2025-05-02T20:17:12.202+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:17:12.270+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.274+08:00 level=WARN 
source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:17:12.279+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:17:12.290+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 63257" time=2025-05-02T20:17:12.300+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:17:12.300+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:17:12.301+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:17:12.328+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:17:12.345+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:63257" time=2025-05-02T20:17:12.420+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:17:12.424+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:17:12.424+08:00 level=WARN 
source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-02T20:17:12.424+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:17:12.441+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:17:12.445+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:17:12.553+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:17:13.414+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:17:13.419+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:17:13.562+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds" [GIN] 2025/05/02 - 20:17:13 | 200 | 1.5043563s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:17:14.507+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:17:14 | 200 | 300.4559ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:17:15.092+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409" [GIN] 2025/05/02 - 20:20:52 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-05-02T20:20:52.933+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:52.976+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/02 - 20:20:52 | 200 | 89.5658ms | 127.0.0.1 | POST "/api/show" time=2025-05-02T20:20:53.038+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.083+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.130+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.133+08:00 level=INFO source=server.go:105 msg="system memory" total="16.0 GiB" free="10.6 GiB" free_swap="8.9 GiB" time=2025-05-02T20:20:53.135+08:00 level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[10.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-02T20:20:53.204+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.208+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:20:53.213+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:20:53.225+08:00 level=INFO source=server.go:409 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 4 --no-mmap --parallel 2 --port 60447" time=2025-05-02T20:20:53.234+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-05-02T20:20:53.234+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-05-02T20:20:53.261+08:00 level=INFO source=runner.go:861 msg="starting ollama engine" time=2025-05-02T20:20:53.276+08:00 level=INFO source=runner.go:924 msg="Server listening on 127.0.0.1:60447" time=2025-05-02T20:20:53.351+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-02T20:20:53.356+08:00 level=WARN source=ggml.go:152 
msg="key not found" key=general.description default="" time=2025-05-02T20:20:53.356+08:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-05-02T20:20:53.373+08:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-05-02T20:20:53.377+08:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="3.6 GiB" time=2025-05-02T20:20:53.488+08:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-05-02T20:20:54.305+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-02T20:20:54.310+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-02T20:20:54.494+08:00 level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds" [GIN] 2025/05/02 - 20:20:54 | 200 | 1.4984859s | 127.0.0.1 | POST "/api/generate" time=2025-05-02T20:21:07.835+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
D:\a\desktop-inference-engine-llama.cpp\desktop-inference-engine-llama.cpp\native\vendor\llama.cpp\ggml\src\ggml.c:1729: GGML_ASSERT(tensor->op == GGML_OP_UNARY) failed [GIN] 2025/05/02 - 20:21:08 | 200 | 289.4455ms | 127.0.0.1 | POST "/api/chat" time=2025-05-02T20:21:08.342+08:00 level=ERROR source=server.go:454 msg="llama runner terminated" error="exit status 0xc0000409"
```
The pull of `gemma3:4b` completes fine, but I just cannot run and interact with the model. Also, what do the `key not found` messages mean?
Author
Owner

@anton-karlovskiy commented on GitHub (Sep 16, 2025):

@risingnew

👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues:

  • An existing connection was forcibly closed by the remote host
  • Error: max retries exceeded

What I’ve tried so far:

  • https://github.com/ollama/ollama/issues/3769#issuecomment-2076767384: closed VPN → no change
  • https://github.com/ollama/ollama/issues/6211#issuecomment-2626567037: changed DNS → made some progress (got to ~37%) but stuck again

(screenshot: pull stalled partway through the download)

From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious.

To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly:

Windows
Open a new Notepad window and paste this code:

```bat
@echo off
:loop
echo Running ollama run llama3.2...
ollama run llama3.2

if %errorlevel% neq 0 (
    echo The command failed. Retrying... Press Ctrl+C to exit.
    goto loop
)
echo The command completed successfully!
```

Next, go to Save As…, pick "All files" as the file type, and name the file with a `.bat` extension. Then you can run it. To use a different model, just change `llama3.2` to any other model name.

Linux
I didn’t test this on Linux, but it should hopefully work:

```bash
#!/bin/bash

# Function to handle Ctrl+C
trap 'echo "Script terminated by user. Exiting..."; exit' SIGINT

while true; do
    echo "Running ollama run llama3.2..."
    ollama run llama3.2
    if [ $? -eq 0 ]; then
        echo "The command completed successfully!"
        break
    else
        echo "The command failed. Retrying... Press Ctrl+C to exit."
    fi
done
```

Save this as `run_loop.sh`, then run:

```bash
chmod +x run_loop.sh
./run_loop.sh
```
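If you are scripting retries anyway, `ollama pull` may be a better command to loop over than `ollama run`: it only downloads (there is no interactive session to exit afterwards), and interrupted pulls appear to resume from the partial download, so each retry is cheap. A bounded variant of the loop above, as a sketch (the model name, attempt budget, and backoff delay are placeholders to adjust):

```bash
#!/bin/bash
# Bounded retry with a fixed backoff: re-runs a command until it
# succeeds or the attempt budget is exhausted.
retry() {
    local max=$1; shift
    local n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "Giving up after $n attempts." >&2
            return 1
        fi
        echo "Attempt $n failed; retrying in 2s (Ctrl+C to stop)..." >&2
        sleep 2
        n=$((n + 1))
    done
}

# Example usage (assumed model name):
# retry 20 ollama pull llama3.2
```

Unlike the unconditional `while true` loop, this one stops on its own after a fixed number of failures instead of hammering the server forever.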

Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e

Good luck 🤞

<!-- gh-comment-id:3295893136 --> @anton-karlovskiy commented on GitHub (Sep 16, 2025): @risingnew 👋 I’ve been trying to download llama3.2 on Windows 11 (ollama run llama3.2) and keep running into issues: - An existing connection was forcibly closed by the remote host - Error: max retries exceeded What I’ve tried so far: - https://github.com/ollama/ollama/issues/3769#issuecomment-2076767384: closed VPN → no change - https://github.com/ollama/ollama/issues/6211#issuecomment-2626567037: changed DNS → made some progress (got to ~37%) but stuck again <img width="1086" height="124" alt="Image" src="https://github.com/user-attachments/assets/774acce1-b0af-4b66-b2ae-415baf6f0fe0" /> From what I can tell, the main culprit seems to be an unstable connection. The workaround right now is to just re-run the command until it eventually completes, but this gets tedious. To automate retries, I wrapped it in a small script. Sharing here in case it helps others or the devs want to consider integrating retry logic directly: **Windows** Open new notepad and put this code. ```bash @echo off :loop echo Running ollama run llama3.2... ollama run llama3.2 if %errorlevel% neq 0 ( echo The command failed. Retrying... Press Ctrl+C to exit. goto loop ) echo The command completed successfully! ``` Next go to save as… and pick All files as file type and name the file with `.bat` extension. And finally you can run it. To edit the code above, just change model name from `llama3.2` to any other model you like. **Linux** I didn’t test the code for Linux but this should hopefully work. #!/bin/bash ```bash # Function to handle Ctrl+C trap 'echo "Script terminated by user. Exiting..."; exit' SIGINT while true; do echo "Running ollama run llama3.2..." ollama run llama3.2 if [ $? -eq 0 ]; then echo "The command completed successfully!" break else echo "The command failed. Retrying... Press Ctrl+C to exit." fi done ``` Save this as `run_loop.sh` then run these commands. 
```bash chmod +x run_loop.sh ``` ```bash ./run_loop.sh ``` Re: https://medium.com/@timnirmal/ollama-max-retries-exceeded-error-de6e0f86383e Good luck 🤞
Author
Owner

@sudo-Ram commented on GitHub (Oct 19, 2025):

Modify the hosts file.

<!-- gh-comment-id:3419269263 --> @sudo-Ram commented on GitHub (Oct 19, 2025): 改host文件
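For readers unfamiliar with this workaround: the suggestion above ("改host文件" / "modify the hosts file") usually means pinning the download hosts to IP addresses that are reachable from your network, bypassing a broken or blocked DNS resolver. The general shape of the entries is below; the addresses themselves are deliberately left as placeholders, since they must be looked up from a resolver that works for you (`registry.ollama.ai` is the host the CLI pulls model layers from):

```
# C:\Windows\System32\drivers\etc\hosts (Windows)  or  /etc/hosts (Linux/macOS)
# <reachable-ip>   ollama.com
# <reachable-ip>   registry.ollama.ai
```

Note that pinned addresses go stale when the service moves, so remove these entries if downloads start failing again later.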
Reference: github-starred/ollama#48838