[GH-ISSUE #5764] Error: llama runner process has terminated: exit status 0xc0000409 error loading model: unable to allocate backend buffer #3589

Closed
opened 2026-04-12 14:19:42 -05:00 by GiteaMirror · 9 comments

Originally created by @mohibovais79 on GitHub (Jul 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5764

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

When I try to run the command `ollama run gemma2`, this error shows up.

OS

Windows

GPU

No response

CPU

Intel

Ollama version

0.2.5

GiteaMirror added the bug and windows labels 2026-04-12 14:19:42 -05:00

@arfans22 commented on GitHub (Jul 18, 2024):

Same issue here, after updating Ollama or after updating my graphics driver. I'm not sure which; I only noticed it today.


@rick-github commented on GitHub (Jul 18, 2024):

Server logs might enable diagnosis of the problem.


@ytm369 commented on GitHub (Jul 21, 2024):

Yeah, please, could someone fix this issue soon? I am not able to run even small models.


@arfans22 commented on GitHub (Jul 22, 2024):

If you are using Windows 11, you can follow this tutorial to fix it. In my case the problem was that something in my Windows 11 installation was corrupted.
I'm glad I found this video; it fixed my error after a week of looking for a solution.

https://www.youtube.com/watch?v=R8tyJE_Izb4


@dhiltgen commented on GitHub (Jul 22, 2024):

I wasn't able to reproduce in 0.2.7. Can you try upgrading? If you're still seeing crashes, please share your server log so we can see what may be going wrong.


@kuole-o commented on GitHub (Jul 24, 2024):

I also encountered this problem using 0.2.8. Here are my log files for reference:

- [app.log](https://github.com/user-attachments/files/16365337/app.log)
- [app-1.log](https://github.com/user-attachments/files/16365338/app-1.log)
- [app-2.log](https://github.com/user-attachments/files/16365339/app-2.log)
- [app-3.log](https://github.com/user-attachments/files/16365340/app-3.log)
- [config.json](https://github.com/user-attachments/files/16365341/config.json)
- [server.log](https://github.com/user-attachments/files/16365342/server.log)
- [server-1.log](https://github.com/user-attachments/files/16365343/server-1.log)
- [server-2.log](https://github.com/user-attachments/files/16365344/server-2.log)
- [server-3.log](https://github.com/user-attachments/files/16365345/server-3.log)

@dhiltgen commented on GitHub (Aug 2, 2024):

@kuole-o it looks like you're trying to load a ~40G model on an 8G GPU (with ~1G already used), and you don't have ~32G of available host memory. How much physical RAM does your system have? I think you're trying to load a model that can't fit in your available memory. #5926 may be sufficient to detect that and give a better error message explaining why it can't work, and knowing how much RAM you have will confirm that. With enough swap, however, that logic won't kick in, so we might still have a corner case where there's insufficient physical memory available for the pinned allocation of the CPU portion of the load, but we proceed anyway and crash.
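
The arithmetic behind this diagnosis can be sketched roughly as follows. This is only an illustration of the reasoning in the comment, not Ollama's actual scheduling logic: the function names are hypothetical, and the estimate ignores the KV cache, compute buffers, and other overhead that a real loader has to budget for.

```python
def host_memory_needed_gib(model_gib: float, gpu_total_gib: float, gpu_used_gib: float) -> float:
    """Rough estimate of how much of a model must be held in host RAM.

    Assumes (simplification) that the free VRAM is filled first and the
    remainder of the model is loaded on the CPU side.
    """
    gpu_free_gib = max(gpu_total_gib - gpu_used_gib, 0.0)
    return max(model_gib - gpu_free_gib, 0.0)


def fits_in_memory(model_gib: float, gpu_total_gib: float, gpu_used_gib: float,
                   host_available_gib: float) -> bool:
    """True if the CPU portion of the model fits in the available host RAM."""
    needed = host_memory_needed_gib(model_gib, gpu_total_gib, gpu_used_gib)
    return needed <= host_available_gib


if __name__ == "__main__":
    # Figures from the comment: ~40G model, 8G GPU with ~1G already used.
    needed = host_memory_needed_gib(model_gib=40, gpu_total_gib=8, gpu_used_gib=1)
    print(f"Host memory needed for the CPU portion: about {needed:.0f} GiB")

    # The available host memory below is a made-up figure for illustration.
    print("Fits:", fits_in_memory(40, 8, 1, host_available_gib=20))
```

With these inputs the estimate lands in the same ballpark as the ~32G of host memory mentioned above, which is why a machine without that much free RAM cannot hold the CPU portion of the load.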


@kuole-o commented on GitHub (Aug 7, 2024):

> @kuole-o it looks like you're trying to load a ~40G model, on an 8G GPU (with ~1G already used) and don't have ~32G of available host memory. How much physical RAM does your system have? I think you're trying to load a model that can't fit in your available memory. #5926 may be sufficient to detect that and give a better error message explaining why it can't work, but knowing how much RAM you have will confirm that, although with enough swap, that logic won't kick in, so we might still have a corner case where there's insufficient physical memory available for the pinned allocation of the CPU portion of the load to fit but we proceed anyway and crash.

My computer has 2 × 16GB of physical memory, 32GB in total. The graphics card is an RTX 4060 with 8GB of video memory.

I solved this problem later. The solution was:

  1. Ollama was installed to my C drive by default, and the models were stored there too. My C drive had only about 40GB of free space, so I configured the environment variables and downloaded the model to another 1TB disk.
  2. In the advanced settings, I adjusted the Windows 11 virtual memory (I think this is very important) and added the free space on the 1TB disk to the virtual memory range.
  3. After these steps, I installed it again and it worked normally. The 40GB model is just very slow... For now, the 4.7GB version of llama 3.1 suits me better.
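
The disk-space part of step 1 can be checked before pulling a large model. The sketch below is a hedged illustration, not part of Ollama: the helper names are hypothetical, and it assumes the environment variable in question is `OLLAMA_MODELS` (the comment does not name it; that is the variable Ollama documents for relocating the model directory) and that the default location is `.ollama/models` under the user's home directory.

```python
import os
import shutil
from pathlib import Path


def models_dir() -> Path:
    """Return the model directory: OLLAMA_MODELS if set, else the assumed default."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"


def free_bytes_at(path: Path) -> int:
    """Free space on the volume holding `path`, walking up to an existing parent."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free


def has_room_for(model_bytes: int, directory: Path, headroom: float = 1.1) -> bool:
    """True if the volume has the model size plus some headroom (10% is an arbitrary choice)."""
    return free_bytes_at(directory) >= model_bytes * headroom


if __name__ == "__main__":
    target = models_dir()
    model_bytes = 40 * 1024**3  # the ~40GB model from this thread
    print(f"Model directory: {target}")
    print(f"Free space: {free_bytes_at(target) / 1024**3:.1f} GiB")
    print("Enough room for the model:", has_room_for(model_bytes, target))
```

A drive with only about 40GB free would fail this check for a ~40GB model, which matches the motivation for moving the models to the larger disk.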

Finally, thank you for your contributions, and thank you for the great Ollama project!


@dhiltgen commented on GitHub (Aug 8, 2024):

Yes, adding swap will give you more headroom, but you are pushing the limits of what your system is capable of with such a large model. In future versions of Ollama we'll do a better job of erroring out quickly instead of crashing.


Reference: github-starred/ollama#3589