[GH-ISSUE #7497] llama slows down a lot on the second and subsequent runs. #66825

Closed
opened 2026-05-04 08:17:12 -05:00 by GiteaMirror · 26 comments

Originally created by @vertikalm on GitHub (Nov 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7497

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

Configuration: Intel i3-8100 | RTX_3050_LP | Debian_12 i3wm

I have the following problem:

After booting the system, conversation with any model downloaded from the ollama library is very fast, which is perfect for me.
But after a few minutes, or after exiting with "/bye", re-running the model drops the speed by about 90% and CPU consumption rises to 100%.

This slowness persists until the system is rebooted. If the ollama parent process is killed, it restarts automatically and remains just as slow.

There are no other programs consuming CPU or GPU between re-executions.

[server_log](https://github.com/user-attachments/files/17623651/ollama_logs.txt)

Any ideas?

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.3.13

GiteaMirror added the bug and nvidia labels 2026-05-04 08:17:13 -05:00

@ShivamSrng commented on GitHub (Nov 4, 2024):

Hey, I have been facing this issue recently as well, with llama 3.1. I have no idea why it's not working; it used to work perfectly, but recently, after 3-4 requests, I get the error "error reading llm response: read tcp 127.0.0.1:51122->127.0.0.1:51113: wsarecv: An existing connection was forcibly closed by the remote host.".


@rick-github commented on GitHub (Nov 4, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.

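On a systemd-based Linux install like the reporter's, the server log can typically be captured with `journalctl`; a minimal sketch, assuming ollama runs under the default systemd unit name:

```
# Follow the ollama server log live (default systemd unit name).
journalctl -u ollama -f

# Or dump everything since the last boot to a file suitable for attaching here.
journalctl -u ollama -b > ollama_logs.txt
```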

@vertikalm commented on GitHub (Nov 4, 2024):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.

[ollama_logs.txt](https://github.com/user-attachments/files/17623651/ollama_logs.txt)

Thanks in advance!


@ShivamSrng commented on GitHub (Nov 4, 2024):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.

I have this as my server log for ollama:
[ollama server log.txt](https://github.com/user-attachments/files/17623749/ollama.server.log.txt)


@rick-github commented on GitHub (Nov 4, 2024):

@vertikalm

```
oct 24 10:35:22 thinkcentre ollama[552]: cuda driver library failed to get device context 999time=2024-10-24T10:35:22.939+02:00 level=WARN source=gpu.go:400 msg="error looking up nvidia GPU memory"
```

ollama loses its connection to the GPU and, from then until the system is rebooted, schedules models onto the CPU. Is your system a laptop or a device that goes into suspend/hibernation? Rather than rebooting, see if the following helps:

```
sudo systemctl stop ollama
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
sudo systemctl start ollama
```

@rick-github commented on GitHub (Nov 4, 2024):

@ShivamSrng

```
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
```

Your problem is different. There's no obvious cause, so we'll have to dig a bit. What's the output of `nvidia-smi`?


@ShivamSrng commented on GitHub (Nov 4, 2024):

> @ShivamSrng
>
> ```
> CUDA error: an illegal instruction was encountered
>   current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
>   cudaStreamSynchronize(cuda_ctx->stream())
> C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
> ```
>
> Your problem is different. There's no obvious cause, so we'll have to dig a bit. What's the output of `nvidia-smi`?

Here's what I see when I run the command in CMD:
![image](https://github.com/user-attachments/assets/f434fc05-2920-4afd-8c25-66d52272ee52)
I also realized that when I run `nvcc -version`, it throws an error stating that nvcc is not recognized as an internal command. Is something wrong here?


@rick-github commented on GitHub (Nov 4, 2024):

@ShivamSrng Is it possible for you to [downgrade](https://www.nvidia.com/en-gb/drivers/driver-rollback/) the nvidia driver to a slightly older version, say one that supports [12.4](https://www.nvidia.com/download/driverResults.aspx/224484/en-us/)? I don't know how hard this is on Windows, but 12.7 looks pretty bleeding edge, and there was a recent issue (#7463) on a system that also had 12.7.


@vertikalm commented on GitHub (Nov 4, 2024):

@rick-github
This worked perfectly :-)
For now I have created a small script with these commands. This is a Lenovo ThinkCentre desktop. I don't have any power manager running and there is no suspend, just lightdm and i3wm.
Thanks a lot for the help.

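For reference, a script like the one described might look like this; a minimal sketch, assuming the default systemd unit name and that nothing else is holding the nvidia_uvm module (the script name is hypothetical):

```
#!/bin/sh
# reset-nvidia-uvm.sh (hypothetical): reload the nvidia_uvm kernel module so
# ollama can see the GPU again, per the workaround above.
set -e
sudo systemctl stop ollama   # stop the server so the module is no longer in use
sudo rmmod nvidia_uvm        # unload the CUDA unified-memory module
sudo modprobe nvidia_uvm     # load it again
sudo systemctl start ollama  # restart the server; it should re-detect the GPU
```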

@ShivamSrng commented on GitHub (Nov 4, 2024):

> @ShivamSrng Is it possible for you to [downgrade](https://www.nvidia.com/en-gb/drivers/driver-rollback/) the nvidia driver to a slightly older version, say one that supports [12.4](https://www.nvidia.com/download/driverResults.aspx/224484/en-us/)? I don't know how hard this is on Windows, but 12.7 looks pretty bleeding edge, and there was a recent issue (#7463) on a system that also had 12.7.

I just downgraded, and now after executing `nvidia-smi` I see:
![image](https://github.com/user-attachments/assets/76821636-81b4-4d10-bad9-b2be3d3578c7)

I still see the same issue: "error reading llm response: read tcp 127.0.0.1:54819->127.0.0.1:54798: wsarecv: An existing connection was forcibly closed by the remote host." Looking at the server logs, I just realized I have more than one server log, and I am attaching all of them for your reference:
[server.log](https://github.com/user-attachments/files/17624600/server.log)
[server-1.log](https://github.com/user-attachments/files/17624601/server-1.log)
[server-2.log](https://github.com/user-attachments/files/17624603/server-2.log)
[server-3.log](https://github.com/user-attachments/files/17624604/server-3.log)
[server-4.log](https://github.com/user-attachments/files/17624605/server-4.log)
[server-5.log](https://github.com/user-attachments/files/17624607/server-5.log)

Thank you in advance.


@rick-github commented on GitHub (Nov 5, 2024):

There are several errors here:

```
CUDA error: an illegal instruction was encountered
CUDA error: an illegal memory access was encountered
CUDA error: misaligned address
```

all happening in a `cudaStreamSynchronize()` call. There's an [issue](https://github.com/ggerganov/llama.cpp/issues/9928) on llama.cpp which discusses seeing these types of failures when a connection is closed before a completion is finished, which looks similar to what's happening here:

```
[GIN] 2024/11/04 - 17:14:21 | 200 |     8.172552s |       127.0.0.1 | POST     "/api/chat"
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
[GIN] 2024/11/04 - 17:14:25 | 500 |    4.1498238s |       127.0.0.1 | POST     "/api/chat"
```

The 500 status code indicates the connection was closed before the server could finish fulfilling the request. Most of the failures occur during a multi-second `/api/chat` request, when most chat requests take less than a second:

```
[GIN] 2024/11/04 - 17:14:08 | 200 |    260.4043ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/04 - 17:14:08 | 200 |    373.1565ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/04 - 17:14:09 | 200 |    715.4463ms |       127.0.0.1 | POST     "/api/chat"
CUDA error: an illegal memory access was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
[GIN] 2024/11/04 - 17:14:13 | 500 |    3.7701583s |       127.0.0.1 | POST     "/api/chat"
```

It's not clear whether the error is cause or effect.

The very short `/api/chat` requests are interesting because that's not much time for an RTX 3070 Ti to do a completion. The logs also show a lot of short-lived requests to `/api/create`, usually as a block of 4 requests after a failed chat:

```
[GIN] 2024/11/04 - 15:30:30 | 200 |    3.9250618s |       127.0.0.1 | POST     "/api/chat"
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
[GIN] 2024/11/04 - 15:30:33 | 500 |    3.6965137s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/04 - 15:32:46 | 200 |     20.7034ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/11/04 - 15:32:46 | 200 |     23.1497ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/11/04 - 15:32:46 | 200 |     19.8468ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/11/04 - 15:32:46 | 200 |     20.4612ms |       127.0.0.1 | POST     "/api/create"
```

So this might come down to the type of input the model is receiving. What client are you using, and what is it trying to do here?

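As an illustration of the connection-closed scenario (not something taken from the logs above), aborting a streaming request mid-completion can be simulated from a POSIX shell; a sketch assuming a local server on the default port 11434 and a pulled llama3.1 model:

```
# Start a streaming chat completion, then kill the connection after 2 seconds,
# before the completion finishes. This mimics a client dropping the connection
# mid-stream, the situation discussed in the llama.cpp issue.
curl --max-time 2 -N http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Write a long story."}]
}'
```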

@ShivamSrng commented on GitHub (Nov 5, 2024):

So, the complete information about my system is:

> nvidia-smi

![image](https://github.com/user-attachments/assets/15edaa9b-844e-4729-a5ba-616110983427)

> nvcc --version

![image](https://github.com/user-attachments/assets/fe60ad9d-8c5e-444e-a370-50549f58420e)

> OS: Windows
> GPU: Nvidia
> CPU: Intel
> Ollama version: 0.3.14

I am just typing out some basic prompts. Here's one example:
![image](https://github.com/user-attachments/assets/983040c8-72a1-495e-8d14-2351e0ec77bf)
(same issue with other models too).

I am sorry, but I am not able to understand "What client am I using?"


@rick-github commented on GitHub (Nov 5, 2024):

By "What client am I using ?", I mean what is being used to connect to the ollama server. If your only interaction with ollama is via the command line interface, that doesn't explain the short /api/chat calls or the repeated calls to /api/create. I suspect that you have another program installed or a browser extension which is making these calls. This would match your experience that "it used to work perfectly".


@ShivamSrng commented on GitHub (Nov 5, 2024):

Before using CMD, I used Python's ollama library in some code to do text summarization with the llama3.1:latest model. Since I started facing the issue in my Python code, I decided to check whether Ollama works in CMD or not. As expected, it failed, giving me the errors I mentioned.


@rick-github commented on GitHub (Nov 5, 2024):

Something, somewhere on your computer, is calling `/api/chat` and `/api/create`. It's too quick to be from typing commands in the command line interface, so the most likely explanation is another program, something like [Open WebUI](https://github.com/open-webui/open-webui), or a browser extension like [nextjs-ollama-llm-ui](https://github.com/jakobhoeg/nextjs-ollama-llm-ui). Did you install anything around the time that ollama started crashing?

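One way to hunt for such a client on Windows is to check which processes hold connections to the ollama port; a sketch assuming the default port 11434 (the PID shown is a placeholder):

```
:: From a CMD prompt: list connections to the ollama port together with the
:: owning process ID, then map that PID to an executable name.
netstat -ano | findstr :11434
tasklist /FI "PID eq 1234"
```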

@ShivamSrng commented on GitHub (Nov 5, 2024):

> Something, somewhere on your computer, is calling `/api/chat` and `/api/create`. It's too quick to be from typing commands in the command line interface, so the most likely explanation is another program, something like [Open WebUI](https://github.com/open-webui/open-webui), or a browser extension like [nextjs-ollama-llm-ui](https://github.com/jakobhoeg/nextjs-ollama-llm-ui). Did you install anything around the time that ollama started crashing?

I don't have anything you mentioned here installed, not even such extensions. I started facing this issue yesterday morning, and I don't remember installing anything; rather, I cleared some applications off the C drive to get some free space. But I took care not to remove anything that might cause interference.


@rick-github commented on GitHub (Nov 5, 2024):

Have you tried uninstalling and re-installing ollama?


@ShivamSrng commented on GitHub (Nov 5, 2024):

> Have you tried uninstalling and re-installing ollama?

I did it multiple times, but with no success. Even after following the Windows installation process provided in the repo to run on the GPU, there is still no improvement. Also, is CMake required for successful execution?


@rick-github commented on GitHub (Nov 5, 2024):

cmake is required to build from source, not to run the service. Are you trying to build ollama rather than just installing it?


@ShivamSrng commented on GitHub (Nov 5, 2024):

> cmake is required to build from source, not to run the service. Are you trying to build ollama rather than just installing it?

No, I just installed the Windows build from the official Ollama website. I was just confirming that it is not required. I have no idea what went wrong. Believe me, it used to work even on the bleeding-edge versions I had at first, and now it just stops after 4-5 chats. I have installed CUDA properly (it is even detected through torch) and downgraded the driver. Every CMD command seems to work, but Ollama still fails with those CUDA errors, showing either an illegal instruction or an illegal memory access.


@rick-github commented on GitHub (Nov 5, 2024):

I think that more data is required to make progress. Install wireshark as described [here](https://github.com/ollama/ollama/issues/7163#issuecomment-2442076351), start packet capture, use ollama until you get a crash, and then stop the capture and attach the file here.


@nikhil-swamix commented on GitHub (Nov 6, 2024):

Set `CUDA_VISIBLE_DEVICES=0,1` in the environment variables. Once the model is loaded with `ollama run <TAG>`, run `ollama ps` and share the output, as sketched below.

Best regards,
swamix

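A sketch of the suggested sequence, assuming Windows CMD syntax and an example model tag of llama3.1:

```
:: Windows CMD; on Linux use `export CUDA_VISIBLE_DEVICES=0,1` instead.
set CUDA_VISIBLE_DEVICES=0,1
ollama run llama3.1
:: The PROCESSOR column of `ollama ps` shows how much of the model is on GPU vs CPU.
ollama ps
```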

@dhiltgen commented on GitHub (Nov 7, 2024):

@vertikalm how did you install Ollama? If you used our script, did you already have the nvidia driver installed? We added some logic to our install script when this problem first appeared for users [here](https://github.com/ollama/ollama/blob/main/scripts/install.sh#L358-L367), but it only comes into play if we install the driver during install. You can use a similar approach on your system to make the fix permanent.

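One way to apply a similar fix permanently on a systemd-based distro such as Debian 12 (a sketch of the general mechanism, not the exact install-script logic) is to make sure nvidia_uvm is loaded at every boot:

```
# Load nvidia_uvm now, and register it with systemd-modules-load so it is
# auto-loaded on every boot; the .conf file name is arbitrary.
sudo modprobe nvidia_uvm
echo "nvidia_uvm" | sudo tee /etc/modules-load.d/nvidia-uvm.conf
```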

@dhiltgen commented on GitHub (Nov 7, 2024):

@ShivamSrng I would suggest upgrading to 0.4.0 to see if that changes the behavior.


@ShivamSrng commented on GitHub (Nov 7, 2024):

> @ShivamSrng I would suggest upgrading to 0.4.0 to see if that changes the behavior.

So, I tried many ways to resolve this issue, but I had some time-bound work, so rather than keep troubleshooting with the library, I went through the entire thread from the last reply @rick-github provided. I tried my best to get hold of the issue, but eventually decided to just perform a clean installation of Windows, having taken a backup. I am sorry for not keeping you updated, but I have just finished the complete OS re-installation, restored from the backup, and Ollama is working perfectly.

Thank you very much @rick-github, @nikhil-swamix and @dhiltgen.


@vertikalm commented on GitHub (Nov 12, 2024):

@dhiltgen
Indeed, my installation of ollama was through the official script, and it was done after installing the video driver (in this case "nvidia-driver", not "nouveau").

The thing is that once I used the little trick @rick-github gave me (removing and re-inserting the kernel module), ollama works correctly at the expected speed for days, even after suspend-to-RAM.

So the problem could be summarized as follows: after a hard reset or a first system start, my RTX 3050 LP disconnects from the ollama server shortly afterwards, but after the kernel-module trick it works as expected until a new poweroff + boot.

I would also note that with the same configuration and a GTX 1660 Ti there is no problem, so could it be some low-power or inactivity behavior of my low-profile card?

For now I am happy with the solution, and I do not want to reinstall the graphics driver through a new run of the ollama script. So again, thank you very much for the help.


Reference: github-starred/ollama#66825