[GH-ISSUE #7497] llama slows down a lot on the second and subsequent runs. #66825

Closed
opened 2026-05-04 08:17:12 -05:00 by GiteaMirror · 26 comments

Originally created by @vertikalm on GitHub (Nov 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7497

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

Configuration: Intel i3-8100 | RTX_3050_LP | Debian_12 i3wm

I have the following problem:

After booting the system, conversation with any model downloaded from the ollama library is very fast, which is perfect for me.
But after a few minutes, or after exiting with "/bye", re-running the model drops the speed by about 90% and CPU consumption rises to 100%.

This slowness persists until the system is rebooted. If the ollama parent process is killed, it restarts automatically and remains just as slow.

There are no other programs consuming CPU or GPU between re-executions.

[server_log](https://github.com/user-attachments/files/17623651/ollama_logs.txt)

Any ideas?

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.3.13

GiteaMirror added the bug and nvidia labels 2026-05-04 08:17:13 -05:00

@ShivamSrng commented on GitHub (Nov 4, 2024):

Hey, I have been facing this issue recently as well, with llama 3.1. I have no idea why it's not working; it used to work perfectly, but recently, after 3-4 requests, I get the error "error reading llm response: read tcp 127.0.0.1:51122->127.0.0.1:51113: wsarecv: An existing connection was forcibly closed by the remote host.".


@rick-github commented on GitHub (Nov 4, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.

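On a systemd-based Linux install like the reporter's, the server log can typically be captured with `journalctl`; a minimal sketch, assuming ollama runs under the default systemd unit name:

```
# Follow the ollama server log live (default systemd unit name).
journalctl -u ollama -f

# Or dump everything since the last boot to a file suitable for attaching here.
journalctl -u ollama -b > ollama_logs.txt
```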

@vertikalm commented on GitHub (Nov 4, 2024):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.

[ollama_logs.txt](https://github.com/user-attachments/files/17623651/ollama_logs.txt)

Thanks in advance!


@ShivamSrng commented on GitHub (Nov 4, 2024):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help in debugging.

I have this as my server log for ollama:
[ollama server log.txt](https://github.com/user-attachments/files/17623749/ollama.server.log.txt)


@rick-github commented on GitHub (Nov 4, 2024):

@vertikalm

```
oct 24 10:35:22 thinkcentre ollama[552]: cuda driver library failed to get device context 999time=2024-10-24T10:35:22.939+02:00 level=WARN source=gpu.go:400 msg="error looking up nvidia GPU memory"
```

ollama loses its connection to the GPU and, from then until the system is rebooted, schedules models onto the CPU. Is your system a laptop or a device that goes into suspend/hibernation? Rather than rebooting, see if the following helps:

```
sudo systemctl stop ollama
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
sudo systemctl start ollama
```

@rick-github commented on GitHub (Nov 4, 2024):

@ShivamSrng

```
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
```

Your problem is different. There's no obvious cause, so we'll have to dig a bit. What's the output of `nvidia-smi`?


@ShivamSrng commented on GitHub (Nov 4, 2024):

> @ShivamSrng
>
> ```
> CUDA error: an illegal instruction was encountered
>   current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
>   cudaStreamSynchronize(cuda_ctx->stream())
> C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
> ```
>
> Your problem is different. There's no obvious cause, so we'll have to dig a bit. What's the output of `nvidia-smi`?

Here's what I see when I run the command in CMD:
![image](https://github.com/user-attachments/assets/f434fc05-2920-4afd-8c25-66d52272ee52)
I also realized that when I run `nvcc -version`, it throws an error stating that nvcc is not recognized as an internal command. Is something wrong here?


@rick-github commented on GitHub (Nov 4, 2024):

@ShivamSrng Is it possible for you to [downgrade](https://www.nvidia.com/en-gb/drivers/driver-rollback/) the nvidia driver to a slightly older version, say one that supports [12.4](https://www.nvidia.com/download/driverResults.aspx/224484/en-us/)? I don't know how hard this is on Windows, but 12.7 looks pretty bleeding edge, and there was a recent issue (#7463) on a system that also had 12.7.


@vertikalm commented on GitHub (Nov 4, 2024):

@rick-github
This worked perfectly :-)
For now I have created a small script with these commands. This is a Lenovo ThinkCentre desktop. I don't have any power manager running and there is no suspend, just lightdm and i3wm.
Thanks a lot for the help.

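For reference, a script like the one described might look like this; a minimal sketch, assuming the default systemd unit name and that nothing else is holding the nvidia_uvm module (the script name is hypothetical):

```
#!/bin/sh
# reset-nvidia-uvm.sh (hypothetical): reload the nvidia_uvm kernel module so
# ollama can see the GPU again, per the workaround above.
set -e
sudo systemctl stop ollama   # stop the server so the module is no longer in use
sudo rmmod nvidia_uvm        # unload the CUDA unified-memory module
sudo modprobe nvidia_uvm     # load it again
sudo systemctl start ollama  # restart the server; it should re-detect the GPU
```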

@ShivamSrng commented on GitHub (Nov 4, 2024):

> @ShivamSrng Is it possible for you to [downgrade](https://www.nvidia.com/en-gb/drivers/driver-rollback/) the nvidia driver to a slightly older version, say one that supports [12.4](https://www.nvidia.com/download/driverResults.aspx/224484/en-us/)? I don't know how hard this is on Windows, but 12.7 looks pretty bleeding edge, and there was a recent issue (#7463) on a system that also had 12.7.

I just downgraded, and now after executing `nvidia-smi` I see:
![image](https://github.com/user-attachments/assets/76821636-81b4-4d10-bad9-b2be3d3578c7)

I still see the same issue: "error reading llm response: read tcp 127.0.0.1:54819->127.0.0.1:54798: wsarecv: An existing connection was forcibly closed by the remote host." Looking at the server logs, I just realized I have more than one server log, and I am attaching all of them for your reference:
[server.log](https://github.com/user-attachments/files/17624600/server.log)
[server-1.log](https://github.com/user-attachments/files/17624601/server-1.log)
[server-2.log](https://github.com/user-attachments/files/17624603/server-2.log)
[server-3.log](https://github.com/user-attachments/files/17624604/server-3.log)
[server-4.log](https://github.com/user-attachments/files/17624605/server-4.log)
[server-5.log](https://github.com/user-attachments/files/17624607/server-5.log)

Thank you in advance.


@rick-github commented on GitHub (Nov 5, 2024):

There are several errors here:

```
CUDA error: an illegal instruction was encountered
CUDA error: an illegal memory access was encountered
CUDA error: misaligned address
```

all happening in a `cudaStreamSynchronize()` call. There's an [issue](https://github.com/ggerganov/llama.cpp/issues/9928) on llama.cpp which discusses seeing these types of failures when a connection is closed before a completion is finished, which looks similar to what's happening here:

```
[GIN] 2024/11/04 - 17:14:21 | 200 |     8.172552s |       127.0.0.1 | POST     "/api/chat"
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
[GIN] 2024/11/04 - 17:14:25 | 500 |    4.1498238s |       127.0.0.1 | POST     "/api/chat"
```

The 500 status code indicates the connection was closed before the server could finish fulfilling the request. Most of the failures occur during a multi-second `/api/chat` request, when most chat requests take less than a second:

```
[GIN] 2024/11/04 - 17:14:08 | 200 |    260.4043ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/04 - 17:14:08 | 200 |    373.1565ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/04 - 17:14:09 | 200 |    715.4463ms |       127.0.0.1 | POST     "/api/chat"
CUDA error: an illegal memory access was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
[GIN] 2024/11/04 - 17:14:13 | 500 |    3.7701583s |       127.0.0.1 | POST     "/api/chat"
```

It's not clear whether the error is cause or effect.

The very short `/api/chat` requests are interesting because that's not much time for an RTX 3070 Ti to do a completion. The logs also show a lot of short-lived requests to `/api/create`, usually as a block of 4 requests after a failed chat:

```
[GIN] 2024/11/04 - 15:30:30 | 200 |    3.9250618s |       127.0.0.1 | POST     "/api/chat"
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:2473
  cudaStreamSynchronize(cuda_ctx->stream())
C:\a\ollama\ollama\llm\llama.cpp\ggml\src\ggml-cuda.cu:106: CUDA error
[GIN] 2024/11/04 - 15:30:33 | 500 |    3.6965137s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/04 - 15:32:46 | 200 |     20.7034ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/11/04 - 15:32:46 | 200 |     23.1497ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/11/04 - 15:32:46 | 200 |     19.8468ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/11/04 - 15:32:46 | 200 |     20.4612ms |       127.0.0.1 | POST     "/api/create"
```

So this might come down to the type of input the model is receiving. What client are you using, and what is it trying to do here?

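As an illustration of the connection-closed scenario (not something taken from the logs above), aborting a streaming request mid-completion can be simulated from a POSIX shell; a sketch assuming a local server on the default port 11434 and a pulled llama3.1 model:

```
# Start a streaming chat completion, then kill the connection after 2 seconds,
# before the completion finishes. This mimics a client dropping the connection
# mid-stream, the situation discussed in the llama.cpp issue.
curl --max-time 2 -N http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Write a long story."}]
}'
```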

@ShivamSrng commented on GitHub (Nov 5, 2024):

So, the complete information about my system is:

> nvidia-smi

![image](https://github.com/user-attachments/assets/15edaa9b-844e-4729-a5ba-616110983427)

> nvcc --version

![image](https://github.com/user-attachments/assets/fe60ad9d-8c5e-444e-a370-50549f58420e)

> OS: Windows
> GPU: Nvidia
> CPU: Intel
> Ollama version: 0.3.14

I am just typing out some basic prompts. Here's one example:
![image](https://github.com/user-attachments/assets/983040c8-72a1-495e-8d14-2351e0ec77bf)
(same issue with other models too).

I am sorry, but I am not able to understand "What client am I using?"


@rick-github commented on GitHub (Nov 5, 2024):

By "What client am I using ?", I mean what is being used to connect to the ollama server. If your only interaction with ollama is via the command line interface, that doesn't explain the short /api/chat calls or the repeated calls to /api/create. I suspect that you have another program installed or a browser extension which is making these calls. This would match your experience that "it used to work perfectly".


@ShivamSrng commented on GitHub (Nov 5, 2024):

Before using CMD, I used Python's ollama library in some code to do text summarization with the llama3.1:latest model. Since I started facing the issue in my Python code, I decided to check whether Ollama works in CMD or not. As expected, it failed, giving me the errors I mentioned.


@rick-github commented on GitHub (Nov 5, 2024):

Something, somewhere on your computer, is calling `/api/chat` and `/api/create`. It's too quick to be from typing commands in the command line interface, so the most likely explanation is another program, something like [Open WebUI](https://github.com/open-webui/open-webui), or a browser extension like [nextjs-ollama-llm-ui](https://github.com/jakobhoeg/nextjs-ollama-llm-ui). Did you install anything around the time that ollama started crashing?

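One way to hunt for such a client on Windows is to check which processes hold connections to the ollama port; a sketch assuming the default port 11434 (the PID shown is a placeholder):

```
:: From a CMD prompt: list connections to the ollama port together with the
:: owning process ID, then map that PID to an executable name.
netstat -ano | findstr :11434
tasklist /FI "PID eq 1234"
```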

@ShivamSrng commented on GitHub (Nov 5, 2024):

> Something, somewhere on your computer, is calling `/api/chat` and `/api/create`. It's too quick to be from typing commands in the command line interface, so the most likely explanation is another program, something like [Open WebUI](https://github.com/open-webui/open-webui), or a browser extension like [nextjs-ollama-llm-ui](https://github.com/jakobhoeg/nextjs-ollama-llm-ui). Did you install anything around the time that ollama started crashing?

I don't have anything you mentioned here installed, not even such extensions. I started facing this issue yesterday morning, and I don't remember installing anything; rather, I cleared some applications off the C drive to get some free space. But I took care not to remove anything that might cause interference.


@rick-github commented on GitHub (Nov 5, 2024):

Have you tried uninstalling and re-installing ollama?


@ShivamSrng commented on GitHub (Nov 5, 2024):

> Have you tried uninstalling and re-installing ollama?

I did it multiple times, but with no success. Even after following the Windows installation process provided in the repo to run on the GPU, there is still no improvement. Also, is CMake required for successful execution?


@rick-github commented on GitHub (Nov 5, 2024):

cmake is required to build from source, not to run the service. Are you trying to build ollama rather than just installing it?


@ShivamSrng commented on GitHub (Nov 5, 2024):

> cmake is required to build from source, not to run the service. Are you trying to build ollama rather than just installing it?

No, I just installed the Windows build from the official Ollama website. I was just confirming that it is not required. I have no idea what went wrong. Believe me, it used to work even on the bleeding-edge versions I had at first, and now it just stops after 4-5 chats. I have installed CUDA properly (it is even detected through torch) and downgraded the driver. Every CMD command seems to work, but Ollama still fails with those CUDA errors, showing either an illegal instruction or an illegal memory access.


@rick-github commented on GitHub (Nov 5, 2024):

I think that more data is required to make progress. Install wireshark as described [here](https://github.com/ollama/ollama/issues/7163#issuecomment-2442076351), start packet capture, use ollama until you get a crash, and then stop the capture and attach the file here.


@nikhil-swamix commented on GitHub (Nov 6, 2024):

Set `CUDA_VISIBLE_DEVICES=0,1` in the environment variables. Once the model is loaded with `ollama run <TAG>`, run `ollama ps` and share the output, as sketched below.

Best regards,
swamix

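A sketch of the suggested sequence, assuming Windows CMD syntax and an example model tag of llama3.1:

```
:: Windows CMD; on Linux use `export CUDA_VISIBLE_DEVICES=0,1` instead.
set CUDA_VISIBLE_DEVICES=0,1
ollama run llama3.1
:: The PROCESSOR column of `ollama ps` shows how much of the model is on GPU vs CPU.
ollama ps
```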

@dhiltgen commented on GitHub (Nov 7, 2024):

@vertikalm how did you install Ollama? If you used our script, did you already have the nvidia driver installed? We added some logic to our install script when this problem first appeared for users [here](https://github.com/ollama/ollama/blob/main/scripts/install.sh#L358-L367), but it only comes into play if we install the driver during install. You can use a similar approach on your system to make the fix permanent.

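One way to apply a similar fix permanently on a systemd-based distro such as Debian 12 (a sketch of the general mechanism, not the exact install-script logic) is to make sure nvidia_uvm is loaded at every boot:

```
# Load nvidia_uvm now, and register it with systemd-modules-load so it is
# auto-loaded on every boot; the .conf file name is arbitrary.
sudo modprobe nvidia_uvm
echo "nvidia_uvm" | sudo tee /etc/modules-load.d/nvidia-uvm.conf
```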

@dhiltgen commented on GitHub (Nov 7, 2024):

@ShivamSrng I would suggest upgrading to 0.4.0 to see if that changes the behavior.


@ShivamSrng commented on GitHub (Nov 7, 2024):

> @ShivamSrng I would suggest upgrading to 0.4.0 to see if that changes the behavior.

So, I tried many ways to resolve this issue, but I had some time-bound work, so rather than keep troubleshooting with the library, I went through the entire thread from the last reply @rick-github provided. I tried my best to get hold of the issue, but eventually decided to just perform a clean installation of Windows, having taken a backup. I am sorry for not keeping you updated, but I have just finished the complete OS re-installation, restored from the backup, and Ollama is working perfectly.

Thank you very much @rick-github, @nikhil-swamix and @dhiltgen.


@vertikalm commented on GitHub (Nov 12, 2024):

@dhiltgen
Indeed, my installation of ollama was through the official script, and it was done after installing the video driver (in this case "nvidia-driver", not "nouveau").

The thing is that once I used the little trick @rick-github gave me (removing and re-inserting the kernel module), ollama works correctly at the expected speed for days, even after suspend-to-RAM.

So the problem could be summarized as follows: after a hard reset or a first system start, my RTX 3050 LP disconnects from the ollama server shortly afterwards, but after the kernel-module trick it works as expected until a new poweroff + boot.

I would also note that with the same configuration and a GTX 1660 Ti there is no problem, so could it be some low-power or inactivity behavior of my low-profile card?

For now I am happy with the solution, and I do not want to reinstall the graphics driver through a new run of the ollama script. So again, thank you very much for the help.


Reference: github-starred/ollama#66825