[GH-ISSUE #2123] After upgrading Ollama, it no longer runs any model: Error: Post "http://127.0.0.1:11434/api/generate": EOF #1214

Closed
opened 2026-04-12 10:59:19 -05:00 by GiteaMirror · 19 comments

Originally created by @venturaEffect on GitHub (Jan 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2123

I have been seeing this error everywhere for months. There have been plenty of releases, but this error keeps appearing.

I followed up on Discord, searched the web, and saw related issues on this repo.

Did things like creating a Modelfile "dolphin-mistral" with `FROM dolphin-2.1-mistral-7b PARAMETER num_gpu 0`.
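For reference, that setup end to end looks roughly like the sketch below; the Modelfile contents are reconstructed from the description above (not the author's exact file), and `num_gpu 0` keeps every layer on the CPU:

```
# Sketch reconstructed from the description above (not the exact file used).
cat > Modelfile <<'EOF'
FROM dolphin-2.1-mistral-7b
PARAMETER num_gpu 0
EOF

# Build the custom model and try it; num_gpu 0 means no layers are offloaded to the GPU.
ollama create dolphin-mistral -f Modelfile
ollama run dolphin-mistral
```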

Ollama upgraded: `curl https://ollama.ai/install.sh | sh`

Nothing:

`ollama run dolphin-mistral`
`Error: Post "http://127.0.0.1:11434/api/generate": EOF`

Any suggestion will be much appreciated.

@ssebastianoo commented on GitHub (Jan 21, 2024):

Same thing: if I do `ollama run llama2` it works fine, but `ollama run mario` (created from [the README's customize-a-prompt example](https://github.com/jmorganca/ollama?tab=readme-ov-file#customize-a-prompt)) raises this error:

Error: Post "http://127.0.0.1:11434/api/generate": EOF
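For context, the `mario` model in that README example is built from a small Modelfile roughly like the one below (reconstructed from the linked README section and the server log in the next comment, so treat it as an approximation):

```
# Approximate Modelfile behind "ollama run mario", per the README customize-a-prompt example.
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are Mario from super mario bros, acting as an assistant.
"""
EOF

ollama create mario -f Modelfile
ollama run mario
```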

@ssebastianoo commented on GitHub (Jan 21, 2024):

Watching `ollama serve`, I found this:

2024/01/21 16:26:56 images.go:430: [model] - llama2
2024/01/21 16:26:56 images.go:430: [temperature] - 1
2024/01/21 16:26:56 images.go:430: [system] - 
You are Mario from super mario bros, acting as an assistant.
[GIN] 2024/01/21 - 16:26:56 | 200 |    2.255856ms |       127.0.0.1 | POST     "/api/create"
[GIN] 2024/01/21 - 16:27:04 | 200 |      37.785µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/21 - 16:27:04 | 200 |     835.181µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/21 - 16:27:04 | 200 |     741.592µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/21 - 16:27:04 | 200 |     549.943µs |       127.0.0.1 | POST     "/api/generate"
2024/01/21 16:27:05 ext_server_common.go:158: loaded 0 images

CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:6600: out of memory
current device: 0
GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:6600: !"CUDA error"
Aborted
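One way to confirm this really is VRAM exhaustion, assuming an NVIDIA GPU with `nvidia-smi` available, is to watch memory usage from a second terminal while the model loads:

```
# Snapshot of total vs. used VRAM on the detected device.
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv

# Or refresh every second to catch the spike that triggers the out-of-memory abort.
watch -n 1 nvidia-smi
```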

@venturaEffect commented on GitHub (Jan 21, 2024):

Thanks for sharing. Did you solve it? It seems more people are facing this issue.

@basillicus commented on GitHub (Jan 21, 2024):

It may be related to #1952 ?

@venturaEffect commented on GitHub (Jan 21, 2024):

> It may be related to #1952 ?

It could, and it makes sense, as I'm using it for RAG with LangChain. But there is really no workaround without the RAG. Is there any solution you know of that could solve the issue? I'm using dolphin-mistral because it is a good model and it needs to be uncensored.

Appreciate

@t0m3k commented on GitHub (Jan 21, 2024):

Same problem for me on Manjaro, 6900 xtx:

Ollama serve

> ollama serve
2024/01/21 22:00:11 images.go:810: INFO total blobs: 6
2024/01/21 22:00:11 images.go:817: INFO total unused blobs removed: 0
2024/01/21 22:00:11 routes.go:943: INFO Listening on 127.0.0.1:11434 (version 0.1.21)
2024/01/21 22:00:11 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/21 22:00:13 payload_common.go:145: INFO Dynamic LLM libraries [rocm_v6 cpu cpu_avx cuda_v11 rocm_v5 cpu_avx2]
2024/01/21 22:00:13 gpu.go:91: INFO Detecting GPU type
2024/01/21 22:00:13 gpu.go:210: INFO Searching for GPU management library libnvidia-ml.so
2024/01/21 22:00:13 gpu.go:256: INFO Discovered GPU libraries: []
2024/01/21 22:00:13 gpu.go:210: INFO Searching for GPU management library librocm_smi64.so
2024/01/21 22:00:13 gpu.go:256: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0]
2024/01/21 22:00:13 gpu.go:106: INFO Radeon GPU detected
[GIN] 2024/01/21 - 22:00:15 | 200 |       40.73µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/21 - 22:00:15 | 200 |     376.902µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/21 - 22:00:15 | 200 |     236.512µs |       127.0.0.1 | POST     "/api/show"
2024/01/21 22:00:15 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama1546965028/rocm_v5/libext_server.so
2024/01/21 22:00:15 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama1546965028/rocm_v5/libext_server.so
2024/01/21 22:00:15 dyn_ext_server.go:139: INFO Initializing llama server
free(): invalid pointer
[1]    275518 IOT instruction (core dumped)  ollama serve

Run

❯ ollama run codellama
Error: Post "http://127.0.0.1:11434/api/chat": EOF

@ssebastianoo commented on GitHub (Jan 21, 2024):

I can't right now, but did anyone try doing the same things with an older version from the releases?

@dhiltgen commented on GitHub (Jan 22, 2024):

@venturaEffect could you provide the server logs so we can see why it crashed?

@ssebastianoo as others have noted, we're continuing to refine our memory prediction logic to balance using as much GPU memory as possible, without exceeding the capacity. Can you clarify which version of ollama you were running? 0.1.21 has fixes that may solve this for you, but if you still see OOMs please let us know.

@t0m3k your crash looks like a Radeon related defect. Depending on what @venturaEffect ran into, we might want to track the Radeon crash with a different issue.

@venturaEffect commented on GitHub (Jan 22, 2024):

@dhiltgen thanks! I have downgraded to an older version suggested by @jmorganca. That solved the issue, but now I'm facing another problem: I can't use it for RAG with LangChain because the context window is very limited, so it isn't useful at all. I'm trying to figure out how to solve this, but with Ollama LLMs it looks like a dead-end road. I don't like it, because we are losing the ability to use all these LLMs and depending again on OpenAI and its polite GPT. If you know of any solution, it would be much appreciated 👍
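As an aside on the context window: the per-model context length can be raised with the `num_ctx` parameter in a derived Modelfile. The sketch below is only illustrative; 8192 is an assumed value, and a larger context needs proportionally more memory, so it may still run into the crash tracked in #1952:

```
# Illustrative only: derive a model with a larger context window (8192 is an assumed value).
cat > Modelfile.bigctx <<'EOF'
FROM dolphin-mistral
PARAMETER num_ctx 8192
EOF

ollama create dolphin-mistral-8k -f Modelfile.bigctx
```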

@nathaniel-brough commented on GitHub (Jan 23, 2024):

I'm also experiencing this problem, but only in some cases, e.g.:

# These work fine
$ ollama run phi # 1.6GB, 2.7B parameters
$ ollama run llama2 # 3.8GB, 7B parameters

# This crashes with the same error
$ ollama run stable-code  # 1.6GB, 3B parameters
Error: Post "http://127.0.0.1:11434/api/generate": EOF

I have been running these using the docker image on a fairly low end laptop GPU and/or hybrid CPU in the case of llama2.

GPU specs:

  • NVIDIA GeForce MX150
  • CUDA core 384
  • Total dedicated memory 2048MB

I did find it interesting that stable-code, with 3B parameters, is approximately the same size as phi with 2.7B parameters. I would have expected about a 10% size difference between the two models. Perhaps there is some miscalculation in the model size which makes the CUDA memory estimation wrong?

@dhiltgen I've attached the server log here, i.e. the output of `docker logs ollama 2> ~/ollama_crash.txt`:
ollama_crash.txt
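For anyone reproducing this in Docker, a sketch of how such a container is typically started, updated, and inspected; the flags and image name follow the public Docker instructions and may not match this exact setup:

```
# Typical GPU-enabled Ollama container (assumes the NVIDIA container toolkit is installed).
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# To pick up a newer release, pull the image and recreate the container.
docker pull ollama/ollama
docker rm -f ollama
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Capture the server log for an issue report, as done above.
docker logs ollama 2> ~/ollama_crash.txt
```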

@dhiltgen commented on GitHub (Jan 23, 2024):

@silvergasp looks like you hit a GPU out-of-memory on 0.1.20. We've added some fixes to 0.1.21 to improve low memory GPUs, but the algorithm still isn't quite perfect.

@venturaEffect if you can try with 0.1.21 and share the server logs that will help us understand if this is a known issue we're working on, or something new.

@YeryODell commented on GitHub (Jan 24, 2024):

> @venturaEffect could you provide the server logs so we can see why it crashed?
>
> @ssebastianoo as others have noted, we're continuing to refine our memory prediction logic to balance using as much GPU memory as possible, without exceeding the capacity. Can you clarify which version of ollama you were running? 0.1.21 has fixes that may solve this for you, but if you still see OOMs please let us know.
>
> @t0m3k your crash looks like a Radeon related defect. Depending on what @venturaEffect ran into, we might want to track the Radeon crash with a different issue.

I'm having the same exact issue and have tried all the same fixes. I can't find any instructions on how to check my version or to upgrade to a pre-release version. Can you please provide instructions for Ubuntu?

@dhiltgen commented on GitHub (Jan 24, 2024):

> I'm having the same exact issue and have tried all the same fixes. I can't find any instructions on how to check my version or to upgrade to a pre-release version. Can you please provide instructions for Ubuntu?

To do a quick test:

wget https://github.com/ollama/ollama/releases/download/v0.1.21/ollama-linux-amd64
chmod a+x ollama-linux-amd64
sudo systemctl stop ollama
OLLAMA_DEBUG=1 ./ollama-linux-amd64 serve
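To answer the version question without reinstalling, something like this should work (the exact output format varies between releases):

```
# Prints the installed CLI version, e.g. "ollama version 0.1.21".
ollama --version

# The running server answers on the API port; a plain GET returns "Ollama is running".
curl http://127.0.0.1:11434/
```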

@venturaEffect commented on GitHub (Jan 25, 2024):

> @silvergasp looks like you hit a GPU out-of-memory on 0.1.20. We've added some fixes to 0.1.21 to improve low memory GPUs, but the algorithm still isn't quite perfect.
>
> @venturaEffect if you can try with 0.1.21 and share the server logs that will help us understand if this is a known issue we're working on, or something new.

I've done that already and shared it with @jmorganca some days ago on Discord; he is aware of it. The problem is that even after downgrading to a version that doesn't give this error, it doesn't work for RAG because of its context window limitation. This has made me look for an alternative with LlamaIndex using their custom models. In any case, I would love to use Ollama and LangChain, but with this big limitation for RAG it isn't very useful.

@jmorganca commented on GitHub (Jan 25, 2024):

@venturaEffect I'm so sorry you hit an error with large context windows. Will be fixing this soon, keep an eye on https://github.com/ollama/ollama/issues/1952

@venturaEffect commented on GitHub (Jan 25, 2024):

> @venturaEffect I'm so sorry you hit an error with large context windows. Will be fixing this soon, keep an eye on https://github.com/ollama/ollama/issues/1952

Following

@basillicus commented on GitHub (Jan 26, 2024):

> Is there any solution that you know that could solve the issue?

It is not a solution, but a workaround I am using until the bug is fixed. I believe the problem is that Ollama offloads more layers to the GPU than it can handle, so I just change the number of layers offloaded to the GPU manually, by trial and error, for the model I want to use until it works.

`ollama show dolphin-mistral --modelfile`

will show the Modelfile of the model. I just write a Modelfile using `FROM dolphin-mistral` as the base model and add `PARAMETER num_gpu x`.

Then create the model:

`ollama create dolphin-mistral_numGPU -f Modelfile_num_gpu_x`

and keep modifying x until the model works.

EDIT: version 0.1.22 fixes my problem of offloading too many layers to the GPU.
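Spelled out as a concrete sequence, the workaround described above looks roughly like the sketch below; the layer count of 20 is only a starting guess to adjust up or down:

```
# Inspect the model's existing Modelfile for reference.
ollama show dolphin-mistral --modelfile

# Write a small derived Modelfile; lower num_gpu until loading stops
# running out of GPU memory (0 keeps everything on the CPU).
cat > Modelfile_num_gpu_x <<'EOF'
FROM dolphin-mistral
PARAMETER num_gpu 20
EOF

ollama create dolphin-mistral_numGPU -f Modelfile_num_gpu_x
ollama run dolphin-mistral_numGPU
```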

@venturaEffect commented on GitHub (Jan 26, 2024):

> Is there any solution that you know that could solve the issue?
>
> It is not a solution but a workaround I am using until the bug is solved. I believe the problem is that ollama offloads more layers to the GPU than it will be able to handle. So I just trial-and-error change the number of layers to be offloaded to the GPU manually for the model you want to use until the model works.
>
> `ollama show dolphin-mistral --modelfile`
>
> will show the Modelfile of the model. I just use this Modelfile using `FROM dolphin-mistral` as the base model and adding `PARAMETER num_gpu x`
>
> Then create the model:
> `ollama create dolphin-mistral_numGPU -f Modelfile_num_gpu_x`
> And keep modifying x until the model works.

Thanks, but I guess this wouldn't solve the problem of the context window limitation for RAG with Ollama and LangChain. It only addresses the issue with the latest Ollama version.

@nathaniel-brough commented on GitHub (Jan 26, 2024):

> @silvergasp looks like you hit a GPU out-of-memory on 0.1.20. We've added some fixes to 0.1.21 to improve low memory GPUs, but the algorithm still isn't quite perfect.

Just tried the latest docker image. Looks like it's working fine for me now. Many thanks!

Reference: github-starred/ollama#1214