[GH-ISSUE #2203] Model not found #47771

Closed
opened 2026-04-28 05:17:03 -05:00 by GiteaMirror · 9 comments

Originally created by @gaborkukucska on GitHub (Jan 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2203

First of all, I must say, what a great piece of software Ollama is! THANK YOU for all your work everyone!!!
I am trying to set up MemGPT to use CodeLlama via `ollama serve`.
I've made sure that I pulled the exact model I want before starting up the API, but I still get an error when MemGPT tries to run inference against the LLM.

I start ollama with:

```
OLLAMA_HOST=0.0.0.0:63321 ollama serve
```

then set MemGPT up like this:

```
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): ollama
? Enter default endpoint: http://127.0.0.1:63321
? Enter default model name (required for Ollama, see: https://memgpt.readme.io/docs/ollama): codellama:7b-instruct-q6_K
? Select default model wrapper (recommended: chatml): chatml
? Select your model's context window (for Mistral 7B models, this is probably 8k / 8192): 8192
? Select embedding provider: local
? Select default preset: memgpt_chat
? Select default persona: sam_pov
? Select default human: basic
? Select storage backend for archival data: local
```

error log:

```
Exception: API call got non-200 response code (code=404, msg={"error":"model 'codellama:7b-instruct-q6_K' not found, try pulling it first"}) for address: http://127.0.0.1:63321/api/generate. Make sure that the ollama API server is running and reachable at http://127.0.0.1:63321/api/generate.
```

The model works perfectly well if I do:

```
ollama run codellama:7b-instruct-q6_K
```
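
For reference, the failing call can be reproduced outside MemGPT with a minimal request against the same endpoint (a sketch, assuming the custom port from the report):

```python
import requests

# Reproduce the call MemGPT makes; a 404 with "model ... not found" here
# means the server instance on this port does not know the model name.
resp = requests.post(
    "http://127.0.0.1:63321/api/generate",
    json={"model": "codellama:7b-instruct-q6_K", "prompt": "hello", "stream": False},
    timeout=300,
)
print(resp.status_code, resp.text[:200])
```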
GiteaMirror added the needs more info label 2026-04-28 05:17:03 -05:00

@BruceMacD commented on GitHub (Jan 26, 2024):

It could be that you're connecting to a different ollama instance when you run directly if `OLLAMA_HOST` isn't set for your environment.

Try this: `OLLAMA_HOST=0.0.0.0:63321 ollama pull codellama:7b-instruct-q6_K`
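
A quick way to confirm which instance (and which models) the client actually reaches is to query the tags endpoint on the same host and port (a minimal sketch, assuming the custom port from the report):

```python
import requests

# /api/tags lists the models installed on the server instance being queried.
resp = requests.get("http://127.0.0.1:63321/api/tags", timeout=5)
resp.raise_for_status()
names = [m["name"] for m in resp.json().get("models", [])]
print(names)  # 'codellama:7b-instruct-q6_K' must appear here for /api/generate to find it
```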


@gaborkukucska commented on GitHub (Jan 27, 2024):

> It could be that you're connecting to a different ollama instance when you run directly if `OLLAMA_HOST` isn't set for your environment.
>
> Try this: `OLLAMA_HOST=0.0.0.0:63321 ollama pull codellama:7b-instruct-q6_K`

That command just tells me to use `ollama serve` instead... Also, MemGPT hits the correct Ollama API, which I launch from the same environment where I pulled the model...

  1. activate the environment,
  2. then `ollama pull the-model-name` to download the model I need,
  3. then `ollama run the-model-name` to check if all is OK,
  4. then `ollama serve` to start the API,
  5. then `memgpt configure` to set up the parameters,
  6. finally `memgpt run` to initiate the inference.

On top of the above, here is what I see on the ollama side when MemGPT tries to connect:

```
[GIN] 2024/01/27 - 11:31:00 | 404 |    2.237327ms |    192.168.1.31 | POST     "/api/generate"
```

@mxyng commented on GitHub (Mar 11, 2024):

Without `OLLAMA_HOST` set, ollama uses 127.0.0.1:11434 by default, so unless it's set to 0.0.0.0:63321 for both the server and the client, they'll use 127.0.0.1:11434 regardless of the Python environment.
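
A sketch of the resolution rule described here (an illustration of the behavior, not the actual Ollama source):

```python
import os

# With OLLAMA_HOST unset, both the client and the server fall back to
# 127.0.0.1:11434, so a server listening on 0.0.0.0:63321 is invisible to a
# client that didn't export the same variable.
host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
print(f"this process would target http://{host}")
```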


@cheginit commented on GitHub (Mar 18, 2024):

You also need to pass the location where the models are. Here's the default location for Linux:

```
OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_HOST=127.0.0.1:11434 ollama serve
```

Look [here](https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored) for other OSs.
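
If the server is started with a different `OLLAMA_MODELS` than the directory a model was pulled into, it will report the model as missing. A minimal sketch to see which names a models directory holds (the `manifests/` layout is an assumption about Ollama's on-disk format; the path assumes the Linux default above):

```python
from pathlib import Path

# Pulled models are recorded as manifest files under <OLLAMA_MODELS>/manifests/
# (assumed layout); listing them shows which model:tag names this directory has.
models_dir = Path("/usr/share/ollama/.ollama/models")
for manifest in sorted((models_dir / "manifests").rglob("*")):
    if manifest.is_file():
        print(manifest.relative_to(models_dir / "manifests"))
```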


@pdevine commented on GitHub (May 17, 2024):

I'm going to go ahead and close this as answered. @gaborkukucska you just need the `OLLAMA_HOST` variable set correctly for both the client and the server. I hope you got it working!


@gaborkukucska commented on GitHub (May 18, 2024):

> I'm going to go ahead and close this as answered. @gaborkukucska you just need the `OLLAMA_HOST` variable set correctly for both the client and the server. I hope you got it working!

Yes all good 👍 Thanks for your reply!


@Anna-Pinewood commented on GitHub (Dec 22, 2024):

There are no clear instructions on what to fix... Could someone please explain, step by step, how to resolve this issue?


@BruceMacD commented on GitHub (Dec 22, 2024):

@Anna-Pinewood we will need a bit more context; this error can result from a few different scenarios. In this case, Ollama was running on a remote host; in other cases it may be that the model or tag name is not being specified correctly.


@Anna-Pinewood commented on GitHub (Dec 24, 2024):

@BruceMacD
Thank you for answering.

Actually, the problem was magically solved.

I will describe how I run Ollama; maybe it will help someone.

  1. Run the ollama server and the model in different terminal windows. The default Ollama port is taken, so I use an alternative one.

```
OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_HOST=127.0.0.1:11436 ollama serve
OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_HOST=127.0.0.1:11436 ollama run qwq:32b-preview-q4_K_M
```
  2. My minimal Python code to call Ollama:
```
import requests

data = {
    "model": "qwq:32b-preview-q4_K_M",
    "prompt": "What is the meaning of life?",
}
llama_port = 11436
url = f"http://localhost:{llama_port}/api/generate"
response = requests.post(url, json=data)
response
```

It works; it's just acting strange, returning the response one word at a time...
But the initial problem is gone.

```
>>> eval(response.text.split('\n')[0])
{'model': 'qwq:32b-preview-q4_K_M', 'created_at': '2024-12-24T08:24:39.006089838Z', 'response': 'The', 'done': False}
>>> eval(response.text.split('\n')[1])
{'model': 'qwq:32b-preview-q4_K_M', 'created_at': '2024-12-24T08:24:39.391406647Z', 'response': ' meaning', 'done': False}
>>> eval(response.text.split('\n')[2])
{'model': 'qwq:32b-preview-q4_K_M', 'created_at': '2024-12-24T08:24:39.771685857Z', 'response': ' of', 'done': False}
>>> eval(response.text.split('\n')[3])
{'model': 'qwq:32b-preview-q4_K_M', 'created_at': '2024-12-24T08:24:40.157191278Z', 'response': ' life', 'done': False}
```

The problem I had looked like this: `"error": "model not found, try pulling it first"`, even though I'd already pulled it and it was definitely available.

![image](https://github.com/user-attachments/assets/b0236756-fb63-419e-8465-c4325a75cbeb)
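
The word-at-a-time output above is expected: by default `/api/generate` streams newline-delimited JSON, one chunk per line. A minimal sketch (same model and port as above) that either disables streaming or consumes the stream with `json.loads` instead of `eval`:

```python
import json
import requests

url = "http://localhost:11436/api/generate"
data = {"model": "qwq:32b-preview-q4_K_M", "prompt": "What is the meaning of life?"}

# Option 1: request a single, complete response.
full = requests.post(url, json={**data, "stream": False}, timeout=300)
print(full.json()["response"])

# Option 2: consume the default NDJSON stream chunk by chunk.
with requests.post(url, json=data, stream=True, timeout=300) as resp:
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)  # safer than eval on untrusted text
            print(chunk["response"], end="", flush=True)
print()
```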
