[GH-ISSUE #9757] Facing issue on docker POD container ollama._types.ResponseError: (status code: 503) #52887

Open
opened 2026-04-29 01:16:29 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @geereddy-vmeg on GitHub (Mar 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9757

Hi Team,

I built a simple chatbot application locally, built a Docker image of it, and ran that image on my machine; everything worked fine up to this point.

Then I connected to a Kubernetes cluster and deployed the application to a specific namespace. It is currently running in a pod (let's say podx). When I try to test it, I get the error below:

ollama._types.ResponseError: (status code: 503)
Traceback:
  File "/app/Application.py", line 172, in <module>
    response = sql_chain.invoke({
  File "/usr/local/lib/python3.9/site-packages/langchain_core/runnables/base.py", line 3016, in invoke
    input = context.run(step.invoke, input, config)
  File "/usr/local/lib/python3.9/site-packages/langchain_core/language_models/llms.py", line 387, in invoke
    self.generate_prompt(
  File "/usr/local/lib/python3.9/site-packages/langchain_core/language_models/llms.py", line 760, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/langchain_core/language_models/llms.py", line 963, in generate
    output = self._generate_helper(
  File "/usr/local/lib/python3.9/site-packages/langchain_core/language_models/llms.py", line 784, in _generate_helper
    self._generate(
  File "/usr/local/lib/python3.9/site-packages/langchain_ollama/llms.py", line 288, in _generate
    final_chunk = self._stream_with_aggregation(
  File "/usr/local/lib/python3.9/site-packages/langchain_ollama/llms.py", line 256, in _stream_with_aggregation
    for stream_resp in self._create_generate_stream(prompt, stop, **kwargs):
  File "/usr/local/lib/python3.9/site-packages/langchain_ollama/llms.py", line 211, in _create_generate_stream
    yield from self._client.generate(
  File "/usr/local/lib/python3.9/site-packages/ollama/_client.py", line 168, in inner
    raise ResponseError(e.response.text, e.response.status_code) from None
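For reference, the application talks to Ollama through langchain_ollama, so the failing call is roughly equivalent to the minimal sketch below (the model name, base URL, and prompt here are placeholders for my setup, not the exact chain):

from langchain_ollama import OllamaLLM

# Placeholder values; the real application builds the prompt via sql_chain.invoke(...).
llm = OllamaLLM(model="mistral-nemo:latest", base_url="http://localhost:11434")
print(llm.invoke("What is the capital of France?"))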

FYI, the port on which Ollama is running the model appears to be correct:
root 1006 49.8 2.1 10857532 8306012 ? Rl 06:24 5:52 /usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server runner --model /root/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94 --ctx-size 8192 --batch-size 512 --threads 32 --no-mmap --parallel 4 --port 44483
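As a cross-check, I can also ask Ollama which models it currently has loaded via its /api/ps endpoint; a small sketch, assuming the default 11434 port inside the pod:

import requests

# Lists the models Ollama currently has loaded (name, size, expiry, etc.).
print(requests.get("http://localhost:11434/api/ps", timeout=10).json())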

When I run a curl command I do get output, but only after a very long time (the timestamps below show several minutes between tokens):
[root@prod2-data-ops-gpt-7fd9c55ddc-8prsz app]# curl -X POST http://localhost:11434/api/generate -d '{
"model": "mistral-nemo:latest",
"prompt": "Hi!"
}'
{"model":"mistral-nemo:latest","created_at":"2025-03-14T06:56:23.367741829Z","response":"Hello","done":false}
{"model":"mistral-nemo:latest","created_at":"2025-03-14T07:02:49.867305932Z","response":"!","done":false}
{"model":"mistral-nemo:latest","created_at":"2025-03-14T07:08:57.667542738Z","response":" How","done":false}
{"model":"mistral-nemo:latest","created_at":"2025-03-14T07:15:16.966139514Z","response":" can","done":false}

And when I send the request from a Python script, it returns a 503 error:

[root@prod2-data-ops-gpt-7fd9c55ddc-8prsz app]# python
Python 3.9.20 (main, Sep 26 2024, 20:59:47)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-22)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import requests
>>> url = "http://localhost:11434/api/generate"
>>> data = {"model": "mistral-nemo:latest", "prompt": "What is the capital of France?"}
>>> response = requests.post(url, json=data)
>>> print(response)
<Response [503]>
>>> print(response.text)

>>> print(response.json)
<bound method Response.json of <Response [503]>>

Could you please help me figure out how to resolve this issue? I have been trying to troubleshoot it for a long time but haven't found any clue.

Reference: github-starred/ollama#52887