[GH-ISSUE #10256] model requires more system memory (42.8 GiB) than is available (12.9 GiB) - Memory leak? #68787

Closed
opened 2026-05-04 15:11:27 -05:00 by GiteaMirror · 2 comments

Originally created by @khteh on GitHub (Apr 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10256

Inside the k8s pod:

```
root@ollama-0:/# free -h
              total        used        free      shared  buff/cache   available
Mem:           68Gi        53Gi       821Mi       3.9Gi        13Gi       9.9Gi
Swap:         2.0Gi       2.0Gi       0.0Ki
root@ollama-0:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 07:44 ?        00:00:00 /bin/bash /usr/local/bin/run.sh bash
root           7       1  1 07:44 ?        00:00:54 ollama serve
root       20568       7  0 09:01 ?        00:00:00 /usr/bin/ollama runner --model /models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --ctx-size 8192 --bat
root       20609       0  0 09:05 pts/0    00:00:00 bash
root       20618   20609  0 09:05 pts/0    00:00:00 ps -ef
```
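
A note on reading these numbers: `free`'s `available` column (9.9Gi here) is the kernel's estimate of how much memory can be claimed without swapping, and it is the figure a load-time memory check would care about, not `free` (821Mi). The 12.9 GiB in the error below is presumably the same kind of measurement taken at the moment the load was attempted.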

Exception when trying to run the langchain app:

```
ResponseError('model requires more system memory (42.8 GiB) than is available (12.9 GiB)')
Traceback (most recent call last):
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langgraph/pregel/__init__.py", line 2651, in astream
    async for _ in runner.atick(
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langgraph/prebuilt/chat_agent_executor.py", line 763, in acall_model
    response = cast(AIMessage, await model_runnable.ainvoke(state, config))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3089, in ainvoke
    input = await asyncio.create_task(part(), context=context)  # type: ignore
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5453, in ainvoke
    return await self.bound.ainvoke(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 353, in ainvoke
    llm_result = await self.agenerate_prompt(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 905, in agenerate_prompt
    return await self.agenerate(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 863, in agenerate
    raise exceptions[0]
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 1033, in _agenerate_with_cache
    result = await self._agenerate(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_ollama/chat_models.py", line 831, in _agenerate
    final_chunk = await self._achat_stream_with_aggregation(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_ollama/chat_models.py", line 667, in _achat_stream_with_aggregation
    async for chunk in self._aiterate_over_stream(messages, stop, **kwargs):
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_ollama/chat_models.py", line 779, in _aiterate_over_stream
    async for stream_resp in self._acreate_chat_stream(messages, stop, **kwargs):
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/langchain_ollama/chat_models.py", line 615, in _acreate_chat_stream
    async for part in await self._async_client.chat(**chat_params):
  File "/home/khteh/.local/share/virtualenvs/rag-agent-YeW3dxEa/lib/python3.12/site-packages/ollama/_client.py", line 672, in inner
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: model requires more system memory (42.8 GiB) than is available (12.9 GiB) (status code: 500)

During task with name 'agent' and id '4bbb6491-ebeb-bdf4-c08b-5d6501e483f4'
```

The instance serves my test llm-rag application, which sends one request at a time; there are no concurrent requests from other applications. Why doesn't it recover?
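
For context on where the 42.8 GiB figure can come from: Ollama's load-time estimate covers roughly the model weights plus a KV cache whose size grows linearly with the requested context length. A back-of-the-envelope sketch of that arithmetic follows; every model dimension in it is a hypothetical placeholder, not read from this issue's model.

```python
# Back-of-the-envelope memory estimate: weights + KV cache.
# All model dimensions below are hypothetical placeholders,
# NOT the actual parameters of the model in this issue.
def kv_cache_bytes(ctx_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # 2x for the K and V tensors, one entry per layer per position.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

GIB = 1024 ** 3
weights_gib = 40.0                        # e.g. a ~70B model at 4-bit quantization
kv_gib = kv_cache_bytes(ctx_len=8192,     # matches --ctx-size 8192 in the ps output
                        n_layers=80, n_kv_heads=8, head_dim=128) / GIB
print(f"~{weights_gib:.1f} GiB weights + ~{kv_gib:.1f} GiB KV cache "
      f"= ~{weights_gib + kv_gib:.1f} GiB required")
```

With those placeholder numbers the total lands in the low-40 GiB range, consistent with the error; the point is that the KV-cache term scales with whatever context size the client requests.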

GiteaMirror added the needs more info label 2026-05-04 15:11:27 -05:00

@rick-github commented on GitHub (Apr 13, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in diagnosis, but as a guess I'd say your app is trying to load the model with a large context buffer, and it exceeds the available resources.
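
If the large-context guess is right, one client-side mitigation is to pin the context window explicitly in the langchain app rather than letting it request a big one. A minimal sketch using langchain-ollama's `num_ctx` option (the model name and base URL are placeholders):

```python
from langchain_ollama import ChatOllama

# num_ctx caps the context window the Ollama runner allocates;
# a smaller value shrinks the KV cache reserved at model load time.
llm = ChatOllama(
    model="llama3.1",                  # placeholder model name
    base_url="http://ollama-0:11434",  # placeholder service address
    num_ctx=4096,
)
print(llm.invoke("ping").content)
```

A smaller `num_ctx` reduces the memory the runner must reserve up front, so the load-time check has a better chance of passing on a pod with limited available memory.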


@khteh commented on GitHub (Apr 13, 2025):

I will check the logs next time when it happens.
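
For a pod like this, `kubectl logs ollama-0` should capture the `ollama serve` output, including the memory estimate Ollama logs when it tries to load the model; setting `OLLAMA_DEBUG=1` in the pod's environment makes those logs more verbose.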

