[GH-ISSUE #3568] ollama crashed at 0.1.31 - CUDA out of memory #2201

Closed
opened 2026-04-12 12:27:02 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @abnormalboy on GitHub (Apr 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3568

Originally assigned to: @mxyng on GitHub.

What is the issue?

When I use LangChain in Python, Ollama crashes. The model I am using is "gemma:7b"; with "llama2:7b" Ollama works normally. Is my GPU memory insufficient? My GPU has 8 GB of memory.

```python
from langchain.llms.ollama import Ollama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_community.embeddings import OllamaEmbeddings
import asyncio

async def main():
    vectorstore = DocArrayInMemorySearch.from_texts(
        # "Return in a parseable JSON format", 'e.g. {"hello":"world"}'
        ["以可解析的json格式返回", "如{\"hello\":\"world\"}"],
        embedding=OllamaEmbeddings(),
    )

    retriever = vectorstore.as_retriever()

    template = """Answer the question based only on the following context:
    {context}

    Question: {question}
    """
    prompt = ChatPromptTemplate.from_template(template)
    model = Ollama(model="gemma")
    output_parser = StrOutputParser()

    setup_and_retrieval = RunnableParallel(
        {"context": retriever, "question": RunnablePassthrough()}
    )
    chain = setup_and_retrieval | prompt | model | output_parser

    print(chain.input_schema.schema())
    chunks = []
    # "Return an apple"
    async for chunk in chain.astream("返回一个苹果"):
        chunks.append(chunk)
        print(chunk, end="", flush=True)

asyncio.run(main())
```
```
time=2024-04-10T08:55:13.247+08:00 level=WARN source=server.go:113 msg="server crash 59 - exit code 3221226505 - respawning"
```
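For context, exit code 3221226505 corresponds to 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN on Windows), which here is consistent with the runner aborting after exhausting GPU memory. As a hedged workaround sketch, not from the thread: the LangChain Ollama wrapper exposes `num_gpu` and `num_ctx` model options that are forwarded to the Ollama server, so on an 8 GB card one could try offloading fewer layers and shrinking the context window. The specific values below are illustrative guesses, not tested settings.

```python
# Hedged workaround sketch (not from the thread): reduce VRAM pressure by
# offloading fewer layers to the GPU and shrinking the context window.
# num_gpu / num_ctx are real options on the LangChain Ollama wrapper and are
# passed through to the Ollama server; the values below are untested guesses
# for an 8 GB card.
from langchain.llms.ollama import Ollama

model = Ollama(
    model="gemma:7b",
    num_gpu=20,    # offload only ~20 layers to the GPU; the rest run on CPU
    num_ctx=2048,  # smaller context window also trims the KV-cache footprint
)

print(model.invoke("Say hello in one short sentence."))
```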

What did you expect to see?

No response

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Windows

Architecture

amd64

Platform

No response

Ollama version

0.1.31

GPU

Nvidia

GPU info

<img width="635" alt="Snipaste_2024-04-10_08-50-17" src="https://github.com/ollama/ollama/assets/77949946/0d6a607e-c8ab-4db8-8b01-1c0e9738647b">

CPU

Intel

Other software

No response

GiteaMirror added the bug, nvidia labels 2026-04-12 12:27:02 -05:00
Author
Owner

@LukeMauldin commented on GitHub (Apr 10, 2024):

I can confirm that on Windows with Ollama version 0.1.31 I am seeing the same behavior with the gemma-7b models. Attaching my app and server logs. GPU is an Nvidia RTX 4050 with driver version 546.09.

[app.log](https://github.com/ollama/ollama/files/14932411/app.log)
[server.log](https://github.com/ollama/ollama/files/14932412/server.log)

```
ollama list
NAME                            ID              SIZE    MODIFIED
codegemma:7b-code               8dd7a5cc56a5    5.0 GB  50 minutes ago
codegemma:7b-instruct           ca966f70c13f    5.0 GB  50 minutes ago
deepseek-coder:6.7b-instruct    ce298d984115    3.8 GB  44 hours ago
gemma:7b-instruct-v1.1-q4_0     a72c7f4d0a15    5.0 GB  49 minutes ago
mistral:7b-instruct             61e88e884507    4.1 GB  44 hours ago
openchat:7b                     537a4e03b649    4.1 GB  44 hours ago
starling-lm:7b                  39153f619be6    4.1 GB  19 hours ago
```
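Since the affected setups pair roughly 5 GB gemma weights with 8 GB cards, it can help to watch free VRAM while the model loads to see how much headroom actually remains. A minimal sketch, assuming nvidia-smi is on PATH (the query flags are standard nvidia-smi options):

```python
# Minimal sketch (assumes nvidia-smi is on PATH): poll free VRAM so you can
# watch headroom shrink while the gemma runner loads.
import subprocess
import time

def free_vram_mib() -> int:
    """Return free VRAM of GPU 0 in MiB, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.splitlines()[0].strip())

for _ in range(30):  # sample once a second for 30 seconds
    print(f"free VRAM: {free_vram_mib()} MiB")
    time.sleep(1)
```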
Author
Owner

@qua1121 commented on GitHub (Apr 15, 2024):

Same here.

Author
Owner

@IHaBiS02 commented on GitHub (Apr 15, 2024):

Same here too.

Author
Owner

@ycyy commented on GitHub (Apr 18, 2024):

#3232 Update to version 0.1.32 for testing.

Author
Owner

@qua1121 commented on GitHub (Apr 19, 2024):

> #3232 Update to version 0.1.32 for testing.

It works like a charm! Thanks :-)

Author
Owner

@dhiltgen commented on GitHub (May 5, 2024):

We've fixed a number of bugs in the memory prediction logic recently, so it sounds like we can close this one as fixed. If you're still seeing OOM crashes with 0.1.33, please share your server log so we can see where the prediction logic went wrong.
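Before re-testing, it is worth confirming which build is actually serving requests, since an older server process may still be running after an upgrade. A minimal sketch using Ollama's documented GET /api/version endpoint, assuming the default localhost address:

```python
# Minimal sketch: confirm the running Ollama build before re-testing.
# Assumes the server is on the default address; uses the documented
# GET /api/version endpoint (stdlib only, no extra dependencies).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/version") as resp:
    print(json.load(resp)["version"])  # e.g. "0.1.33"
```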
