[GH-ISSUE #1513] I don't like the idea that ollama force me to use a server. #822

Closed
opened 2026-04-12 10:29:46 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @franciscoprin on GitHub (Dec 14, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1513

So, if I have Python code that looks like this:

```python
from langchain.schema import (SystemMessage, HumanMessage, AIMessage)
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chat_models import ChatOllama

question = "Could I have GitHub access?"
chat_template = [
    SystemMessage(
        content=(
            "You are a helpful DevOps assistant, rewrite user's questions to only include the websites that they want to access."
        )
    ),
    HumanMessage(content=question),
]

chat_model = ChatOllama(
    # model="llama2:7b-chat",
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

chat_model(chat_template)
```

the above source code gives me the following error:

```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x107601090>: Failed to establish a new connection: [Errno 61] Connection refused'))
```

I don't want my models to be downloaded by the Ollama service; I want to use the models I have already downloaded instead.
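For context: `ChatOllama` is an HTTP client for the Ollama server listening on `localhost:11434` (hence the `ConnectionError` above), so it cannot load a `.gguf` file directly via a path. A sketch of the flow it expects, using the `model` name that is commented out in the snippet above:

```shell
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve &

# Pull the published model that the commented-out line refers to
ollama pull llama2:7b-chat

# ChatOllama can then be pointed at it by name instead of by file path:
#   chat_model = ChatOllama(model="llama2:7b-chat")
```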


@easp commented on GitHub (Dec 14, 2023):

Then [import](https://github.com/jmorganca/ollama/blob/main/docs/import.md) the ones you have downloaded, or choose something that is better aligned with your requirements.

Ollama's integrated model management is pretty central to its value proposition.


@pdevine commented on GitHub (Mar 12, 2024):

@franciscoprin as @easp mentioned, you can make a Modelfile and then use `ollama create` to pull the model in. Ollama uses its own "layers" (called "blobs") which store the various parts of the model, such as the weights. What's really nice about this is that if you have multiple models which share the same weights, it will automatically deduplicate them on disk. Also, once you've imported your model, you can use `ollama push` to push it to ollama.com and share it with others.
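The deduplication mentioned above follows from blobs being content-addressed. This is not Ollama's actual code, just an illustrative Python sketch of the idea: a blob is stored under its SHA-256 digest, so two models referencing identical weights share one file on disk.

```python
import hashlib
import tempfile
from pathlib import Path

def blob_digest(data: bytes) -> str:
    """Name a blob by the SHA-256 digest of its content."""
    return "sha256-" + hashlib.sha256(data).hexdigest()

def store_blob(store: Path, data: bytes) -> str:
    """Write a blob under its digest; identical content is stored only once."""
    digest = blob_digest(data)
    path = store / digest
    if not path.exists():  # dedup: same content -> same digest -> same file
        path.write_bytes(data)
    return digest

store = Path(tempfile.mkdtemp())
weights = b"...model weights..."
d1 = store_blob(store, weights)  # first model referencing these weights
d2 = store_blob(store, weights)  # second model, same weights
print(d1 == d2)                  # True: both models point at the same blob
print(len(list(store.iterdir())))  # 1: only a single copy on disk
```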

I'm going to go ahead and close out the issue, but feel free to keep commenting if you have any questions.
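The workflow described above can be sketched as follows, reusing the GGUF path from the original snippet; the model name `llama2-local` is just an example:

```shell
# Modelfile pointing at the already-downloaded GGUF weights
cat > Modelfile <<'EOF'
FROM ./models/llama-2-7b-chat.Q4_K_M.gguf
EOF

# Import the weights into Ollama's blob store under a local name
ollama create llama2-local -f Modelfile

# The imported model can now be used like any pulled model
ollama run llama2-local "Could I have GitHub access?"
```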


Reference: github-starred/ollama#822