[GH-ISSUE #7444] "Connection refused" error while running Ollama in a container with an LLM chatbot app in another Docker container #51243

Closed
opened 2026-04-28 18:59:21 -05:00 by GiteaMirror · 24 comments

Originally created by @VenturaAI on GitHub (Oct 31, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7444

I have created a local chatbot in Python 3.12 that allows the user to chat with an uploaded PDF by creating embeddings in a Qdrant vector database and then getting inference from Ollama (model llama3.2:3b).
In my source code, I am using the following dependencies:

streamlit langchain langchain_community langchain_core python-dotenv langchain-huggingface langchain-qdrant langchain-ollama unstructured[pdf] onnx==1.16.1 qdrant-client torch torchvision torchaudio

Since I want to deploy the code on a server (where no dependencies are installed), I will be using Docker to run the containers for Qdrant, the chatbot app and Ollama. I have successfully pulled the latest Ollama image and Qdrant using Docker.

docker run -d -v D:\myollamamodels:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3.2:3b

Both the Ollama and Qdrant containers are running and accessible from within. Checked using Docker Desktop as well. I have also bridged the chatbot app, Ollama and Qdrant containers onto a single network using:

docker network connect my_network ollama
docker network connect my_network qdrant

Now when I run the app, it does open and allows me to upload the PDF and create the embeddings, and the embeddings are also successfully stored in the vector DB (I have included relevant print statements which are reflected in the app GUI). The issue comes when I want to chat with the document: when I enter a question, it waits and, instead of responding with the inference output, gives me the error: "⚠️ An error occurred while processing your request: [Errno 111] Connection refused".

I have the docker compose file as below:

version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    container_name: qdrant
    ports:
      - "6333:6333" # Expose Qdrant on the default port
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - my_network # Connect qdrant to my_network

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434" # Expose Ollama on the default port
    environment:
      - OLLAMA_MODEL=llama3.2:3b
    volumes:
      - /d/myollamamodels:/models
    networks:
      - my_network

  app:
    build: .
    container_name: app_new
    ports:
      - "8501:8501" # Streamlit default port
    environment:
      QDRANT_URL: http://qdrant:6333 # Use Qdrant service name from Docker Compose
      OLLAMA_URL: http://ollama:11434
      #OLLAMA_MODEL: http://host.docker.internal:11434/llama3.2:3b # Point to Ollama on host
    depends_on:
      - qdrant
      - ollama
    volumes:
      - ./models:/models # Mount the model directory for access
    networks:
      - my_network # Connect app to my_network

volumes:
  qdrant_data:

networks:
  my_network:
    driver: bridge

The Python program and class which I have been using for the AI chatbot is as follows.
The Streamlit app code and the vector-embeddings code are in separate .py files.

class ChatbotManager:
    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en",
        device: str = "cpu",
        encode_kwargs: dict = {"normalize_embeddings": True},
        llm_model: str = "llama3.2:3b",
        #llm_model: str = None,  # Set to None to use environment variable
        llm_temperature: float = 0.7,
        qdrant_url: str = "http://qdrant:6333",
        ollama_url: str = "http://ollama:11434",  # URL for Ollama inside Docker network
        collection_name: str = "vector_db",
    ):
        """
        Initializes the ChatbotManager with embedding models, LLM, and vector store.

        Args:
            model_name (str): The HuggingFace model name for embeddings.
            device (str): The device to run the model on ('cpu' or 'cuda').
            encode_kwargs (dict): Additional keyword arguments for encoding.
            llm_model (str): The local LLM model name for ChatOllama.
            llm_temperature (float): Temperature setting for the LLM.
            qdrant_url (str): The URL for the Qdrant instance.
            collection_name (str): The name of the Qdrant collection.
        """
        self.model_name = model_name
        self.device = device
        self.encode_kwargs = encode_kwargs
        #self.llm_model = llm_model
        # Get the LLM model name from the environment variable
        self.llm_model = os.getenv("OLLAMA_MODEL", llm_model)
        self.llm_temperature = llm_temperature
        self.qdrant_url = qdrant_url
        self.collection_name = collection_name
        self.ollama_url = ollama_url  # Initialize ollama_url

        # Initialize Embeddings
        self.embeddings = HuggingFaceBgeEmbeddings(
            model_name=self.model_name,
            model_kwargs={"device": self.device},
            encode_kwargs=self.encode_kwargs,
        )

        # Initialize Local LLM
        self.llm = ChatOllama(
            model=self.llm_model,
            temperature=self.llm_temperature,
            server_url=self.ollama_url
            # Add other parameters if needed
        )

        # Define the prompt template
        self.prompt_template = """Use the following pieces of information to answer the user's question.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer. Answer must be detailed and well explained.
Helpful answer:
"""

        # Initialize Qdrant client
        self.client = QdrantClient(
            url=self.qdrant_url, prefer_grpc=False
        )

        # Initialize the Qdrant vector store
        self.db = Qdrant(
            client=self.client,
            embeddings=self.embeddings,
            collection_name=self.collection_name
        )

        # Initialize the prompt
        self.prompt = PromptTemplate(
            template=self.prompt_template,
            input_variables=['context', 'question']
        )

        # Initialize the retriever
        self.retriever = self.db.as_retriever(search_kwargs={"k": 1})

        # Define chain type kwargs
        self.chain_type_kwargs = {"prompt": self.prompt}

        # Initialize the RetrievalQA chain with return_source_documents=False
        self.qa = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.retriever,
            return_source_documents=False,  # Set to False to return only 'result'
            chain_type_kwargs=self.chain_type_kwargs,
            verbose=False
        )

    def get_response(self, query: str) -> str:
        """
        Processes the user's query and returns the chatbot's response.

        Args:
            query (str): The user's input question.

        Returns:
            str: The chatbot's response.
        """
        try:
            response = self.qa.run(query)
            return response  # 'response' is now a string containing only the 'result'
        except Exception as e:
            st.error(f"An error occurred while processing your request: {e}")
            return "Sorry, I couldn't process your request at the moment."

Logs of app container:

2024-10-30 16:47:13 2024-10-30 11:17:13.140 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

2024-10-30 16:49:55 2024-10-30 11:19:55.974 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

2024-10-30 16:50:44 /app/chatbot.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.

2024-10-30 16:50:44 response = self.qa.run(query)
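
(As an aside, the LangChainDeprecationWarning above is unrelated to the connection error. For reference, a minimal sketch of get_response switched from the deprecated Chain.run to invoke, assuming the same RetrievalQA chain with return_source_documents=False so the answer comes back under the "result" key:)

    def get_response(self, query: str) -> str:
        try:
            # RetrievalQA.invoke takes a dict keyed by "query" and returns a dict;
            # with return_source_documents=False the answer is under "result".
            output = self.qa.invoke({"query": query})
            return output["result"]
        except Exception as e:
            st.error(f"An error occurred while processing your request: {e}")
            return "Sorry, I couldn't process your request at the moment."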

I have looked into it many times and modified it based on ollama_url and other factors such as checking the ollama service availability, the ollama container status and modifying the yml file, but none seem to work and I am stuck at this error. The entire code works well within the development environment without Docker (and with ollama as a service on the host), but I need to deploy it on a server at the earliest to make it available on the network.

I have checked that the ollama container service is working on port 11434 (checked it via the URL and also via a docker command), and Qdrant is also working since the embeddings are created and a success message is shown in the app UI, but somehow the connection to ollama is being refused, I guess.

Could someone please explain the issue and a solution for this problem?
Thanks.


@rick-github commented on GitHub (Oct 31, 2024):

--- 7444.py.orig	2024-10-31 11:56:22.897375168 +0100
+++ 7444.py	2024-10-31 11:56:15.330871235 +0100
@@ -45,7 +45,7 @@
     self.llm = ChatOllama(
         model=self.llm_model,
         temperature=self.llm_temperature,
-        server_url=self.ollama_url
+        base_url=self.ollama_url
         # Add other parameters if needed
     )
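
For reference, a minimal sketch of the corrected construction (assuming langchain-ollama's ChatOllama, which takes the endpoint as base_url; the diff suggests server_url is not the parameter name it expects, so without base_url the client apparently falls back to its default http://localhost:11434, which nothing listens on inside the app container). The trailing invoke call is only a hand-rolled connectivity check, not part of the original code:

    import os
    from langchain_ollama import ChatOllama

    # OLLAMA_URL is the variable set on the app service in the compose file.
    ollama_url = os.getenv("OLLAMA_URL", "http://ollama:11434")

    llm = ChatOllama(
        model="llama3.2:3b",
        temperature=0.7,
        base_url=ollama_url,  # base_url, not server_url
    )

    # Round-trip a tiny prompt through Ollama to confirm the endpoint is reachable.
    print(llm.invoke("Reply with the single word: ok").content)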

@VenturaAI commented on GitHub (Oct 31, 2024):

Hi, I tried implementing the suggested change in the code but it didn't work and the error is the same "Connection refused" when a question is asked to chat with the PDF.


@rick-github commented on GitHub (Oct 31, 2024):

What's the result of

docker exec -it app_new curl ollama:11434

@VenturaAI commented on GitHub (Oct 31, 2024):

Result after the command is executed: Ollama is running


@rick-github commented on GitHub (Oct 31, 2024):

What base image do you use for app_new?


@VenturaAI commented on GitHub (Oct 31, 2024):

FROM python:3.12


@rick-github commented on GitHub (Oct 31, 2024):

run this:

docker exec -it app_new bash -c 'apt update && apt install -y tcpflow'
docker exec -it app_new tcpflow -c -i any port 11434

and then go to your streamlit ui and run a query. post results, if any.


@VenturaAI commented on GitHub (Oct 31, 2024):

After running the tcpflow, the container logs are:

2024-10-31 17:03:36 /app/chatbot.py:120: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.

2024-10-31 17:03:36 response = self.qa.run(query)

2024-10-31 17:31:21 2024-10-31 12:01:21.796 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

& the Streamlit UI still shows error:

⚠️ An error occurred while processing your request: [Errno 111] Connection refused
⚠️ Sorry, I couldn't process your request at the moment.


@VenturaAI commented on GitHub (Oct 31, 2024):

I need to know: if the ollama service is running in one container and the app in another container, should ollama_url be localhost:11434 or ollama:11434?


@rick-github commented on GitHub (Oct 31, 2024):

What was the output of tcpflow?

If ollama is running in a container, the client will need to connect to http://$container_name:11434.
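
A quick way to confirm that the service name resolves and the port is reachable from inside the app container, independent of LangChain — a sketch using only the Python standard library; OLLAMA_URL is the variable the compose file already sets on the app service, and the file name is arbitrary:

    # Run from the host, e.g.: docker exec -it app_new python3 probe_ollama.py
    import os
    import urllib.request

    url = os.getenv("OLLAMA_URL", "http://ollama:11434")
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            # Ollama's root endpoint answers with the plain text "Ollama is running".
            print(resp.read().decode().strip())
    except Exception as e:
        print(f"Could not reach {url}: {e}")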


@VenturaAI commented on GitHub (Oct 31, 2024):

reportfilename: ./report.xml
tcpflow: listening on any
Nothing appears after this in the terminal when the query is executed in the Streamlit UI.


@rick-github commented on GitHub (Oct 31, 2024):

The app is not trying to connect to ollama at all, or at least not on port 11434. If you add print(f"ollama_url is {self.ollama_url}") before the call to ChatOllama, what's the result?


@VenturaAI commented on GitHub (Oct 31, 2024):

Included the above statement. Getting the same logs from the container console:
2024-10-31 18:01:16 self.db = Qdrant(
2024-10-31 18:01:17 2024-10-31 12:31:17.078 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
2024-10-31 18:01:37 /app/chatbot.py:123: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.
2024-10-31 18:01:37 response = self.qa.run(query)

After creating the DB in Qdrant, it should invoke chatbot.py. Below is the snippet from the app UI code:

# User input
if user_input := st.chat_input("Type your message here..."):
    # Display user message
    st.chat_message("user").markdown(user_input)
    st.session_state['messages'].append({"role": "user", "content": user_input})

    with st.spinner("🤖 Responding..."):
        try:
            # Get the chatbot response using the ChatbotManager
            answer = st.session_state['chatbot_manager'].get_response(user_input)
            time.sleep(1)  # Simulate processing time
        except Exception as e:
            answer = f"⚠️ An error occurred while processing your request: {e}"

@rick-github commented on GitHub (Oct 31, 2024):

I don't see ollama_url is ... in the log.

Can I ask that you wrap code and log snippets in a markdown code block, three backticks (```) on a line at the start and again at the end. It makes it much easier to read the code if it's properly formatted.


@VenturaAI commented on GitHub (Oct 31, 2024):

Sure, here's the code:

# chatbot.py

import os
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain_ollama import ChatOllama
from qdrant_client import QdrantClient
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
import streamlit as st

class ChatbotManager:
   def __init__(
       self,
       model_name: str = "BAAI/bge-small-en",
       device: str = "cpu",
       encode_kwargs: dict = {"normalize_embeddings": True},
       llm_model: str = "llama3.2:3b",
       #llm_model: str = None,  # Set to None to use environment variable
       llm_temperature: float = 0.7,
       qdrant_url: str = "http://qdrant:6333",
       ollama_url: str = "http://ollama:11434",  # URL for Ollama inside Docker network
       collection_name: str = "vector_db",
   ):
       """
       Initializes the ChatbotManager with embedding models, LLM, and vector store.

       Args:
           model_name (str): The HuggingFace model name for embeddings.
           device (str): The device to run the model on ('cpu' or 'cuda').
           encode_kwargs (dict): Additional keyword arguments for encoding.
           llm_model (str): The local LLM model name for ChatOllama.
           llm_temperature (float): Temperature setting for the LLM.
           qdrant_url (str): The URL for the Qdrant instance.
           collection_name (str): The name of the Qdrant collection.
       """
       self.model_name = model_name
       self.device = device
       self.encode_kwargs = encode_kwargs
       #self.llm_model = llm_model
       # Get the LLM model name from the environment variable
       self.llm_model = os.getenv("OLLAMA_MODEL", llm_model)
       self.llm_temperature = llm_temperature
       self.qdrant_url = qdrant_url
       self.collection_name = collection_name
       self.ollama_url = ollama_url  # Initialize ollama_url

       # Initialize Embeddings
       self.embeddings = HuggingFaceBgeEmbeddings(
           model_name=self.model_name,
           model_kwargs={"device": self.device},
           encode_kwargs=self.encode_kwargs,
       )
       
       # Print the ollama_url for debugging purposes
       print(f"ollama_url is {self.ollama_url}")

       # Initialize Local LLM
       self.llm = ChatOllama(
           model=self.llm_model,
           temperature=self.llm_temperature,
           #server_url=self.ollama_url
           base_url=self.ollama_url
           # Add other parameters if needed
       )

       # Define the prompt template
       self.prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer. Answer must be detailed and well explained.
Helpful answer:
"""

       # Initialize Qdrant client
       self.client = QdrantClient(
           url=self.qdrant_url, prefer_grpc=False
       )

       # Initialize the Qdrant vector store
       self.db = Qdrant(
           client=self.client,
           embeddings=self.embeddings,
           collection_name=self.collection_name
       )

       # Initialize the prompt
       self.prompt = PromptTemplate(
           template=self.prompt_template,
           input_variables=['context', 'question']
       )

       # Initialize the retriever
       self.retriever = self.db.as_retriever(search_kwargs={"k": 1})

       # Define chain type kwargs
       self.chain_type_kwargs = {"prompt": self.prompt}

       # Initialize the RetrievalQA chain with return_source_documents=False
       self.qa = RetrievalQA.from_chain_type(
           llm=self.llm,
           chain_type="stuff",
           retriever=self.retriever,
           return_source_documents=False,  # Set to False to return only 'result'
           chain_type_kwargs=self.chain_type_kwargs,
           verbose=False
       )

   def get_response(self, query: str) -> str:
       """
       Processes the user's query and returns the chatbot's response.

       Args:
           query (str): The user's input question.

       Returns:
           str: The chatbot's response.
       """
       try:
           response = self.qa.run(query)
           return response  # 'response' is now a string containing only the 'result'
       except Exception as e:
           st.error(f"⚠️ An error occurred while processing your request: {e}")
           return "⚠️ Sorry, I couldn't process your request at the moment."

Streamlit app code:

# app.py

import streamlit as st
from streamlit import session_state
import time
import base64
import os
from vectors import EmbeddingsManager  # Import the EmbeddingsManager class
from chatbot import ChatbotManager     # Import the ChatbotManager class

# Function to display the PDF of a given file
def displayPDF(file):
    # Reading the uploaded file
    base64_pdf = base64.b64encode(file.read()).decode('utf-8')

    # Embedding PDF in HTML
    pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="600" type="application/pdf"></iframe>'

    # Displaying the PDF
    st.markdown(pdf_display, unsafe_allow_html=True)

# Initialize session_state variables if not already present
if 'temp_pdf_path' not in st.session_state:
    st.session_state['temp_pdf_path'] = None

if 'chatbot_manager' not in st.session_state:
    st.session_state['chatbot_manager'] = None

if 'messages' not in st.session_state:
    st.session_state['messages'] = []

# Set the page configuration to wide layout and add a title
st.set_page_config(
    page_title="DocBot AI App",
    layout="wide",
    initial_sidebar_state="expanded",
)
#st.markdown("<h1 style='color: #1c5384;'>DocBot AI App</h1>", unsafe_allow_html=True)

# Sidebar
with st.sidebar:
    
    st.markdown("### 📚 Your Personal Document Assistant")
    st.markdown ("#### Powered by AI 🚀🚀")
    st.markdown("---")
    
    # Navigation Menu
    menu = ["🏠 Home", "🤖 Chatbot"]
    choice = st.selectbox("Navigate", menu)

# Home Page
if choice == "🏠 Home":
    #st.title("📄 DocBot AI App")
    st.markdown("<h1 style='color: #1c5384;'> 📄 DocBot AI App</h1>", unsafe_allow_html=True)
    st.markdown("""
    Welcome to **DocBot AI App**!
    """)

# Chatbot Page
elif choice == "🤖 Chatbot":
    #st.title("🤖 AI Chatbot Interface")
    st.markdown("<h1 style='color: #1c5384;'> 🤖 AI Chatbot Interface</h1>", unsafe_allow_html=True)
    st.markdown("---")
    
    # Create three columns
    col1, col2, col3 = st.columns(3)

    # Column 1: File Uploader and Preview
    with col1:
        st.header("📂 Upload Document")
        #st.markdown("<h1 style='color: #1c5384;'> ���� Upload Document</h1>", unsafe_allow_html=True)
        uploaded_file = st.file_uploader("Upload a PDF", type=["pdf"])
        if uploaded_file is not None:
            st.success("📄 File Uploaded Successfully!")
            # Display file name and size
            st.markdown(f"**Filename:** {uploaded_file.name}")
            st.markdown(f"**File Size:** {uploaded_file.size} bytes")
            
            # Display PDF preview using displayPDF function
            st.markdown("### 📖 PDF Preview")
            displayPDF(uploaded_file)
            
            # Save the uploaded file to a temporary location
            temp_pdf_path = "temp.pdf"
            with open(temp_pdf_path, "wb") as f:
                f.write(uploaded_file.getbuffer())
            
            # Store the temp_pdf_path in session_state
            st.session_state['temp_pdf_path'] = temp_pdf_path

    # Column 2: Create Embeddings
    with col2:
        st.header("🧠 Process the PDF ")
        create_embeddings = st.checkbox("✅ Create Embeddings")
        if create_embeddings:
            if st.session_state['temp_pdf_path'] is None:
                st.warning("⚠️ Please upload a PDF first.")
            else:
                try:
                    # Initialize the EmbeddingsManager
                    embeddings_manager = EmbeddingsManager(
                        model_name="BAAI/bge-small-en",
                        device="cpu",
                        encode_kwargs={"normalize_embeddings": True},
                        qdrant_url="http://qdrant:6333",
                        collection_name="vector_db"
                    )
                    
                    with st.spinner("🔄 Embeddings are in process..."):
                        # Create embeddings
                        result = embeddings_manager.create_embeddings(st.session_state['temp_pdf_path'])
                        time.sleep(1)  # Optional: To show spinner for a bit longer
                    st.success(result)
                    
                    # Initialize the ChatbotManager after embeddings are created
                    if st.session_state['chatbot_manager'] is None:
                        st.session_state['chatbot_manager'] = ChatbotManager(
                            model_name="BAAI/bge-small-en",
                            device="cpu",
                            encode_kwargs={"normalize_embeddings": True},
                            llm_model="llama3.2:3b",
                            llm_temperature=0.7,
                            qdrant_url="http://qdrant:6333",
                            collection_name="vector_db",
                            ollama_url = "http://ollama:11434"
                        )
                    
                except FileNotFoundError as fnf_error:
                    st.error(fnf_error)
                except ValueError as val_error:
                    st.error(val_error)
                except ConnectionError as conn_error:
                    st.error(conn_error)
                except Exception as e:
                    st.error(f"An unexpected error occurred: {e}")

    # Column 3: Chatbot Interface
    with col3:
        st.header("💬 Chat with Document")
        
        if st.session_state['chatbot_manager'] is None:
            st.info("🤖 Please upload a PDF and create embeddings to start chatting.")
        else:
            # Display existing messages
            for msg in st.session_state['messages']:
                st.chat_message(msg['role']).markdown(msg['content'])

            # User input
            if user_input := st.chat_input("Type your message here..."):
                # Display user message
                st.chat_message("user").markdown(user_input)
                st.session_state['messages'].append({"role": "user", "content": user_input})

                with st.spinner("🤖 Responding..."):
                    try:
                        # Get the chatbot response using the ChatbotManager
                        answer = st.session_state['chatbot_manager'].get_response(user_input)
                        time.sleep(1)  # Simulate processing time
                    except Exception as e:
                        answer = f"⚠️ An error occurred while processing your request: {e}"
                
                # Display chatbot message
                st.chat_message("assistant").markdown(answer)
                st.session_state['messages'].append({"role": "assistant", "content": answer})



# Footer
st.markdown("---")```


@rick-github commented on GitHub (Oct 31, 2024):

I modified the docker compose config from the first post:

--- docker-compose.yaml.orig	2024-10-31 22:17:26.736521468 +0100
+++ docker-compose.yaml	2024-10-31 22:09:19.128216925 +0100
@@ -19,7 +19,7 @@
     environment:
       - OLLAMA_MODEL=llama3.2:3b
     volumes:
-      - /d/myollamamodels:/models
+      - ./myollamamodels:/root/.ollama
     networks:
       - my_network
 

and added a new function to app.py to make it easier to test the connection to ollama:

--- app.py.orig	2024-10-31 17:45:52.477968011 +0100
+++ app.py	2024-10-31 21:28:32.309133808 +0100
@@ -29,6 +29,9 @@
 if 'messages' not in st.session_state:
     st.session_state['messages'] = []
 
+ollama_url = os.getenv("OLLAMA_URL", "http://ollama:11434")
+qdrant_url = os.getenv("QDRANT_URL", "http://qdrant:6333")
+
 # Set the page configuration to wide layout and add a title
 st.set_page_config(
     page_title="DocBot AI App",
@@ -45,7 +48,7 @@
     st.markdown("---")
     
     # Navigation Menu
-    menu = ["🏠 Home", "🤖 Chatbot"]
+    menu = ["🏠 Home", "🤖 Chatbot", "Ollama chat"]
     choice = st.selectbox("Navigate", menu)
 
 # Home Page
@@ -102,7 +105,7 @@
                         model_name="BAAI/bge-small-en",
                         device="cpu",
                         encode_kwargs={"normalize_embeddings": True},
-                        qdrant_url="http://qdrant:6333",
+                        qdrant_url=qdrant_url,
                         collection_name="vector_db"
                     )
                     
@@ -120,9 +123,9 @@
                             encode_kwargs={"normalize_embeddings": True},
                             llm_model="llama3.2:3b",
                             llm_temperature=0.7,
-                            qdrant_url="http://qdrant:6333",
+                            qdrant_url=qdrant_url,
                             collection_name="vector_db",
-                            ollama_url = "http://ollama:11434"
+                            ollama_url = ollama_url
                         )
                     
                 except FileNotFoundError as fnf_error:
@@ -163,6 +166,46 @@
                 st.chat_message("assistant").markdown(answer)
                 st.session_state['messages'].append({"role": "assistant", "content": answer})
 
+elif choice == "Ollama chat":
+    st.markdown("<h1 style='color: #1c5384;'> 🤖 Ollama Chat Interface</h1>", unsafe_allow_html=True)
+    st.markdown("---")
+    
+    if st.session_state['chatbot_manager'] is None:
+        st.session_state['chatbot_manager'] = ChatbotManager(
+            model_name="BAAI/bge-small-en",
+            device="cpu",
+            encode_kwargs={"normalize_embeddings": True},
+            llm_model="llama3.2:3b",
+            llm_temperature=0.7,
+            qdrant_url=qdrant_url,
+            collection_name="vector_db",
+            ollama_url = ollama_url
+        )
+
+    st.header("💬 Chat with ollama")
+    history = st.container(height=400)
+
+    # Display existing messages
+    for msg in st.session_state['messages']:
+        history.chat_message(msg['role']).markdown(msg['content'])
+
+    # User input
+    if user_input := st.chat_input("Type your message here..."):
+        # Display user message
+        history.chat_message("user").markdown(user_input)
+        st.session_state['messages'].append({"role": "user", "content": user_input})
+
+        with st.spinner("🤖 Responding..."):
+            try:
+                # Get the chatbot response using the ChatbotManager
+                answer = st.session_state['chatbot_manager'].llm.invoke(st.session_state['messages']).content
+                time.sleep(1)  # Simulate processing time
+            except Exception as e:
+                answer = f"⚠️ An error occurred while processing your request: {e}"
+        
+        # Display chatbot message
+        history.chat_message("assistant").markdown(answer)
+        st.session_state['messages'].append({"role": "assistant", "content": answer})
 
 
 # Footer

This works as expected.

[Screenshot 2024-10-31 22 36 20]


@VenturaAI commented on GitHub (Nov 1, 2024):

Hi, thanks. I tried the above changes except for the test function. It throws an error when the container is launched: ModuleNotFoundError: No module named 'langchain_ollama'. This issue was not there prior to the changes.

Below is the code for the embeddings as well. Can you check why, after the Qdrant DB step succeeds, Ollama is not getting connected for chat?

# vectors.py

import os
import base64
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Qdrant

class EmbeddingsManager:
    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en",
        device: str = "cpu",
        encode_kwargs: dict = {"normalize_embeddings": True},
        qdrant_url: str = "http://qdrant:6333",
        collection_name: str = "vector_db",
    ):
        """
        Initializes the EmbeddingsManager with the specified model and Qdrant settings.

        Args:
            model_name (str): The HuggingFace model name for embeddings.
            device (str): The device to run the model on ('cpu' or 'cuda').
            encode_kwargs (dict): Additional keyword arguments for encoding.
            qdrant_url (str): The URL for the Qdrant instance.
            collection_name (str): The name of the Qdrant collection.
        """
        self.model_name = model_name
        self.device = device
        self.encode_kwargs = encode_kwargs
        self.qdrant_url = qdrant_url
        self.collection_name = collection_name

        self.embeddings = HuggingFaceBgeEmbeddings(
            model_name=self.model_name,
            model_kwargs={"device": self.device},
            encode_kwargs=self.encode_kwargs,
        )

    def create_embeddings(self, pdf_path: str):
        """
        Processes the PDF, creates embeddings, and stores them in Qdrant.

        Args:
            pdf_path (str): The file path to the PDF document.

        Returns:
            str: Success message upon completion.
        """
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(f"The file {pdf_path} does not exist.")

        # Load and preprocess the document
        loader = UnstructuredPDFLoader(pdf_path)
        docs = loader.load()
        if not docs:
            raise ValueError("No documents were loaded from the PDF.")

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000, chunk_overlap=250
        )
        splits = text_splitter.split_documents(docs)
        if not splits:
            raise ValueError("No text chunks were created from the documents.")

        # Create and store embeddings in Qdrant
        try:
            qdrant = Qdrant.from_documents(
                splits,
                self.embeddings,
                url=self.qdrant_url,
                prefer_grpc=False,
                collection_name=self.collection_name,
            )
        except Exception as e:
            raise ConnectionError(f"Failed to connect to Qdrant: {e}")

        return "✅ Vector DB Successfully Created and Stored in Qdrant!"

I am still getting the same connection-refused error when the containers are executed. Also, did you try the execution from Docker with the individual containers of ollama and the web app, or through the Python terminal? From Python directly it was working fine earlier as well.


@VenturaAI commented on GitHub (Nov 1, 2024):

This is the error I am getting now:
![image](https://github.com/user-attachments/assets/6347b979-1bba-4ce0-87ce-a30e3143ba8d)

It was not occurring earlier, even though I have langchain_ollama already installed in the virtual environment and listed in requirements.txt.


@rick-github commented on GitHub (Nov 1, 2024):

I added vector.py and rebuilt the docker image. Works as expected.

![Screenshot from 2024-11-01 12-24-13](https://github.com/user-attachments/assets/d0ac0e62-7f63-4c1a-b345-142cdda1ea50)

> Also, did you try running it from Docker with the individual ollama and webapp containers, or through the Python terminal? Running directly from Python was working fine earlier as well.

I used the docker compose config you posted in the first message.

> I tried the above changes except for test function. It kind of throws an error when container is launched: ModuleNotFoundError: No module named 'langchain_ollama'. This issue was not there prior to changes.

The changes I made make no reference to `langchain_ollama`. If `chatbot.py` (which my change doesn't touch) can't find it, it would seem to be an issue with the container/environment.
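
A quick way to check this is to rebuild the image and confirm the package is actually present inside it. This is only a sketch: it assumes the app image installs from the requirements.txt mentioned earlier and uses the `app` service name from the compose file above.

```
docker compose build app                               # rebuild so requirements.txt changes are baked into the image
docker compose exec app pip show langchain-ollama      # with the container running, prints package details if installed
```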


@VenturaAI commented on GitHub (Nov 1, 2024):

OK. I want to know where you keep the model you pulled with ollama. I have it on a separate D: drive, while the rest of the software is on the C: drive. So in my case, would `- /d/myollamamodels:/models` work, or should it be `- ./myollamamodels:/root/.ollama`?
I have pulled the ollama image from Docker and created a separate folder on the D: drive called "myollamamodels" in which llama3.2:3b is stored.

Would appreciate a little help on this from your side.


@rick-github commented on GitHub (Nov 1, 2024):

Inside the container, ollama stores the models in `/root/.ollama`. If you want to map that onto the `/d/myollamamodels` directory in the host system, use `/d/myollamamodels:/root/.ollama`.
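
For reference, a sketch of how the `ollama` service in the compose file above could look with that mapping (replacing the earlier `- /d/myollamamodels:/models` line); the paths are the ones mentioned in this thread and may need adjusting for your setup:

```
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  ports:
    - "11434:11434"
  volumes:
    - /d/myollamamodels:/root/.ollama   # host model dir -> path ollama actually reads inside the container
  networks:
    - my_network
```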


@VenturaAI commented on GitHub (Nov 1, 2024):

Here are the logs I am getting from the app container:

2024-11-01 18:21:36 2024-11-01 12:51:36,607 - INFO - Received query: summarize the document
2024-11-01 18:21:36 2024-11-01 12:51:36,846 - DEBUG - connect_tcp.started host='qdrant' port=6333 local_address=None timeout=5.0 socket_options=None
2024-11-01 18:21:36 2024-11-01 12:51:36,849 - DEBUG - connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7f9d938a6b40>
2024-11-01 18:21:36 2024-11-01 12:51:36,850 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,850 - DEBUG - send_request_headers.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,850 - DEBUG - send_request_body.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,851 - DEBUG - send_request_body.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,851 - DEBUG - receive_response_headers.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,885 - DEBUG - receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'transfer-encoding', b'chunked'), (b'vary', b'accept-encoding, Origin, Access-Control-Request-Method, Access-Control-Request-Headers'), (b'content-type', b'application/json'), (b'content-encoding', b'gzip'), (b'date', b'Fri, 01 Nov 2024 12:51:36 GMT')])
2024-11-01 18:21:36 2024-11-01 12:51:36,886 - INFO - HTTP Request: POST http://qdrant:6333/collections/vector_db/points/search "HTTP/1.1 200 OK"
2024-11-01 18:21:36 2024-11-01 12:51:36,886 - DEBUG - receive_response_body.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,887 - DEBUG - receive_response_body.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,888 - DEBUG - response_closed.started
2024-11-01 18:21:36 2024-11-01 12:51:36,888 - DEBUG - response_closed.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,889 - DEBUG - connect_tcp.started host='127.0.0.1' port=11434 local_address=None timeout=None socket_options=None
2024-11-01 18:21:36 2024-11-01 12:51:36,890 - DEBUG - connect_tcp.failed exception=ConnectError(ConnectionRefusedError(111, 'Connection refused'))

> Inside the container, ollama stores the models in `/root/.ollama`. If you want to map that onto the `/d/myollamamodels` directory in the host system, use `/d/myollamamodels:/root/.ollama`.

I have downloaded the model previously and stored it on the D: drive. Do I need to pull and run the model again when running the container, or will ollama pull it for me?


@rick-github commented on GitHub (Nov 1, 2024):

2024-11-01 18:21:36 2024-11-01 12:51:36,889 - DEBUG - connect_tcp.started host='127.0.0.1' port=11434 local_address=None timeout=None socket_options=None

This is not connecting to the ollama container, it is connecting to the default address of 127.0.0.1:11434. You need to set `base_url` in `ChatOllama` to the address it should connect to.

> I have downloaded the model previously and stored it on the D: drive. Do I need to pull and run the model again when running the container, or will ollama pull it for me?

If you have already downloaded the model and have mapped it into the container at /root/.ollama, ollama will use it without trying to pull it again.
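
For anyone hitting the same thing, here is a minimal sketch of what setting `base_url` can look like, assuming `langchain_ollama` is installed and the `OLLAMA_URL` environment variable from the compose file above is available in the app container; the variable name and model are taken from the earlier snippets and may differ in your code:

```
# chatbot.py (sketch) -- point ChatOllama at the ollama container instead of
# the default 127.0.0.1:11434 inside the app container.
import os

from langchain_ollama import ChatOllama

ollama_url = os.getenv("OLLAMA_URL", "http://ollama:11434")

llm = ChatOllama(
    model="llama3.2:3b",
    base_url=ollama_url,   # without base_url, the client talks to localhost inside the app container
    temperature=0.7,
)
```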


@VenturaAI commented on GitHub (Nov 1, 2024):

Thanks a lot man for your patience and support. The solution worked and the app is now generating responses.

Reference: github-starred/ollama#51243