[GH-ISSUE #7444] "Connection refused" error while running Ollama in a container with an LLM chatbot app in another Docker container #51243

Closed
opened 2026-04-28 18:59:21 -05:00 by GiteaMirror · 24 comments

Originally created by @VenturaAI on GitHub (Oct 31, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7444

I have created a local chatbot in Python 3.12 that allows the user to chat with an uploaded PDF by creating embeddings in a Qdrant vector database and then getting inference from Ollama (model llama3.2:3b).
In my source code, I am using the following dependencies:

streamlit langchain langchain_community langchain_core python-dotenv langchain-huggingface langchain-qdrant langchain-ollama unstructured[pdf] onnx==1.16.1 qdrant-client torch torchvision torchaudio

Since I want to deploy the code on a server (where no dependencies are installed), I will be using Docker to run the containers for Qdrant, the chatbot app and Ollama. I have successfully pulled the latest Ollama image and Qdrant using Docker.

docker run -d -v D:\myollamamodels:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3.2:3b

Both the Ollama and Qdrant containers are running and accessible from within. Checked using Docker Desktop as well. I have also bridged the chatbot app, Ollama and Qdrant containers onto a single network using:

docker network connect my_network ollama
docker network connect my_network qdrant

Now when I run the app, it does open and allows me to upload the PDF and create the embeddings, and the embeddings are also successfully stored in the vector DB (I have included relevant print statements which are reflected in the app GUI). The issue comes when I want to chat with the document: when I enter a question, it waits and, instead of responding with the inference output, gives me the error: "⚠️ An error occurred while processing your request: [Errno 111] Connection refused".

I have the docker compose file as below:

version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    container_name: qdrant
    ports:
      - "6333:6333" # Expose Qdrant on the default port
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - my_network # Connect qdrant to my_network

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434" # Expose Ollama on the default port
    environment:
      - OLLAMA_MODEL=llama3.2:3b
    volumes:
      - /d/myollamamodels:/models
    networks:
      - my_network

  app:
    build: .
    container_name: app_new
    ports:
      - "8501:8501" # Streamlit default port
    environment:
      QDRANT_URL: http://qdrant:6333 # Use Qdrant service name from Docker Compose
      OLLAMA_URL: http://ollama:11434
      #OLLAMA_MODEL: http://host.docker.internal:11434/llama3.2:3b # Point to Ollama on host
    depends_on:
      - qdrant
      - ollama
    volumes:
      - ./models:/models # Mount the model directory for access
    networks:
      - my_network # Connect app to my_network

volumes:
  qdrant_data:

networks:
  my_network:
    driver: bridge

The Python program and class which I have been using for the AI chatbot is as follows.
The Streamlit app code and the vector-embeddings code are in separate .py files.

class ChatbotManager:
    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en",
        device: str = "cpu",
        encode_kwargs: dict = {"normalize_embeddings": True},
        llm_model: str = "llama3.2:3b",
        #llm_model: str = None,  # Set to None to use environment variable
        llm_temperature: float = 0.7,
        qdrant_url: str = "http://qdrant:6333",
        ollama_url: str = "http://ollama:11434",  # URL for Ollama inside Docker network
        collection_name: str = "vector_db",
    ):
        """
        Initializes the ChatbotManager with embedding models, LLM, and vector store.

        Args:
            model_name (str): The HuggingFace model name for embeddings.
            device (str): The device to run the model on ('cpu' or 'cuda').
            encode_kwargs (dict): Additional keyword arguments for encoding.
            llm_model (str): The local LLM model name for ChatOllama.
            llm_temperature (float): Temperature setting for the LLM.
            qdrant_url (str): The URL for the Qdrant instance.
            collection_name (str): The name of the Qdrant collection.
        """
        self.model_name = model_name
        self.device = device
        self.encode_kwargs = encode_kwargs
        #self.llm_model = llm_model
        # Get the LLM model name from the environment variable
        self.llm_model = os.getenv("OLLAMA_MODEL", llm_model)
        self.llm_temperature = llm_temperature
        self.qdrant_url = qdrant_url
        self.collection_name = collection_name
        self.ollama_url = ollama_url  # Initialize ollama_url

        # Initialize Embeddings
        self.embeddings = HuggingFaceBgeEmbeddings(
            model_name=self.model_name,
            model_kwargs={"device": self.device},
            encode_kwargs=self.encode_kwargs,
        )

        # Initialize Local LLM
        self.llm = ChatOllama(
            model=self.llm_model,
            temperature=self.llm_temperature,
            server_url=self.ollama_url
            # Add other parameters if needed
        )

        # Define the prompt template
        self.prompt_template = """Use the following pieces of information to answer the user's question.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer. Answer must be detailed and well explained.
Helpful answer:
"""

        # Initialize Qdrant client
        self.client = QdrantClient(
            url=self.qdrant_url, prefer_grpc=False
        )

        # Initialize the Qdrant vector store
        self.db = Qdrant(
            client=self.client,
            embeddings=self.embeddings,
            collection_name=self.collection_name
        )

        # Initialize the prompt
        self.prompt = PromptTemplate(
            template=self.prompt_template,
            input_variables=['context', 'question']
        )

        # Initialize the retriever
        self.retriever = self.db.as_retriever(search_kwargs={"k": 1})

        # Define chain type kwargs
        self.chain_type_kwargs = {"prompt": self.prompt}

        # Initialize the RetrievalQA chain with return_source_documents=False
        self.qa = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.retriever,
            return_source_documents=False,  # Set to False to return only 'result'
            chain_type_kwargs=self.chain_type_kwargs,
            verbose=False
        )

    def get_response(self, query: str) -> str:
        """
        Processes the user's query and returns the chatbot's response.

        Args:
            query (str): The user's input question.

        Returns:
            str: The chatbot's response.
        """
        try:
            response = self.qa.run(query)
            return response  # 'response' is now a string containing only the 'result'
        except Exception as e:
            st.error(f"An error occurred while processing your request: {e}")
            return "Sorry, I couldn't process your request at the moment."

Logs of app container:

2024-10-30 16:47:13 2024-10-30 11:17:13.140 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

2024-10-30 16:49:55 2024-10-30 11:19:55.974 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

2024-10-30 16:50:44 /app/chatbot.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.

2024-10-30 16:50:44 response = self.qa.run(query)
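
(As an aside, the LangChainDeprecationWarning above is unrelated to the connection error. For reference, a minimal sketch of get_response switched from the deprecated Chain.run to invoke, assuming the same RetrievalQA chain with return_source_documents=False so the answer comes back under the "result" key:)

    def get_response(self, query: str) -> str:
        try:
            # RetrievalQA.invoke takes a dict keyed by "query" and returns a dict;
            # with return_source_documents=False the answer is under "result".
            output = self.qa.invoke({"query": query})
            return output["result"]
        except Exception as e:
            st.error(f"An error occurred while processing your request: {e}")
            return "Sorry, I couldn't process your request at the moment."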

I have looked into it many times and modified it based on ollama_url and other factors such as checking the ollama service availability, the ollama container status and modifying the yml file, but none seem to work and I am stuck at this error. The entire code works well within the development environment without Docker (and with ollama as a service on the host), but I need to deploy it on a server at the earliest to make it available on the network.

I have checked that the ollama container service is working on port 11434 (checked it via the URL and also via a docker command), and Qdrant is also working since the embeddings are created and a success message is shown in the app UI, but somehow the connection to ollama is being refused, I guess.

Could someone please explain the issue and a solution for this problem?
Thanks.


@rick-github commented on GitHub (Oct 31, 2024):

--- 7444.py.orig	2024-10-31 11:56:22.897375168 +0100
+++ 7444.py	2024-10-31 11:56:15.330871235 +0100
@@ -45,7 +45,7 @@
     self.llm = ChatOllama(
         model=self.llm_model,
         temperature=self.llm_temperature,
-        server_url=self.ollama_url
+        base_url=self.ollama_url
         # Add other parameters if needed
     )
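
For reference, a minimal sketch of the corrected construction (assuming langchain-ollama's ChatOllama, which takes the endpoint as base_url; the diff suggests server_url is not the parameter name it expects, so without base_url the client apparently falls back to its default http://localhost:11434, which nothing listens on inside the app container). The trailing invoke call is only a hand-rolled connectivity check, not part of the original code:

    import os
    from langchain_ollama import ChatOllama

    # OLLAMA_URL is the variable set on the app service in the compose file.
    ollama_url = os.getenv("OLLAMA_URL", "http://ollama:11434")

    llm = ChatOllama(
        model="llama3.2:3b",
        temperature=0.7,
        base_url=ollama_url,  # base_url, not server_url
    )

    # Round-trip a tiny prompt through Ollama to confirm the endpoint is reachable.
    print(llm.invoke("Reply with the single word: ok").content)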

@VenturaAI commented on GitHub (Oct 31, 2024):

Hi, I tried implementing the suggested change in the code but it didn't work and the error is the same "Connection refused" when a question is asked to chat with the PDF.


@rick-github commented on GitHub (Oct 31, 2024):

What's the result of

docker exec -it app_new curl ollama:11434

@VenturaAI commented on GitHub (Oct 31, 2024):

Result after the command is executed: Ollama is running


@rick-github commented on GitHub (Oct 31, 2024):

What base image do you use for app_new?


@VenturaAI commented on GitHub (Oct 31, 2024):

FROM python:3.12


@rick-github commented on GitHub (Oct 31, 2024):

run this:

docker exec -it app_new bash -c 'apt update && apt install -y tcpflow'
docker exec -it app_new tcpflow -c -i any port 11434

and then go to your streamlit ui and run a query. post results, if any.


@VenturaAI commented on GitHub (Oct 31, 2024):

After running the tcpflow, the container logs are:

2024-10-31 17:03:36 /app/chatbot.py:120: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.

2024-10-31 17:03:36 response = self.qa.run(query)

2024-10-31 17:31:21 2024-10-31 12:01:21.796 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_

& the Streamlit UI still shows error:

⚠️ An error occurred while processing your request: [Errno 111] Connection refused
⚠️ Sorry, I couldn't process your request at the moment.


@VenturaAI commented on GitHub (Oct 31, 2024):

I need to know: if the ollama service is running in one container and the app in another container, should ollama_url be localhost:11434 or ollama:11434?


@rick-github commented on GitHub (Oct 31, 2024):

What was the output of tcpflow?

If ollama is running in a container, the client will need to connect to http://$container_name:11434.
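
A quick way to confirm that the service name resolves and the port is reachable from inside the app container, independent of LangChain — a sketch using only the Python standard library; OLLAMA_URL is the variable the compose file already sets on the app service, and the file name is arbitrary:

    # Run from the host, e.g.: docker exec -it app_new python3 probe_ollama.py
    import os
    import urllib.request

    url = os.getenv("OLLAMA_URL", "http://ollama:11434")
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            # Ollama's root endpoint answers with the plain text "Ollama is running".
            print(resp.read().decode().strip())
    except Exception as e:
        print(f"Could not reach {url}: {e}")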


@VenturaAI commented on GitHub (Oct 31, 2024):

reportfilename: ./report.xml
tcpflow: listening on any
Nothing appears after this in the terminal when the query is executed in the Streamlit UI.


@rick-github commented on GitHub (Oct 31, 2024):

The app is not trying to connect to ollama at all, or at least not on port 11434. If you add print(f"ollama_url is {self.ollama_url}") before the call to ChatOllama, what's the result?


@VenturaAI commented on GitHub (Oct 31, 2024):

Included the above statement. Getting the same logs from the container console:
2024-10-31 18:01:16 self.db = Qdrant(
2024-10-31 18:01:17 2024-10-31 12:31:17.078 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
2024-10-31 18:01:37 /app/chatbot.py:123: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.
2024-10-31 18:01:37 response = self.qa.run(query)

After creating the DB in Qdrant, it should invoke chatbot.py. Below is the snippet from the app UI code:

# User input
if user_input := st.chat_input("Type your message here..."):
    # Display user message
    st.chat_message("user").markdown(user_input)
    st.session_state['messages'].append({"role": "user", "content": user_input})

    with st.spinner("🤖 Responding..."):
        try:
            # Get the chatbot response using the ChatbotManager
            answer = st.session_state['chatbot_manager'].get_response(user_input)
            time.sleep(1)  # Simulate processing time
        except Exception as e:
            answer = f"⚠️ An error occurred while processing your request: {e}"

@rick-github commented on GitHub (Oct 31, 2024):

I don't see ollama_url is ... in the log.

Can I ask that you wrap code and log snippets in a markdown code block, three backticks (```) on a line at the start and again at the end. It makes it much easier to read the code if it's properly formatted.


@VenturaAI commented on GitHub (Oct 31, 2024):

Sure, here's the code:

# chatbot.py

import os
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain_ollama import ChatOllama
from qdrant_client import QdrantClient
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
import streamlit as st

class ChatbotManager:
   def __init__(
       self,
       model_name: str = "BAAI/bge-small-en",
       device: str = "cpu",
       encode_kwargs: dict = {"normalize_embeddings": True},
       llm_model: str = "llama3.2:3b",
       #llm_model: str = None,  # Set to None to use environment variable
       llm_temperature: float = 0.7,
       qdrant_url: str = "http://qdrant:6333",
       ollama_url: str = "http://ollama:11434",  # URL for Ollama inside Docker network
       collection_name: str = "vector_db",
   ):
       """
       Initializes the ChatbotManager with embedding models, LLM, and vector store.

       Args:
           model_name (str): The HuggingFace model name for embeddings.
           device (str): The device to run the model on ('cpu' or 'cuda').
           encode_kwargs (dict): Additional keyword arguments for encoding.
           llm_model (str): The local LLM model name for ChatOllama.
           llm_temperature (float): Temperature setting for the LLM.
           qdrant_url (str): The URL for the Qdrant instance.
           collection_name (str): The name of the Qdrant collection.
       """
       self.model_name = model_name
       self.device = device
       self.encode_kwargs = encode_kwargs
       #self.llm_model = llm_model
       # Get the LLM model name from the environment variable
       self.llm_model = os.getenv("OLLAMA_MODEL", llm_model)
       self.llm_temperature = llm_temperature
       self.qdrant_url = qdrant_url
       self.collection_name = collection_name
       self.ollama_url = ollama_url  # Initialize ollama_url

       # Initialize Embeddings
       self.embeddings = HuggingFaceBgeEmbeddings(
           model_name=self.model_name,
           model_kwargs={"device": self.device},
           encode_kwargs=self.encode_kwargs,
       )
       
       # Print the ollama_url for debugging purposes
       print(f"ollama_url is {self.ollama_url}")

       # Initialize Local LLM
       self.llm = ChatOllama(
           model=self.llm_model,
           temperature=self.llm_temperature,
           #server_url=self.ollama_url
           base_url=self.ollama_url
           # Add other parameters if needed
       )

       # Define the prompt template
       self.prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer. Answer must be detailed and well explained.
Helpful answer:
"""

       # Initialize Qdrant client
       self.client = QdrantClient(
           url=self.qdrant_url, prefer_grpc=False
       )

       # Initialize the Qdrant vector store
       self.db = Qdrant(
           client=self.client,
           embeddings=self.embeddings,
           collection_name=self.collection_name
       )

       # Initialize the prompt
       self.prompt = PromptTemplate(
           template=self.prompt_template,
           input_variables=['context', 'question']
       )

       # Initialize the retriever
       self.retriever = self.db.as_retriever(search_kwargs={"k": 1})

       # Define chain type kwargs
       self.chain_type_kwargs = {"prompt": self.prompt}

       # Initialize the RetrievalQA chain with return_source_documents=False
       self.qa = RetrievalQA.from_chain_type(
           llm=self.llm,
           chain_type="stuff",
           retriever=self.retriever,
           return_source_documents=False,  # Set to False to return only 'result'
           chain_type_kwargs=self.chain_type_kwargs,
           verbose=False
       )

   def get_response(self, query: str) -> str:
       """
       Processes the user's query and returns the chatbot's response.

       Args:
           query (str): The user's input question.

       Returns:
           str: The chatbot's response.
       """
       try:
           response = self.qa.run(query)
           return response  # 'response' is now a string containing only the 'result'
       except Exception as e:
           st.error(f"⚠️ An error occurred while processing your request: {e}")
           return "⚠️ Sorry, I couldn't process your request at the moment."

Streamlit app code:

# app.py

import streamlit as st
from streamlit import session_state
import time
import base64
import os
from vectors import EmbeddingsManager  # Import the EmbeddingsManager class
from chatbot import ChatbotManager     # Import the ChatbotManager class

# Function to display the PDF of a given file
def displayPDF(file):
    # Reading the uploaded file
    base64_pdf = base64.b64encode(file.read()).decode('utf-8')

    # Embedding PDF in HTML
    pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="600" type="application/pdf"></iframe>'

    # Displaying the PDF
    st.markdown(pdf_display, unsafe_allow_html=True)

# Initialize session_state variables if not already present
if 'temp_pdf_path' not in st.session_state:
    st.session_state['temp_pdf_path'] = None

if 'chatbot_manager' not in st.session_state:
    st.session_state['chatbot_manager'] = None

if 'messages' not in st.session_state:
    st.session_state['messages'] = []

# Set the page configuration to wide layout and add a title
st.set_page_config(
    page_title="DocBot AI App",
    layout="wide",
    initial_sidebar_state="expanded",
)
#st.markdown("<h1 style='color: #1c5384;'>DocBot AI App</h1>", unsafe_allow_html=True)

# Sidebar
with st.sidebar:
    
    st.markdown("### 📚 Your Personal Document Assistant")
    st.markdown ("#### Powered by AI 🚀🚀")
    st.markdown("---")
    
    # Navigation Menu
    menu = ["🏠 Home", "🤖 Chatbot"]
    choice = st.selectbox("Navigate", menu)

# Home Page
if choice == "🏠 Home":
    #st.title("📄 DocBot AI App")
    st.markdown("<h1 style='color: #1c5384;'> 📄 DocBot AI App</h1>", unsafe_allow_html=True)
    st.markdown("""
    Welcome to **DocBot AI App**!
    """)

# Chatbot Page
elif choice == "🤖 Chatbot":
    #st.title("🤖 AI Chatbot Interface")
    st.markdown("<h1 style='color: #1c5384;'> 🤖 AI Chatbot Interface</h1>", unsafe_allow_html=True)
    st.markdown("---")
    
    # Create three columns
    col1, col2, col3 = st.columns(3)

    # Column 1: File Uploader and Preview
    with col1:
        st.header("📂 Upload Document")
        #st.markdown("<h1 style='color: #1c5384;'> ���� Upload Document</h1>", unsafe_allow_html=True)
        uploaded_file = st.file_uploader("Upload a PDF", type=["pdf"])
        if uploaded_file is not None:
            st.success("📄 File Uploaded Successfully!")
            # Display file name and size
            st.markdown(f"**Filename:** {uploaded_file.name}")
            st.markdown(f"**File Size:** {uploaded_file.size} bytes")
            
            # Display PDF preview using displayPDF function
            st.markdown("### 📖 PDF Preview")
            displayPDF(uploaded_file)
            
            # Save the uploaded file to a temporary location
            temp_pdf_path = "temp.pdf"
            with open(temp_pdf_path, "wb") as f:
                f.write(uploaded_file.getbuffer())
            
            # Store the temp_pdf_path in session_state
            st.session_state['temp_pdf_path'] = temp_pdf_path

    # Column 2: Create Embeddings
    with col2:
        st.header("🧠 Process the PDF ")
        create_embeddings = st.checkbox("✅ Create Embeddings")
        if create_embeddings:
            if st.session_state['temp_pdf_path'] is None:
                st.warning("⚠️ Please upload a PDF first.")
            else:
                try:
                    # Initialize the EmbeddingsManager
                    embeddings_manager = EmbeddingsManager(
                        model_name="BAAI/bge-small-en",
                        device="cpu",
                        encode_kwargs={"normalize_embeddings": True},
                        qdrant_url="http://qdrant:6333",
                        collection_name="vector_db"
                    )
                    
                    with st.spinner("🔄 Embeddings are in process..."):
                        # Create embeddings
                        result = embeddings_manager.create_embeddings(st.session_state['temp_pdf_path'])
                        time.sleep(1)  # Optional: To show spinner for a bit longer
                    st.success(result)
                    
                    # Initialize the ChatbotManager after embeddings are created
                    if st.session_state['chatbot_manager'] is None:
                        st.session_state['chatbot_manager'] = ChatbotManager(
                            model_name="BAAI/bge-small-en",
                            device="cpu",
                            encode_kwargs={"normalize_embeddings": True},
                            llm_model="llama3.2:3b",
                            llm_temperature=0.7,
                            qdrant_url="http://qdrant:6333",
                            collection_name="vector_db",
                            ollama_url = "http://ollama:11434"
                        )
                    
                except FileNotFoundError as fnf_error:
                    st.error(fnf_error)
                except ValueError as val_error:
                    st.error(val_error)
                except ConnectionError as conn_error:
                    st.error(conn_error)
                except Exception as e:
                    st.error(f"An unexpected error occurred: {e}")

    # Column 3: Chatbot Interface
    with col3:
        st.header("💬 Chat with Document")
        
        if st.session_state['chatbot_manager'] is None:
            st.info("🤖 Please upload a PDF and create embeddings to start chatting.")
        else:
            # Display existing messages
            for msg in st.session_state['messages']:
                st.chat_message(msg['role']).markdown(msg['content'])

            # User input
            if user_input := st.chat_input("Type your message here..."):
                # Display user message
                st.chat_message("user").markdown(user_input)
                st.session_state['messages'].append({"role": "user", "content": user_input})

                with st.spinner("🤖 Responding..."):
                    try:
                        # Get the chatbot response using the ChatbotManager
                        answer = st.session_state['chatbot_manager'].get_response(user_input)
                        time.sleep(1)  # Simulate processing time
                    except Exception as e:
                        answer = f"⚠️ An error occurred while processing your request: {e}"
                
                # Display chatbot message
                st.chat_message("assistant").markdown(answer)
                st.session_state['messages'].append({"role": "assistant", "content": answer})



# Footer
st.markdown("---")```


@rick-github commented on GitHub (Oct 31, 2024):

I modified the docker compose config from the first post:

--- docker-compose.yaml.orig	2024-10-31 22:17:26.736521468 +0100
+++ docker-compose.yaml	2024-10-31 22:09:19.128216925 +0100
@@ -19,7 +19,7 @@
     environment:
       - OLLAMA_MODEL=llama3.2:3b
     volumes:
-      - /d/myollamamodels:/models
+      - ./myollamamodels:/root/.ollama
     networks:
       - my_network
 

and added a new function to app.py to make it easier to test the connection to ollama:

--- app.py.orig	2024-10-31 17:45:52.477968011 +0100
+++ app.py	2024-10-31 21:28:32.309133808 +0100
@@ -29,6 +29,9 @@
 if 'messages' not in st.session_state:
     st.session_state['messages'] = []
 
+ollama_url = os.getenv("OLLAMA_URL", "http://ollama:11434")
+qdrant_url = os.getenv("QDRANT_URL", "http://qdrant:6333")
+
 # Set the page configuration to wide layout and add a title
 st.set_page_config(
     page_title="DocBot AI App",
@@ -45,7 +48,7 @@
     st.markdown("---")
     
     # Navigation Menu
-    menu = ["🏠 Home", "🤖 Chatbot"]
+    menu = ["🏠 Home", "🤖 Chatbot", "Ollama chat"]
     choice = st.selectbox("Navigate", menu)
 
 # Home Page
@@ -102,7 +105,7 @@
                         model_name="BAAI/bge-small-en",
                         device="cpu",
                         encode_kwargs={"normalize_embeddings": True},
-                        qdrant_url="http://qdrant:6333",
+                        qdrant_url=qdrant_url,
                         collection_name="vector_db"
                     )
                     
@@ -120,9 +123,9 @@
                             encode_kwargs={"normalize_embeddings": True},
                             llm_model="llama3.2:3b",
                             llm_temperature=0.7,
-                            qdrant_url="http://qdrant:6333",
+                            qdrant_url=qdrant_url,
                             collection_name="vector_db",
-                            ollama_url = "http://ollama:11434"
+                            ollama_url = ollama_url
                         )
                     
                 except FileNotFoundError as fnf_error:
@@ -163,6 +166,46 @@
                 st.chat_message("assistant").markdown(answer)
                 st.session_state['messages'].append({"role": "assistant", "content": answer})
 
+elif choice == "Ollama chat":
+    st.markdown("<h1 style='color: #1c5384;'> 🤖 Ollama Chat Interface</h1>", unsafe_allow_html=True)
+    st.markdown("---")
+    
+    if st.session_state['chatbot_manager'] is None:
+        st.session_state['chatbot_manager'] = ChatbotManager(
+            model_name="BAAI/bge-small-en",
+            device="cpu",
+            encode_kwargs={"normalize_embeddings": True},
+            llm_model="llama3.2:3b",
+            llm_temperature=0.7,
+            qdrant_url=qdrant_url,
+            collection_name="vector_db",
+            ollama_url = ollama_url
+        )
+
+    st.header("💬 Chat with ollama")
+    history = st.container(height=400)
+
+    # Display existing messages
+    for msg in st.session_state['messages']:
+        history.chat_message(msg['role']).markdown(msg['content'])
+
+    # User input
+    if user_input := st.chat_input("Type your message here..."):
+        # Display user message
+        history.chat_message("user").markdown(user_input)
+        st.session_state['messages'].append({"role": "user", "content": user_input})
+
+        with st.spinner("🤖 Responding..."):
+            try:
+                # Get the chatbot response using the ChatbotManager
+                answer = st.session_state['chatbot_manager'].llm.invoke(st.session_state['messages']).content
+                time.sleep(1)  # Simulate processing time
+            except Exception as e:
+                answer = f"⚠️ An error occurred while processing your request: {e}"
+        
+        # Display chatbot message
+        history.chat_message("assistant").markdown(answer)
+        st.session_state['messages'].append({"role": "assistant", "content": answer})
 
 
 # Footer

This works as expected.

[Screenshot 2024-10-31 22 36 20]


@VenturaAI commented on GitHub (Nov 1, 2024):

Hi, thanks. I tried the above changes except for the test function. It throws an error when the container is launched: ModuleNotFoundError: No module named 'langchain_ollama'. This issue was not there prior to the changes.

Below is the code for the embeddings as well. Can you check why, after the Qdrant DB step succeeds, Ollama is not getting connected for chat?

# vectors.py

import os
import base64
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Qdrant

class EmbeddingsManager:
    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en",
        device: str = "cpu",
        encode_kwargs: dict = {"normalize_embeddings": True},
        qdrant_url: str = "http://qdrant:6333",
        collection_name: str = "vector_db",
    ):
        """
        Initializes the EmbeddingsManager with the specified model and Qdrant settings.

        Args:
            model_name (str): The HuggingFace model name for embeddings.
            device (str): The device to run the model on ('cpu' or 'cuda').
            encode_kwargs (dict): Additional keyword arguments for encoding.
            qdrant_url (str): The URL for the Qdrant instance.
            collection_name (str): The name of the Qdrant collection.
        """
        self.model_name = model_name
        self.device = device
        self.encode_kwargs = encode_kwargs
        self.qdrant_url = qdrant_url
        self.collection_name = collection_name

        self.embeddings = HuggingFaceBgeEmbeddings(
            model_name=self.model_name,
            model_kwargs={"device": self.device},
            encode_kwargs=self.encode_kwargs,
        )

    def create_embeddings(self, pdf_path: str):
        """
        Processes the PDF, creates embeddings, and stores them in Qdrant.

        Args:
            pdf_path (str): The file path to the PDF document.

        Returns:
            str: Success message upon completion.
        """
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(f"The file {pdf_path} does not exist.")

        # Load and preprocess the document
        loader = UnstructuredPDFLoader(pdf_path)
        docs = loader.load()
        if not docs:
            raise ValueError("No documents were loaded from the PDF.")

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000, chunk_overlap=250
        )
        splits = text_splitter.split_documents(docs)
        if not splits:
            raise ValueError("No text chunks were created from the documents.")

        # Create and store embeddings in Qdrant
        try:
            qdrant = Qdrant.from_documents(
                splits,
                self.embeddings,
                url=self.qdrant_url,
                prefer_grpc=False,
                collection_name=self.collection_name,
            )
        except Exception as e:
            raise ConnectionError(f"Failed to connect to Qdrant: {e}")

        return "✅ Vector DB Successfully Created and Stored in Qdrant!"

I am still getting the same connection-refused error when the containers are executed. Also, did you try the execution from Docker with the individual containers of ollama and the web app, or through the Python terminal? From Python directly it was working fine earlier as well.


@VenturaAI commented on GitHub (Nov 1, 2024):

This is the error I am getting now:
![image](https://github.com/user-attachments/assets/6347b979-1bba-4ce0-87ce-a30e3143ba8d)

It was not occurring earlier, even though I have langchain_ollama already installed in the virtual environment and listed in requirements.txt.


@rick-github commented on GitHub (Nov 1, 2024):

I added vector.py and rebuilt the docker image. Works as expected.

![Screenshot from 2024-11-01 12-24-13](https://github.com/user-attachments/assets/d0ac0e62-7f63-4c1a-b345-142cdda1ea50)

> Also, did you try running it from Docker with the individual ollama and webapp containers, or through the Python terminal? Running directly from Python was working fine earlier as well.

I used the docker compose config you posted in the first message.

> I tried the above changes except for test function. It kind of throws an error when container is launched: ModuleNotFoundError: No module named 'langchain_ollama'. This issue was not there prior to changes.

The changes I made make no reference to `langchain_ollama`. If `chatbot.py` (which my change doesn't touch) can't find it, it would seem to be an issue with the container/environment.
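
A quick way to check this is to rebuild the image and confirm the package is actually present inside it. This is only a sketch: it assumes the app image installs from the requirements.txt mentioned earlier and uses the `app` service name from the compose file above.

```
docker compose build app                               # rebuild so requirements.txt changes are baked into the image
docker compose exec app pip show langchain-ollama      # with the container running, prints package details if installed
```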


@VenturaAI commented on GitHub (Nov 1, 2024):

OK. I want to know where you keep the model you pulled with ollama. I have it on a separate D: drive, while the rest of the software is on the C: drive. So in my case, would `- /d/myollamamodels:/models` work, or should it be `- ./myollamamodels:/root/.ollama`?
I have pulled the ollama image from Docker and created a separate folder on the D: drive called "myollamamodels" in which llama3.2:3b is stored.

Would appreciate a little help on this from your side.


@rick-github commented on GitHub (Nov 1, 2024):

Inside the container, ollama stores the models in `/root/.ollama`. If you want to map that onto the `/d/myollamamodels` directory in the host system, use `/d/myollamamodels:/root/.ollama`.
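
For reference, a sketch of how the `ollama` service in the compose file above could look with that mapping (replacing the earlier `- /d/myollamamodels:/models` line); the paths are the ones mentioned in this thread and may need adjusting for your setup:

```
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  ports:
    - "11434:11434"
  volumes:
    - /d/myollamamodels:/root/.ollama   # host model dir -> path ollama actually reads inside the container
  networks:
    - my_network
```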


@VenturaAI commented on GitHub (Nov 1, 2024):

Here are the logs I am getting from the app container:

2024-11-01 18:21:36 2024-11-01 12:51:36,607 - INFO - Received query: summarize the document
2024-11-01 18:21:36 2024-11-01 12:51:36,846 - DEBUG - connect_tcp.started host='qdrant' port=6333 local_address=None timeout=5.0 socket_options=None
2024-11-01 18:21:36 2024-11-01 12:51:36,849 - DEBUG - connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7f9d938a6b40>
2024-11-01 18:21:36 2024-11-01 12:51:36,850 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,850 - DEBUG - send_request_headers.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,850 - DEBUG - send_request_body.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,851 - DEBUG - send_request_body.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,851 - DEBUG - receive_response_headers.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,885 - DEBUG - receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'transfer-encoding', b'chunked'), (b'vary', b'accept-encoding, Origin, Access-Control-Request-Method, Access-Control-Request-Headers'), (b'content-type', b'application/json'), (b'content-encoding', b'gzip'), (b'date', b'Fri, 01 Nov 2024 12:51:36 GMT')])
2024-11-01 18:21:36 2024-11-01 12:51:36,886 - INFO - HTTP Request: POST http://qdrant:6333/collections/vector_db/points/search "HTTP/1.1 200 OK"
2024-11-01 18:21:36 2024-11-01 12:51:36,886 - DEBUG - receive_response_body.started request=<Request [b'POST']>
2024-11-01 18:21:36 2024-11-01 12:51:36,887 - DEBUG - receive_response_body.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,888 - DEBUG - response_closed.started
2024-11-01 18:21:36 2024-11-01 12:51:36,888 - DEBUG - response_closed.complete
2024-11-01 18:21:36 2024-11-01 12:51:36,889 - DEBUG - connect_tcp.started host='127.0.0.1' port=11434 local_address=None timeout=None socket_options=None
2024-11-01 18:21:36 2024-11-01 12:51:36,890 - DEBUG - connect_tcp.failed exception=ConnectError(ConnectionRefusedError(111, 'Connection refused'))

> Inside the container, ollama stores the models in `/root/.ollama`. If you want to map that onto the `/d/myollamamodels` directory in the host system, use `/d/myollamamodels:/root/.ollama`.

I have downloaded the model previously and stored it on the D: drive. Do I need to pull and run the model again when running the container, or will ollama pull it for me?


@rick-github commented on GitHub (Nov 1, 2024):

2024-11-01 18:21:36 2024-11-01 12:51:36,889 - DEBUG - connect_tcp.started host='127.0.0.1' port=11434 local_address=None timeout=None socket_options=None

This is not connecting to the ollama container, it is connecting to the default address of 127.0.0.1:11434. You need to set `base_url` in `ChatOllama` to the address it should connect to.

> I have downloaded the model previously and stored it on the D: drive. Do I need to pull and run the model again when running the container, or will ollama pull it for me?

If you have already downloaded the model and have mapped it into the container at /root/.ollama, ollama will use it without trying to pull it again.
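
For anyone hitting the same thing, here is a minimal sketch of what setting `base_url` can look like, assuming `langchain_ollama` is installed and the `OLLAMA_URL` environment variable from the compose file above is available in the app container; the variable name and model are taken from the earlier snippets and may differ in your code:

```
# chatbot.py (sketch) -- point ChatOllama at the ollama container instead of
# the default 127.0.0.1:11434 inside the app container.
import os

from langchain_ollama import ChatOllama

ollama_url = os.getenv("OLLAMA_URL", "http://ollama:11434")

llm = ChatOllama(
    model="llama3.2:3b",
    base_url=ollama_url,   # without base_url, the client talks to localhost inside the app container
    temperature=0.7,
)
```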


@VenturaAI commented on GitHub (Nov 1, 2024):

Thanks a lot man for your patience and support. The solution worked and the app is now generating responses.

Reference: github-starred/ollama#51243