[GH-ISSUE #1220] Access ollama output directly on streamlit screen #26382

Closed
opened 2026-04-22 02:38:19 -05:00 by GiteaMirror · 8 comments

Originally created by @arnram on GitHub (Nov 21, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1220

Hi,
I am in the process of developing a chatbot application using the RAG (Retrieval-Augmented Generation) technique alongside Ollama and LangChain. Initially, I successfully constructed the application using LangChain and achieved accurate responses displayed on the command-line interface (CLI). Subsequently, I attempted to create a graphical user interface (GUI) for the application. However, I encountered an issue where Ollama initially displays its output on the CLI before storing the string in a variable to provide it to Streamlit.

My concern pertains to accessing the direct output stream of Ollama within Streamlit, bypassing the CLI altogether. This direct access to the output stream of Ollama within the Streamlit interface would be more efficient and beneficial for my application's functionality.

Could you provide guidance on how to redirect Ollama's output stream directly to Streamlit within my application?


@BruceMacD commented on GitHub (Nov 21, 2023):

Hi @arnram, it sounds like the Ollama API (https://github.com/jmorganca/ollama/blob/main/docs/api.md) is what you are looking for. You can call this to get responses from the LLM directly without using the CLI.
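As an illustration that is not part of the original comment, here is a minimal sketch of calling that API from Streamlit directly, so the response never passes through the CLI. It assumes a local Ollama server on the default port 11434, the "llama2" model, and hypothetical variable names:

import json

import requests
import streamlit as st

prompt = st.text_input("Ask a question")

if prompt:
    placeholder = st.empty()  # container that is updated as tokens arrive
    answer = ""

    # With the default stream=true, /api/generate returns one JSON object per line.
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": prompt},
        stream=True,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            answer += chunk.get("response", "")
            placeholder.markdown(answer)  # rendered in the browser, not the CLI

    # `answer` now holds the full completion as an ordinary Python string.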


@marcellodesales commented on GitHub (Nov 21, 2023):

@arnram I've been writing a Streamlit-based framework so that it can talk to any Ollama server using the API described by @BruceMacD... Maybe someone wants to help out :)


@arnram commented on GitHub (Nov 22, 2023):

@BruceMacD, is it possible to directly capture the output of Ollama into a variable without displaying it on the screen?


@technovangelist commented on GitHub (Dec 5, 2023):

Not sure exactly what you are asking here. With the Ollama API you can make a request to the /api/generate endpoint, then process the JSON response and assign it to any variable.
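For a concrete illustration (not part of the original comment), a single non-streaming request can be captured straight into a variable; this sketch assumes a local Ollama server on the default port and the "llama2" model:

import requests

# With "stream": False the endpoint returns one JSON object for the whole completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
)
resp.raise_for_status()
answer = resp.json()["response"]  # the generated text, never printed to the CLI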


@arnram commented on GitHub (Dec 7, 2023):

from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Embed and store
from langchain.embeddings import (
    GPT4AllEmbeddings,
    OllamaEmbeddings,  # We can also try Ollama embeddings
)
from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())

# RAG prompt
from langchain import hub

QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-llama")

# LLM
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import Ollama

llm = Ollama(
    model="llama2",
    verbose=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

# QA chain
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

question = "What are the various approaches to Task Decomposition for AI Agents?"
result = qa_chain({"query": question})

I am using this code to design the chatbot with RAG and Ollama via LangChain. As you can see, LangChain uses the Ollama() class to make requests to the server. With this approach the response is first printed on the CLI and then stored in the "result" variable. So my question is: can you suggest a method where I can use an Ollama POST request with LangChain, or point me to any examples?


@mxyng commented on GitHub (Dec 7, 2023):

callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),

This is explicitly streaming the response to stdout. If you want to save the output to a variable, you should use a different callback manager. Moreover, this appears to be an issue more fit for langchain (https://github.com/langchain-ai/langchain) than ollama.
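As a rough sketch of what such a handler could look like (this is an assumption, not code from the thread; the class and variable names are hypothetical), a custom LangChain callback can accumulate tokens and write them into a Streamlit placeholder instead of stdout:

from langchain.callbacks.base import BaseCallbackHandler

class TokenToStreamlitHandler(BaseCallbackHandler):
    """Accumulate streamed tokens and render them in a Streamlit placeholder."""

    def __init__(self, placeholder):
        self.placeholder = placeholder  # e.g. the result of st.empty()
        self.text = ""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        self.placeholder.markdown(self.text)

# Hypothetical wiring into the existing chain:
# placeholder = st.empty()
# llm = Ollama(
#     model="llama2",
#     callback_manager=CallbackManager([TokenToStreamlitHandler(placeholder)]),
# )

Alternatively, simply omitting StreamingStdOutCallbackHandler stops the CLI echo; the full answer is still returned by the chain in the result dictionary.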


@arnram commented on GitHub (Dec 8, 2023):

Thanks for the help, really appreciate it.
My other question: since I am using the RAG method, which retrieves documents or data related to the user's query, how do I provide those along with the prompt template to the Ollama POST API? Could you please provide an example or some help with this?


@technovangelist commented on GitHub (Dec 8, 2023):

There are a few examples in our repo under examples. Look for the word langchain in the folder name. Langchain also has a lot of examples in their documentation.
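For reference, a minimal sketch of the pattern those examples follow (the prompt wording and variable names here are assumptions, not taken from the repo): retrieve the relevant chunks yourself, paste them into the prompt, and send the combined text to /api/generate.

import requests

question = "What are the various approaches to Task Decomposition for AI Agents?"

# Reuse the retriever built earlier from the Chroma vector store.
docs = vectorstore.as_retriever().get_relevant_documents(question)
context = "\n\n".join(doc.page_content for doc in docs)

# Stuff the retrieved context and the question into a single prompt.
prompt = (
    "Use the following context to answer the question.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": prompt, "stream": False},
)
answer = resp.json()["response"]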

Reference: github-starred/ollama#26382