[GH-ISSUE #3492] Add enhancement to allow RAG functionality #2150

Open
opened 2026-04-12 12:22:49 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @g02200jeff on GitHub (Apr 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3492

What are you trying to do?

I want to use a custom script on the ollama server (Windows) to execute a Retrieval Augmented Generation (RAG) process. How can I do this?
(I have a working example with a Python script, LangChain, and ollama, but I can't do the same behind the ollama server using its REST API.)

How should we solve this?

Allow Retrieval Augmented Generation (RAG) or add an example

What is the impact of not solving this?

No response

Anything else?

No response

GiteaMirror added the feature request label 2026-04-12 12:22:49 -05:00

@ThisModernDay commented on GitHub (Apr 4, 2024):

@g02200jeff When I need to do RAG with ollama, I have always used a vector database and an embedding model (e.g., nomic-embed-text:latest) on the embeddings endpoint to generate my embeddings, then searched for the closest matching embedding.

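The flow described above (embed documents, store the vectors, retrieve the closest match) can be sketched as follows. This is a hypothetical example, not part of the original thread: it assumes an Ollama server on the default `http://localhost:11434` with `nomic-embed-text` pulled, and uses a plain in-memory list in place of a real vector database. The retrieval math itself needs no server.

```python
import json
import math
import urllib.request

OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"  # assumed default port


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Call the Ollama embeddings endpoint (requires a running server)."""
    req = urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is zero-length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """store is a list of (text, vector) pairs; return the k closest texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Requires a running Ollama server with nomic-embed-text pulled.
    docs = ["Ollama runs LLMs locally.", "Paris is the capital of France."]
    store = [(d, embed(d)) for d in docs]
    print(top_k(embed("What is Ollama?"), store, k=1))
```

A real setup would swap the in-memory `store` for a vector database (Chroma, Qdrant, pgvector, etc.), but the retrieval step is the same: embed the query with the same model and rank by similarity.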
@eusebiu commented on GitHub (Apr 6, 2024):

Check out this:
https://python.langchain.com/docs/integrations/llms/ollama/
https://python.langchain.com/docs/use_cases/question_answering/quickstart/

@aosan commented on GitHub (Apr 6, 2024):

@g02200jeff have a look at my little project, it is a RAG with ollama back-end: https://github.com/aosan/VaultChat


@mr-pepe69 commented on GitHub (Apr 25, 2024):

I am interested too. Without a way of scoring the output against the input prompt, it is really hard to determine if the answer is any good. I would love to see ollama get more features like that.


@RobbyCBennett commented on GitHub (May 28, 2024):

I love the reliability and speed of Ollama. It would be great to add RAG support because it essentially creates an infinite context window size. This has limitations of speed and accuracy, but it's a great solution for the context window problem.


@ralyodio commented on GitHub (Jul 31, 2024):

Once I query the vector DB for related embeddings, how do I call the Ollama API to use them in a query?

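To answer the question above with a concrete sketch (not from the thread, and assuming an Ollama server on the default port): the embeddings themselves are never sent to the generation model. They are only used to rank documents; the *text* of the top-ranked chunks is then stuffed into the prompt sent to `/api/generate`.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # assumed default port


def build_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate the retrieved chunks as context ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate(prompt: str, model: str = "llama3") -> str:
    """Call Ollama's generate endpoint (requires a running server)."""
    req = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    # `chunks` would come from the vector-DB similarity search.
    chunks = ["Ollama exposes a REST API on port 11434."]
    print(generate(build_prompt("What port does Ollama use?", chunks)))
```

The model name and chunk formatting here are illustrative; any pulled chat or completion model works, and `/api/chat` can be used the same way by placing the context in a system or user message.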
Reference: github-starred/ollama#2150