[GH-ISSUE #1429] Can you explain the difference between query and complete? Why one versus the other? Thanks! #760

Closed
opened 2026-04-12 10:26:24 -05:00 by GiteaMirror · 7 comments

Originally created by @OpenSpacesAndPlaces on GitHub (Dec 8, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1429

e.g.
query_engine = index.as_query_engine()
retrieved_nodes = query_engine.query("What is the price of apples?")

vs.

prompt = "What is the price of apples?"
response = llm.complete(prompt)


I saw this example feeding the output of query into complete. Why might you want to do that vs. just using query?
https://www.educative.io/answers/how-to-train-gpt-4-on-custom-datasets-using-llamaindex


@iplayfast commented on GitHub (Dec 8, 2023):

Answering as a user (not a developer).
query is asking a question, whereas complete will try to complete a sentence.
example query is "where did Jack sit"
example complete is "Jack sat on the"


@OpenSpacesAndPlaces commented on GitHub (Dec 8, 2023):

Thanks @iplayfast!


The example I'm working from is:
https://gist.github.com/mneedham/eec9246a5ce95dc792f2e73b16dfe78e

Ollama has the same method as "openai.Completion.create(" in the form of "complete".
That's part of where my question was coming from.


Ultimately I was trying to find examples of data training vs. embedding - but I haven't found an example or confirmation that training is possible.


@technovangelist commented on GitHub (Dec 8, 2023):

Tell me more about where you found these examples @OpenSpacesAndPlaces


@technovangelist commented on GitHub (Dec 8, 2023):

So the two examples you showed come from the LlamaIndex documentation. The LLM complete command is asking the model to do a generation based on the prompt you gave it, in this case "what is the price of apples".

The query_engine.query("What is the price of apples?") is probably part of a RAG application. So that is going to do the same generation, but it also queries the vector database ahead of time to get the most relevant documents or chunks of documents from the database.
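A rough sketch of that difference, with hypothetical stand-in functions (none of these are LlamaIndex's actual internals): complete() sees only the bare prompt, while query() first retrieves relevant chunks and prepends them as context.

```python
# Illustrative sketch of what a RAG query engine adds on top of complete().
# All functions here are hypothetical stand-ins, not real LlamaIndex APIs.

def complete(prompt: str) -> str:
    """Stand-in for llm.complete(): the model just continues the prompt."""
    return f"<generation for: {prompt!r}>"

def retrieve(question: str, store: dict[str, str]) -> list[str]:
    """Stand-in for the vector-store lookup: return chunks sharing a keyword."""
    words = [w.strip("?") for w in question.lower().split() if len(w) > 3]
    return [chunk for chunk in store.values()
            if any(w in chunk.lower() for w in words)]

def query(question: str, store: dict[str, str]) -> str:
    """RAG-style query: retrieve relevant chunks, then generate with context."""
    context = "\n".join(retrieve(question, store))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return complete(prompt)

store = {"doc1": "Apples cost $2 per pound at the market.",
         "doc2": "Oranges are out of season."}

# complete() sees only the bare question; query() also sees the retrieved chunk.
bare = complete("What is the price of apples?")
augmented = query("What is the price of apples?", store)
```

The generation step is the same in both cases; the query engine just builds a richer prompt before calling it.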


@technovangelist commented on GitHub (Dec 8, 2023):

When adding your documents to a vector database, you go through a process referred to as embedding to save the contents as an array of numbers. Adding the documents starts with splitting up the doc into smaller chunks. How small? Best I can say is it depends. Sometimes shorter chunks are better, sometimes not. Once your content is in the database, you can do the query call in LlamaIndex. What that does is embed the query, turning it into an array of numbers. It then compares that array to each of the arrays already in the db, looking for the ones most similar. The full text of those doc chunks is then returned and added to the query so that the model can come up with a good answer.
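The steps above (embed the query, compare it against the stored vectors, return the nearest chunks) can be sketched with plain cosine similarity. The toy embed() here is just a word-count vector and is a stand-in for a real embedding model:

```python
import math

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: word-count vector over a fixed vocabulary.
    A real system would call an embedding model instead."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vocab = ["apples", "price", "oranges", "bananas", "cost"]
chunks = ["apples cost two dollars", "oranges are sweet", "bananas are yellow"]
db = [(c, embed(c, vocab)) for c in chunks]   # the "vector database"

# Embed the query, then find the stored chunk with the most similar vector.
q = embed("what is the price of apples", vocab)
best = max(db, key=lambda pair: cosine(q, pair[1]))[0]
```

Here the query shares the word "apples" with the first chunk, so that chunk scores highest and would be the one handed back to the model as context.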


@technovangelist commented on GitHub (Dec 8, 2023):

The other question is about training. That’s a harder one to answer. It could mean training, but I think it means fine tuning. Training is what happens when the model is first created and costs a lot of money. Fine tuning takes your existing information, often in the form of questions and answers, and then uses that to make the weights and biases in the model align better with the content you care about. Hopefully that results in better answers.

For now we don't do fine tuning. It's a hard process and something we want to offer in the future. At our meetup the other day we heard a presentation about axolotl which tries to make this easier. I don't know much about this tool yet. You can find it here: https://github.com/OpenAccess-AI-Collective/axolotl


@OpenSpacesAndPlaces commented on GitHub (Dec 9, 2023):

Appreciate all the helpful notes! At this point my questions got answered, plus some additional insights.

Thanks again!

Reference: github-starred/ollama#760