[GH-ISSUE #8294] Ollama should avoid calling hallucinated tools #5309

Closed
opened 2026-04-12 16:29:58 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @ehsavoie on GitHub (Jan 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8294

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

Sometimes the model seems to hallucinate and call a tool on the client that doesn't exist. In my opinion since Ollama has the list of tools being callable it should check that the tool being called is in this list before calling it.
This is described also there:
https://github.com/langchain4j/langchain4j/issues/1052

OS

Linux, Docker

GPU

Other

CPU

Intel

Ollama version

0.5.4

Originally created by @ehsavoie on GitHub (Jan 3, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/8294 Originally assigned to: @ParthSareen on GitHub. ### What is the issue? Sometimes the model seems to hallucinate and call a tool on the client that doesn't exist. In my opinion since Ollama has the list of tools being callable it should check that the tool being called is in this list before calling it. This is described also there: https://github.com/langchain4j/langchain4j/issues/1052 ### OS Linux, Docker ### GPU Other ### CPU Intel ### Ollama version 0.5.4
GiteaMirror added the bug label 2026-04-12 16:29:58 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 3, 2025):

What should ollama do?

  1. Retry the generation in the hope that the model doesn't hallucinate this time?
  2. Discard the tool call and return an error to the client?

1 is a recipe for an infinite loop. 2 requires error handling in the client, so why doesn't the client detect the hallucinated tool call and make a decision using the much better context it has around the tool call?

dosubot makes some appropriate suggestions. For some of the linked issues, it would be handy to have the server logs with OLLAMA_DEBUG=1 to see if there's anything that might be done to reduce the hallucinations. But realistically, hallucinations are part of LLM responses, and a client needs to be prepared for it.

<!-- gh-comment-id:2569304205 --> @rick-github commented on GitHub (Jan 3, 2025): What should ollama do? 1. Retry the generation in the hope that the model doesn't hallucinate this time? 2. Discard the tool call and return an error to the client? 1 is a recipe for an infinite loop. 2 requires error handling in the client, so why doesn't the client detect the hallucinated tool call and make a decision using the much better context it has around the tool call? dosubot makes some appropriate suggestions. For some of the linked issues, it would be handy to have the [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) with `OLLAMA_DEBUG=1` to see if there's anything that might be done to reduce the hallucinations. But realistically, hallucinations are part of LLM responses, and a client needs to be prepared for it.
Author
Owner

@ehsavoie commented on GitHub (Jan 3, 2025):

@rick-github well, from the client side the question is basically the same. I'll try to implement some strategy there but in my opinion even option 2 is better than the current situation where I could be executing a bunch of tools until running into the hallucinated one. (but I'm not an expert on those topics)

<!-- gh-comment-id:2569354895 --> @ehsavoie commented on GitHub (Jan 3, 2025): @rick-github well, from the client side the question is basically the same. I'll try to implement some strategy there but in my opinion even option 2 is better than the current situation where I could be executing a bunch of tools until running into the hallucinated one. (but I'm not an expert on those topics)
Author
Owner

@arey commented on GitHub (Jan 5, 2025):

Hi @rick-github
I suggest a third solution: retry the generation by disabling the tool call. A kind of fallback generation.

<!-- gh-comment-id:2571682635 --> @arey commented on GitHub (Jan 5, 2025): Hi @rick-github I suggest a third solution: retry the generation by disabling the tool call. A kind of fallback generation.
Author
Owner

@rick-github commented on GitHub (Jan 5, 2025):

So the client makes a request expecting a tool call result and gets back a not tool call result. It needs logic to process that, how is that different from handling an error? I'm not familiar with langchain, is that something already handled by the framework? What about non- langchain clients?

<!-- gh-comment-id:2571686747 --> @rick-github commented on GitHub (Jan 5, 2025): So the client makes a request expecting a tool call result and gets back a not tool call result. It needs logic to process that, how is that different from handling an error? I'm not familiar with langchain, is that something already handled by the framework? What about non- langchain clients?
Author
Owner

@arey commented on GitHub (Jan 5, 2025):

My understanding is that tool calls can help the LLM generate a response. It's an optional behaviour. If the tool does not exist, the LLM could generate a response without any external data to augment the prompt.

<!-- gh-comment-id:2571690396 --> @arey commented on GitHub (Jan 5, 2025): My understanding is that tool calls can help the LLM generate a response. It's an optional behaviour. If the tool does not exist, the LLM could generate a response without any external data to augment the prompt.
Author
Owner

@rick-github commented on GitHub (Jan 5, 2025):

Your app defines a tool for getting stock prices. You invoke({"messages": [HumanMessage(content="What is the stock price for BRK-A?")]}). The model generates a tool call to a hallucinated tool. Ollama discards the response and tries again, without the tool. The model generates a response with a hallucinated stock price. Ollama sends the response to the app and you submit a buy for 1000 BRK-A at $0.01.

<!-- gh-comment-id:2571779658 --> @rick-github commented on GitHub (Jan 5, 2025): Your app defines a tool for getting stock prices. You `invoke({"messages": [HumanMessage(content="What is the stock price for BRK-A?")]})`. The model generates a tool call to a hallucinated tool. Ollama discards the response and tries again, without the tool. The model generates a response with a hallucinated stock price. Ollama sends the response to the app and you submit a buy for 1000 BRK-A at $0.01.
Author
Owner

@arey commented on GitHub (Jan 8, 2025):

I agree @rick-github, in the use case you describe, it is preferable to let the client decide what to do rather than to have the model generate a response with a hallucinated stock price, without knowing that the tool has not been called.
The strategy to be adopted when a tool hallucinates has to be decided by the consumer of the Ollama API.

<!-- gh-comment-id:2576994587 --> @arey commented on GitHub (Jan 8, 2025): I agree @rick-github, in the use case you describe, it is preferable to let the client decide what to do rather than to have the model generate a response with a hallucinated stock price, without knowing that the tool has not been called. The strategy to be adopted when a tool hallucinates has to be decided by the consumer of the Ollama API.
Author
Owner

@ehsavoie commented on GitHub (Jan 8, 2025):

Well, the llm not calling a tool can still happen right or it could use an hallucinated stock name. A proper application would still check the returned value before making a decision (I hope). In my opinion Ollama shouldn't ask for an improper tool, maybe an error message would be best.

<!-- gh-comment-id:2577319089 --> @ehsavoie commented on GitHub (Jan 8, 2025): Well, the llm not calling a tool can still happen right or it could use an hallucinated stock name. A proper application would still check the returned value before making a decision (I hope). In my opinion Ollama shouldn't ask for an improper tool, maybe an error message would be best.
Author
Owner

@rick-github commented on GitHub (Jan 8, 2025):

Looks like there's a consensus that the client should do some sort of response handling. Reading through the OpenAI API docs, there's no indication that their service does anything other than return a potentialy hallucinated tool call. Other service providers may take a different approach, I've done limited research on this. There is discussion in the OpenAI docs of using structured outputs to more tightly constrain a tool call. Ollama supports SO since 0.5.0 (although I don't know how complete the support is, or how closely it matches OpenAI), so this seems a more fruitful approach than adding logic to throw an error that is unique to ollama.

<!-- gh-comment-id:2577453648 --> @rick-github commented on GitHub (Jan 8, 2025): Looks like there's a consensus that the client should do some sort of response handling. Reading through the OpenAI API docs, there's no indication that their service does anything other than return a potentialy hallucinated tool call. Other service providers may take a different approach, I've done limited research on this. There is discussion in the OpenAI docs of using structured outputs to more tightly constrain a tool call. Ollama supports SO since 0.5.0 (although I don't know how complete the support is, or how closely it matches OpenAI), so this seems a more fruitful approach than adding logic to throw an error that is unique to ollama.
Author
Owner

@ParthSareen commented on GitHub (Feb 8, 2025):

Hey @ehsavoie, @arey,

The way I see it the responsibility of the server ends after the generation has correctly executed. There is also not much to be done server side if a tool was hallucinated as none of the options provide a good experience.

The current way + error handling on the user's end allows for the most flexibility in how a hallucinated tool should be handled. Maybe it's a retry in some cases and returning an error in others.

<!-- gh-comment-id:2644463316 --> @ParthSareen commented on GitHub (Feb 8, 2025): Hey @ehsavoie, @arey, The way I see it the responsibility of the server ends after the generation has correctly executed. There is also not much to be done server side if a tool was hallucinated as none of the options provide a good experience. The current way + error handling on the user's end allows for the most flexibility in how a hallucinated tool should be handled. Maybe it's a retry in some cases and returning an error in others.
Author
Owner

@arey commented on GitHub (Feb 9, 2025):

Hi @ParthSareen
When the client detects that the LLM is hallucinating, he can decide what strategy to adopt. I agree, there seem to be different ways of dealing with hallucinations. The LangChain4j team has been working along these lines and I look forward to using their work: https://github.com/langchain4j/langchain4j/pull/2460

<!-- gh-comment-id:2646407059 --> @arey commented on GitHub (Feb 9, 2025): Hi @ParthSareen When the client detects that the LLM is hallucinating, he can decide what strategy to adopt. I agree, there seem to be different ways of dealing with hallucinations. The LangChain4j team has been working along these lines and I look forward to using their work: https://github.com/langchain4j/langchain4j/pull/2460
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#5309