[GH-ISSUE #12690] Web search does not work properly with non gpt-oss models #70477

Open
opened 2026-05-04 21:40:51 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @ghost on GitHub (Oct 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12690

What is the issue?

Version 0.12.6 adds web search in the GUI for tool-calling models like Qwen3, but it returns 102,207 tokens for a simple query; the same query with gpt-oss:20b-cloud web search returns 768 tokens.

My test prompt was "Who is the current US president search". For Qwen3 it seems to return 5 webpages with the entire contents of each; one of them was the Wikipedia page for the US president, which alone is about 100,000 tokens.

For gpt-oss:20b-cloud, however, the formatting of the web search results is completely different, returning only a small 768-token snippet. Is this because qwen3:4b is just bad at calling Ollama's web search properly?

Ollama's blog post (https://ollama.com/blog/web-search) does say web search can return thousands of tokens and recommends setting the context length to at least 32k, but 102,207 tokens is still a bit much. Any reason for this?
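
As a rough workaround outside the GUI, it should be possible to call the web search API directly and truncate each result before handing it back to the model, so a single fetched page can't blow out the context. This is only a sketch: the endpoint, request/response fields (`max_results`, `results`, `content`), and the character cap below are assumptions based on the blog post above, not confirmed behavior of the GUI's built-in tool.

```python
# Sketch of a truncating web search tool for qwen3:4b, using the ollama Python client.
# Endpoint and field names follow https://ollama.com/blog/web-search and are assumptions.
import os
import requests
import ollama

OLLAMA_API_KEY = os.environ["OLLAMA_API_KEY"]
MAX_CHARS_PER_RESULT = 2000  # arbitrary cap for this sketch


def web_search(query: str, max_results: int = 3) -> list[dict]:
    """Search the web and return results with truncated page content."""
    resp = requests.post(
        "https://ollama.com/api/web_search",  # endpoint per the blog post (assumption)
        headers={"Authorization": f"Bearer {OLLAMA_API_KEY}"},
        json={"query": query, "max_results": max_results},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    for r in results:
        # Cap each result so a ~100k-token page becomes a short excerpt.
        if isinstance(r.get("content"), str) and len(r["content"]) > MAX_CHARS_PER_RESULT:
            r["content"] = r["content"][:MAX_CHARS_PER_RESULT] + " …[truncated]"
    return results


messages = [{"role": "user", "content": "Who is the current US president?"}]
response = ollama.chat(model="qwen3:4b", messages=messages, tools=[web_search])

# Run any tool calls the model made and feed the (truncated) results back.
for call in response.message.tool_calls or []:
    tool_result = web_search(**call.function.arguments)
    messages.append(response.message)
    messages.append({"role": "tool", "content": str(tool_result)})

final = ollama.chat(model="qwen3:4b", messages=messages)
print(final.message.content)
```

With a cap like this the tool result stays in the low thousands of tokens per query, closer to what gpt-oss:20b-cloud appears to receive, though how the GUI itself formats results for each model is presumably different.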

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.12.6

GiteaMirror added the bug label 2026-05-04 21:40:51 -05:00