[GH-ISSUE #9488] Thinking doesn't show for DeepSeek R1 via API (external connection) #31055

Closed
opened 2026-04-25 05:07:45 -05:00 by GiteaMirror · 60 comments

Originally created by @drshliapa on GitHub (Feb 6, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/9488

I connected to DeepSeek R1 via its API, but it only shows the final result instead of the thinking steps. Open WebUI version 0.5.10.

Image

@ImLuke954 commented on GitHub (Feb 6, 2025):

I can confirm this bug. Having the same issue.

@EricsmOOn commented on GitHub (Feb 6, 2025):

Image

@itshen commented on GitHub (Feb 6, 2025):

Ref: https://api-docs.deepseek.com/zh-cn/guides/reasoning_model

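For context, here is a minimal sketch of what that guide describes: the DeepSeek endpoint returns the chain of thought in a separate reasoning_content field alongside the usual content. The model name and field access below follow the linked docs; treat the exact details as assumptions if your provider differs.

    # Minimal sketch of the DeepSeek reasoning API, per the guide linked above.
    # Assumes the `openai` Python package and a valid DeepSeek API key.
    from openai import OpenAI

    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    )

    msg = response.choices[0].message
    print(msg.reasoning_content)  # chain of thought -- the part Open WebUI drops
    print(msg.content)            # final answer -- the only part Open WebUI shows
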
@ddowhy commented on GitHub (Feb 6, 2025):

Even for locally hosted models it doesn't show every time. Sometimes it does, sometimes it doesn't. Inconsistent behaviour.

@EricSolshkov commented on GitHub (Feb 6, 2025):

This issue shows up nearly 100% of the time using deepseek-r1 with reasoning, making open-webui incapable of working with this model.

@HeMuling commented on GitHub (Feb 6, 2025):

There is a solution that somehow fixes the problem:

Image

The solution is to use the pipe function https://openwebui.com/f/zgccrui/deepseek_r1 (my recommendation is to increase the timeout in the code).

For more details, see this forum thread: https://linux.do/t/topic/383183; it seems to be related to the openai standard package.

@EliEron commented on GitHub (Feb 6, 2025):

Technically it's not a bug. The issue is that R1 returns the thinking tokens separately from the content itself, in a field called reasoning_content. Since that deviates from the standard OpenAI spec, which Open WebUI follows, tjbck has stated very explicitly that they will not support it natively (https://github.com/open-webui/open-webui/pull/9241#issuecomment-2629707340), leaving it up to pipe functions instead.

Personally I think it would be a good idea to add this support natively, if for no other reason than to end the confusion a lot of users clearly face, but it's not up to me to decide. And it is true that you can work around it pretty easily with pipe functions. If you are using OpenRouter, you can use the function found at https://github.com/rmarfil3/openwebui-openrouter-reasoning-tokens. If you are using the DeepSeek API itself, you can use https://openwebui.com/f/zgccrui/deepseek_r1. They need different functions because OpenRouter has gone with a slightly different design for how they deliver thinking tokens.

In either case, once configured, the function will add a new model to your model list that acts exactly like the normal R1 endpoint but shows the thinking tokens properly.

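To make the deviation concrete, here is a sketch of what a single streamed delta looks like in each case (illustrative shapes based on the field names discussed in this thread, not captured output):

    # Standard OpenAI streaming delta -- the only shape Open WebUI parses:
    openai_chunk = {
        "choices": [{"delta": {"content": "The answer is 4."}}]
    }

    # DeepSeek R1 streaming delta while the model is thinking -- `content` is
    # empty and the tokens arrive in the non-standard `reasoning_content`
    # field, which Open WebUI ignores:
    deepseek_chunk = {
        "choices": [{"delta": {"content": None, "reasoning_content": "Let me compare the two numbers..."}}]
    }
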
@crzroot commented on GitHub (Feb 7, 2025):

Yes, I also have this problem; the thinking isn't shown for me.

@aaronps commented on GitHub (Feb 9, 2025):

I deploy models with vLLM. Originally the output was fine (I could see the reasoning), but after adding the new --enable-reasoning flag in vLLM there is no reasoning output in open-webui. Yes, we know about reasoning_content, but other tools already work with it nicely.

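For illustration, a sketch of how a client can read the reasoning from a vLLM OpenAI-compatible endpoint once the reasoning flag is enabled. The base URL and model name are placeholders, and the surfacing of reasoning_content in the delta is an assumption based on this comment, not verified against a specific vLLM version:

    # Hypothetical client for a local vLLM server started with --enable-reasoning.
    # Assumes the `openai` Python package; URL and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

    stream = client.chat.completions.create(
        model="deepseek-r1",  # whatever model name vLLM was started with
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
        stream=True,
    )

    for chunk in stream:
        delta = chunk.choices[0].delta
        # Non-standard field, present only while the model is thinking:
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            print(reasoning, end="", flush=True)      # what Open WebUI drops
        if delta.content:
            print(delta.content, end="", flush=True)  # the final answer
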
@fenghan0430 commented on GitHub (Feb 10, 2025):

The same problem arises with the Alibaba Cloud API.

@djw520158 commented on GitHub (Feb 10, 2025):

Yes, I also have this problem; the thinking isn't shown for me.

@fq393 commented on GitHub (Feb 11, 2025):

> There is a solution that somehow fixes the problem:
>
> Image
>
> The solution is to use the pipe function https://openwebui.com/f/zgccrui/deepseek_r1 (my recommendation is to increase the timeout in the code).
>
> For more details, see this forum thread: https://linux.do/t/topic/383183; it seems to be related to the openai standard package.

The pipe function at https://openwebui.com/f/zgccrui/deepseek_r1 should additionally be hardened against the "IndexError: list index out of range" problem:

Before modification:

    choice = data.get("choices", [{}])[0]

After modification:

    choices = data.get("choices", [{}])
    if not isinstance(choices, list) or len(choices) == 0:  # double safety check
        yield self._format_error("APIError", "Empty choices in response")
        return
    choice = choices[0]

@peter-ch commented on GitHub (Feb 11, 2025):

Can you at least make that pipe built-in or something? Why do I have to program things?

@litionls commented on GitHub (Feb 11, 2025):

Using Alibaba's API and Volcengine's API I have the same problem, but running the distilled model with Ollama is fine.

@AndreyYukavichin commented on GitHub (Feb 12, 2025):

While the reasoning process can be observed when running the distilled model with Ollama, it is not accessible when using the API of the Alibaba Cloud Bailian platform.

@GrayXu commented on GitHub (Feb 12, 2025):

The essence of this issue is that open-webui can only handle <think> content, while the reasoning_content field used by many API providers is not supported.

I noticed that tjbck has already stated it will not be supported, which is very disappointing.

First of all, reasoning_content is clearly the better practice; the tag format is too easily misused. I have reproduced many instances where, during discussions with the model about thinking models, its output tags were incorrectly displayed. A completely isolated field is an elegant implementation strategy.

Secondly, tjbck's concern is that different MaaS providers will require different adaptations, but in fact DeepSeek's design has already been adopted by various MaaS providers, making it a de facto standard created by leading companies.

BTW, the pipe implementation mentioned earlier lacks scalability and locks down the endpoint.

@Changego commented on GitHub (Feb 13, 2025):

Indeed, when I connect to the DeepSeek provided by the public cloud, the thinking process is not displayed. This is quite frustrating.

@iskradelta commented on GitHub (Feb 13, 2025):

I'd like to give back to the community by documenting what I did to get the DeepSeek R1 thinking step showing, and also make the GPU machine wake on LAN and sleep when nobody is talking to the LLM.

open-webui is running on a different Docker container host than the GPU machine running vLLM.

Go to Settings, Admin Settings, then the Functions tab (which is not very visible). See this gist: https://gist.github.com/iskradelta/8f1e11e32126b86a6e2a3f3026dad354

Change the MAC address in the code; the IP 10.88.0.1 should be the IP of the wake-on-lan-proxy (Docker container IP) you will run.
If you're on the same LAN and subnet you don't need a wake-on-lan-proxy; you can modify the function to send the WOL packet directly instead.

The wake-on-lan proxy is a simple Dockerfile with an entry.sh: https://gist.github.com/iskradelta/dd46eeb7671338c5ee7858ec2f4b37b9
So just docker build . -t wake-on-lan-proxy and docker run it on the same host as open-webui.

Next, on the computer where you run vllm --enable-reasoning, also add | tee brain.file to write its output to a log file.

Now run this in another shell, or put it in a systemd service:

    while true; do
        inotifywait -q -t 900 -e modify brain.file
        if [ $? -eq 2 ]; then
            systemctl suspend
        else
            echo "...go on talky talk"
            sleep 60s
        fi
    done

This will suspend the GPU machine after 900 s (15 min) with no log output from vLLM.

When a user wants to talk to the model through open-webui, the "thinking" function with wake-on-LAN will wake it from suspend, and it is ready to think within 3-4 s.

@Ronchy2000 commented on GitHub (Feb 13, 2025):

> There is a solution that somehow fixes the problem:
>
> Image
>
> The solution is to use the pipe function https://openwebui.com/f/zgccrui/deepseek_r1 (my recommendation is to increase the timeout in the code).
>
> For more details, see this forum thread: https://linux.do/t/topic/383183; it seems to be related to the openai standard package.

Wonderful, it works! Thanks a lot.

@fenghan0430 commented on GitHub (Feb 13, 2025):

Using open-webui's pipe feature to post-process the data returned by the API solves the missing think-tag problem.

However, for some reason the pipe function cannot connect to my local vLLM server; it fails with an "all connections failed" error.

@KinglyWayne commented on GitHub (Feb 13, 2025):

For those who lost the opening <think> tag after the R1 chat-template update:
The new chat template of DeepSeek R1 has broken the starting tag. Until there is an official fix, I modified zgccrui's pipeline to be compatible with backend inference services that omit the starting tag. It is currently running well for me; give it a try:
https://openwebui.com/f/kinglywayne/deepseek_r1_thinkfix
All credit goes to zgccrui; I just added the check logic for the missing opening tag.
zgccrui's original function is at https://openwebui.com/f/zgccrui/deepseek_r1

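For readers curious what such a check looks like, here is a minimal sketch of the idea described above; it is not KinglyWayne's actual code, just an illustration assuming the backend starts streaming reasoning text without emitting the opening <think> tag:

    # Hypothetical sketch: inject a missing opening <think> tag into a token stream.
    def fix_missing_think(chunks):
        first = True
        for text in chunks:
            if first:
                first = False
                # Newer R1 chat templates may start reasoning without the opening
                # tag; if it is missing, prepend it so the UI still folds the block.
                if not text.lstrip().startswith("<think>"):
                    yield "<think>\n"
            yield text

    # Example: a stream whose first chunk lacks the opening tag.
    for piece in fix_missing_think(["First, compare the numbers... ", "</think>", "The answer is 4."]):
        print(piece, end="")
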
@Ronchy2000 commented on GitHub (Feb 13, 2025):

> Using open-webui's pipe feature to post-process the data returned by the API solves the missing think-tag problem.
>
> However, for some reason the pipe function cannot connect to my local vLLM server; it fails with an "all connections failed" error.

I haven't tried that; I just use localhost:8080 directly and it tests fine.
BTW, has anyone tested the web search feature? I tried the Google PSE API that people online suggest, but it errors out every time. Is it related to the proxy? I'm puzzled and would appreciate any pointers, thanks!

@fenghan0430 commented on GitHub (Feb 13, 2025):

> > Using open-webui's pipe feature to post-process the data returned by the API solves the missing think-tag problem.
> > However, for some reason the pipe function cannot connect to my local vLLM server; it fails with an "all connections failed" error.
>
> I haven't tried that; I just use localhost:8080 directly and it tests fine. BTW, has anyone tested the web search feature? I tried the Google PSE API that people online suggest, but it errors out every time. Is it related to the proxy? I'm puzzled and would appreciate any pointers, thanks!

Use a proxy; web search works for me.

@Ronchy2000 commented on GitHub (Feb 13, 2025):

> > > Using open-webui's pipe feature to post-process the data returned by the API solves the missing think-tag problem.
> > > However, for some reason the pipe function cannot connect to my local vLLM server; it fails with an "all connections failed" error.
> >
> > I haven't tried that; I just use localhost:8080 directly and it tests fine. BTW, has anyone tested the web search feature? I tried the Google PSE API that people online suggest, but it errors out every time. Is it related to the proxy? I'm puzzled and would appreciate any pointers, thanks!
>
> Use a proxy; web search works for me.

Does it need extra configuration? I'm using Clash on port 7890.

@fenghan0430 commented on GitHub (Feb 14, 2025):

If you deploy with Docker, you need to use Clash's TUN mode. If you use Clash's HTTP proxy, look for open-webui's proxy configuration settings.

@lincyang commented on GitHub (Feb 14, 2025):

I deployed deepseek-r1:14b yesterday. With open-webui the think block does display, but after submitting a question there is a variable delay before it appears: sometimes 10 seconds, and tens of seconds for hard questions. When I query Ollama directly, the think tag is returned immediately. I don't know where the problem is; please advise!
I also tried the DeepSeek R1_ThinkFix pipe, with the same result.

@fenghan0430 commented on GitHub (Feb 14, 2025):

> I deployed deepseek-r1:14b yesterday. With open-webui the think block does display, but after submitting a question there is a variable delay before it appears: sometimes 10 seconds, and tens of seconds for hard questions. When I query Ollama directly, the think tag is returned immediately. I don't know where the problem is; please advise! I also tried the DeepSeek R1_ThinkFix pipe, with the same result.

The DeepSeek API and other APIs put the thinking text in reasoning_content, while the model's final answer appears in content. open-webui parses only content and not reasoning_content, so the API returns reasoning_content but no code processes it. A pipe lets you handle that part manually.

If the pipe has no effect for you, post your configuration and I'll try to help.

@he0119 commented on GitHub (Feb 14, 2025):

The official open-webui reason for not supporting this:

> https://github.com/open-webui/open-webui/pull/9241#issuecomment-2629707340
>
> Let me be perfectly clear here: We will not add a reasoning_content field or any similar deviation that is not explicitly provided by OpenAI's official API. The core goal of Open WebUI's OpenAI implementation is to adhere strictly to OpenAI's original design and specifications. This is not up for debate.

I made a simple modification; it may serve as a stopgap until OpenAI standardizes something:

https://hub.docker.com/r/he0119/open-webui
https://github.com/he0119/open-webui/pkgs/container/open-webui
@lwdnxu commented on GitHub (Feb 18, 2025):

Same problem here; is there a solution?

@zx900930 commented on GitHub (Feb 19, 2025):

> The official open-webui reason for not supporting this:
>
> > #9241 (comment)
> > Let me be perfectly clear here: We will not add a reasoning_content field or any similar deviation that is not explicitly provided by OpenAI's official API. The core goal of Open WebUI's OpenAI implementation is to adhere strictly to OpenAI's original design and specifications. This is not up for debate.
>
> I made a simple modification; it may serve as a stopgap until OpenAI standardizes something:
>
> https://hub.docker.com/r/he0119/open-webui

The <think> block is still not showing up. Model: bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF

logs:

    2025-02-19T13:40:52.358306738+08:00 ERROR [asyncio] Task exception was never retrieved
    2025-02-19T13:40:52.358338210+08:00 future: <Task finished name='Task-1921' coro=<process_chat_response.<locals>.post_response_handler() done, defined at /app/backend/open_webui/utils/middleware.py:1074> exception=ClientPayloadError("Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>")>
    2025-02-19T13:40:52.358340439+08:00 Traceback (most recent call last):
    2025-02-19T13:40:52.358341986+08:00   File "/usr/local/lib/python3.11/site-packages/aiohttp/client_proto.py", line 92, in connection_lost
    2025-02-19T13:40:52.358343705+08:00     uncompleted = self._parser.feed_eof()
    2025-02-19T13:40:52.358345006+08:00                   ^^^^^^^^^^^^^^^^^^^^^^^
    2025-02-19T13:40:52.358346585+08:00   File "aiohttp/_http_parser.pyx", line 508, in aiohttp._http_parser.HttpParser.feed_eof
    2025-02-19T13:40:52.358347797+08:00 aiohttp.http_exceptions.TransferEncodingError: 400, message:
    2025-02-19T13:40:52.358348969+08:00   Not enough data for satisfy transfer length header.
    2025-02-19T13:40:52.358351674+08:00 The above exception was the direct cause of the following exception:
    2025-02-19T13:40:52.358354035+08:00 Traceback (most recent call last):
    2025-02-19T13:40:52.358355227+08:00   File "/app/backend/open_webui/utils/middleware.py", line 1592, in post_response_handler
    2025-02-19T13:40:52.358356477+08:00     await stream_body_handler(response)
    2025-02-19T13:40:52.358357745+08:00   File "/app/backend/open_webui/utils/middleware.py", line 1418, in stream_body_handler
    2025-02-19T13:40:52.358359009+08:00     async for line in response.body_iterator:
    2025-02-19T13:40:52.358360161+08:00   File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 52, in __anext__
    2025-02-19T13:40:52.358361849+08:00     rv = await self.read_func()
    2025-02-19T13:40:52.358362936+08:00          ^^^^^^^^^^^^^^^^^^^^^^
    2025-02-19T13:40:52.358364363+08:00   File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 352, in readline
    2025-02-19T13:40:52.358365499+08:00     return await self.readuntil()
    2025-02-19T13:40:52.358366599+08:00            ^^^^^^^^^^^^^^^^^^^^^^
    2025-02-19T13:40:52.358367811+08:00   File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 386, in readuntil
    2025-02-19T13:40:52.358369023+08:00     await self._wait("readuntil")
    2025-02-19T13:40:52.358370225+08:00   File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 347, in _wait
    2025-02-19T13:40:52.358371333+08:00     await waiter
    2025-02-19T13:40:52.358373224+08:00 aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
@he0119 commented on GitHub (Feb 19, 2025):

> The <think> block is still not showing up. Model: bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF

It looks like your API response payload is not completing. I don't know why.

@roksttr8616 commented on GitHub (Feb 21, 2025):

Help! Me too. Begging you, please develop a fix fast!!!

@tomly2019m commented on GitHub (Feb 21, 2025):

I also ran into this issue today, and it is now resolved. The idea is to move the text from reasoning_content into the main content field, manually adding the <think> tag during reasoning. The specific approach is to intercept the stream returned by the third-party API, modify its content, and then hand it over to Open WebUI for processing. I will share my modifications tomorrow; it essentially just requires adding an interception function.

@tomly2019m commented on GitHub (Feb 22, 2025):

In backend\open_webui\routers\openai.py, find the generate_chat_completion function. Below

    r = None
    session = None
    streaming = False
    response = None

add the interception function. This approach works for Volcengine and Bailian; I haven't tried vLLM, but the idea is the same: based on the chunk content vLLM returns, paste the reasoning part into content and it will display normally.

    async def modify_stream_content(original_stream):
        start_reasoning = True
        end_reasoning = True
        # Process the streamed content chunk by chunk
        async for chunk in original_stream:
            # Example: modify the chunk content (assuming the chunk is a JSON string)
            try:
                # 1. Decode the byte stream to a string and strip the SSE "data: " prefix
                decoded_chunk = chunk.decode("utf-8").strip()
                if decoded_chunk.startswith("data:"):
                    decoded_chunk = decoded_chunk[len("data:"):].strip()

                # 2. Parse the JSON (adjust to your API's actual format)
                chunk_data = json.loads(decoded_chunk)

                # 3. Modify the content
                if "choices" in chunk_data and len(chunk_data["choices"]) > 0:
                    delta = chunk_data["choices"][0].get("delta", {})
                    if delta["content"] is None:
                        delta["content"] = ""
                    if "reasoning_content" in delta and delta["reasoning_content"] is None:
                        delta["reasoning_content"] = ""

                    if delta["content"] == "":
                        if start_reasoning:
                            delta["content"] = "<think>" + delta["content"] + delta["reasoning_content"]
                            start_reasoning = False
                        else:
                            delta["content"] = delta["content"] + delta["reasoning_content"]
                    else:
                        if end_reasoning:
                            delta["content"] = "</think>" + delta["content"]
                            end_reasoning = False

                # 4. Re-encode as a byte stream
                modified_chunk = f"data: {json.dumps(chunk_data)}\n\n".encode("utf-8")

            except (json.JSONDecodeError, KeyError):
                # On parse errors or missing keys, pass the original chunk through
                modified_chunk = chunk

            # 5. Yield the modified chunk
            yield modified_chunk

Below that, inside the if "text/event-stream" in r.headers.get("Content-Type", ""): branch, call the interception function and return its stream:

        # Check if response is SSE
        if "text/event-stream" in r.headers.get("Content-Type", ""):
            streaming = True
            # Call the interception function
            modified_stream = modify_stream_content(r.content)

            return StreamingResponse(
                # r.content,
                modified_stream,
                status_code=r.status,
                headers=dict(r.headers),
                background=BackgroundTask(
                    cleanup_response, response=r, session=session
                ),
            )

Image

Image

If your API returns data in a format that differs from expectations, you can print it out inside the interception function and adjust accordingly.

@i-iooi-i commented on GitHub (Feb 22, 2025):

Image

https://github.com/n4ze3m/page-assist

You can temporarily use this browser extension as a substitute, as it displays the complete thought process.

@romeoleung commented on GitHub (Feb 23, 2025):

I tried adding a phrase to the system prompt asking it to output the reasoning_content as a <think> tag. Here's what I put:

"In your response, please output the reasoning_content as <think> tag."
In Chinese: "回复中,请将reasoning_content转换为<think>tag的内容输出"

It does "work", in that it did give me the CoT in the UI. But I guess it has a cost: I think it first outputs the reasoning_content in the background without showing it, and then in the real reply it wraps the thinking process in a <think> tag and outputs it again. I didn't actually inspect the real reasoning_content (I don't know how), so my guess is that it outputs the thinking process twice.

Image

@cdmusic2019 commented on GitHub (Feb 24, 2025):

Actually, you can just use he0119's fork, or replace middleware.py yourself with his modified version.

@grea commented on GitHub (Feb 24, 2025):

> Actually, you can just use he0119's fork, or replace middleware.py yourself with his modified version.

Is there a specific link?

@cdmusic2019 commented on GitHub (Feb 24, 2025):

> > Actually, you can just use he0119's fork, or replace middleware.py yourself with his modified version.
>
> Is there a specific link?

https://github.com/he0119/open-webui
@grea @he0119

@bm2ilabs commented on GitHub (Feb 24, 2025):

I was able to make this work for OpenRouter:

"""
title: OpenRouter R1 Fix Think (with OpenRouter)
author: zgccrui (adapted by Boukraa Mohamed)
description: 在OpwenWebUI中显示OpenRouter模型的思维链 - 仅支持0.5.6及以上版本 (使用OpenRouter API)
version: 1.4.2
"""

import json
import httpx
import re
from typing import AsyncGenerator, Callable, Awaitable
from pydantic import BaseModel, Field
import asyncio


class Pipe:
    class Valves(BaseModel):
        OPENROUTER_API_BASE_URL: str = Field(
            default="https://openrouter.ai/api/v1",
            description="OpenRouter API的基础请求地址",
        )
        OPENROUTER_API_KEY: str = Field(
            default="", description="用于身份验证的OpenRouter API密钥"
        )
        OPENROUTER_API_MODEL: str = Field(
            default="deepseek/deepseek-r1",  # Default to a DeepSeek model
            description="API请求的模型名称,例如 mistralai/mistral-medium",
        )
        OPENROUTER_REFERER: str = Field(
            default="",
            description="Your site URL to set as the 'Referer' header.  Important for free-tier usage!",
        )
        MAX_RETRIES: int = Field(
            default=3,
            description="Maximum number of retries for API requests.",
        )
        # Add a regex pattern to clean the model ID.  Adjust as needed.
        MODEL_ID_CLEAN_PATTERN: str = Field(
            default=r"^.*?\.?",  # Matches anything up to and including the first period.
            description="Regex to remove prefixes from the model ID.",
        )

    def __init__(self):
        self.valves = self.Valves()
        self.data_prefix = "data:"
        self.emitter = None

    def pipes(self):
        return [
            {
                "id": self.valves.OPENROUTER_API_MODEL,
                "name": self.valves.OPENROUTER_API_MODEL,
            }
        ]

    async def pipe(
        self, body: dict, __event_emitter__: Callable[[dict], Awaitable[None]] = None
    ) -> AsyncGenerator[str, None]:
        """Main processing pipeline (using OpenRouter API)."""

        thinking_state = {"thinking": -1}
        self.emitter = __event_emitter__
        search_providers = 0
        retries = 0

        if not self.valves.OPENROUTER_API_KEY:
            yield json.dumps(
                {"error": "未配置 OpenRouter API 密钥"}, ensure_ascii=False
            )
            return

        if not self.valves.OPENROUTER_REFERER:
            yield json.dumps(
                {"error": "未配置 OpenRouter Referer (您的站点URL)"}, ensure_ascii=False
            )
            return

        headers = {
            "Authorization": f"Bearer {self.valves.OPENROUTER_API_KEY}",
            "Content-Type": "application/json",
            "HTTP-Referer": self.valves.OPENROUTER_REFERER,
            "X-Title": "OpwenWebUI",
        }

        model_id = body["model"].split(".", 1)[-1]

        payload = {
            **body,
            "model": model_id,
            "messages": self.normalize_messages(body["messages"]),
            "stream": True,
            "include_reasoning": True,
        }
        # --- (Rest of the code is the same as version 1.4.0) ---

        while retries < self.valves.MAX_RETRIES:
            try:
                async with httpx.AsyncClient(http2=True) as client:
                    async with client.stream(
                        "POST",
                        f"{self.valves.OPENROUTER_API_BASE_URL}/chat/completions",
                        json=payload,
                        headers=headers,
                        timeout=300,
                    ) as response:

                        if response.status_code != 200:
                            error_content = await response.aread()
                            yield self._format_error(
                                response.status_code, error_content
                            )
                            if response.status_code in (429, 500, 502, 503, 504):
                                retries += 1
                                await asyncio.sleep(2**retries)
                                continue
                            return

                        async for line in response.aiter_lines():
                            if not line.startswith(self.data_prefix):
                                continue

                            json_str = line[len(self.data_prefix) :]

                            if json_str.strip() == "[DONE]":
                                return

                            try:
                                data = json.loads(json_str)
                            except json.JSONDecodeError as e:
                                error_detail = f"解析失败 - 内容:{json_str},原因:{e}"
                                yield self._format_error(
                                    "JSONDecodeError", error_detail
                                )
                                return

                            choice = data.get("choices", [{}])[0]
                            if choice.get("finish_reason"):
                                return

                            delta = choice.get("delta", {})

                            # --- Thinking State and Content ---
                            reasoning_content = delta.get("reasoning", "")
                            content = delta.get("content", "")

                            if reasoning_content:
                                if thinking_state["thinking"] == -1:
                                    yield "<think>\n"
                                    thinking_state["thinking"] = 0
                                yield reasoning_content

                            elif content:
                                if thinking_state["thinking"] == 0:
                                    yield "\n</think>\n\n"
                                    thinking_state["thinking"] = 1

                                # Basic URL extraction (as before)
                                urls = re.findall(r"(https?://\S+)", content)
                                if urls:
                                    if search_providers == 0:
                                        yield '<details type="search">\n'
                                        yield f"<summary>已找到 {len(urls)} 个链接</summary>\n"
                                        for i, url in enumerate(urls, 1):
                                            cleaned_url = url.rstrip(".,;?!)]>")
                                            yield f"> {i}. [链接 {i}]({cleaned_url})\n"
                                        yield "</details>\n"
                                        search_providers = 3

                                yield content

                return  # Exit on success

            except httpx.RequestError as e:
                yield self._format_exception(e)
                retries += 1
                await asyncio.sleep(2**retries)
            except Exception as e:
                yield self._format_exception(e)
                return

        yield self._format_error(
            "MaxRetriesExceeded", f"Reached maximum retries ({self.valves.MAX_RETRIES})"
        )

    def normalize_messages(self, messages):
        """Ensures alternating user/assistant roles."""
        normalized_messages = []
        for i, message in enumerate(messages):
            if i > 0 and message["role"] == normalized_messages[-1]["role"]:
                alternate_role = "assistant" if message["role"] == "user" else "user"
                normalized_messages.append(
                    {"role": alternate_role, "content": "[Unfinished thinking]"}
                )
            normalized_messages.append(message)
        return normalized_messages

    def _format_error(self, status_code: int, error: bytes) -> str:
        if isinstance(error, str):
            error_str = error
        else:
            error_str = error.decode(errors="ignore")
        try:
            err_msg = json.loads(error_str).get("message", error_str)[:200]
        except Exception:
            err_msg = error_str[:200]
        return json.dumps(
            {"error": f"HTTP {status_code}: {err_msg}"}, ensure_ascii=False
        )

    def _format_exception(self, e: Exception) -> str:
        err_type = type(e).__name__
        return json.dumps({"error": f"{err_type}: {str(e)}"}, ensure_ascii=False)
@AndrewTsao commented on GitHub (Feb 25, 2025):

> I was able to make this work for openrouter
>
> [OpenRouter pipe code quoted in full from the previous comment]

Using the function approach, the thinking process displays correctly, but I found that it can no longer reference knowledge base content. Models that aren't wrapped in a function still can.

<!-- gh-comment-id:2680229616 -->

@Hugh-yw commented on GitHub (Feb 26, 2025):

DeepSeek-R1 is deployed with the SGLang engine, on the latest Open WebUI (0.5.16). I configured the function according to the online tutorial, but it still doesn't display the thinking chain.
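One way to narrow this down is to inspect what the backend actually streams. Here is a minimal probe (a sketch; the host, port, and model name are assumptions, adjust to your deployment):

```python
import json
import requests

# Probe an OpenAI-compatible SGLang endpoint (hypothetical host/model) and
# print each streamed delta to see whether reasoning arrives in a separate
# reasoning_content field or as inline <think> text.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "deepseek-r1",
        "stream": True,
        "messages": [{"role": "user", "content": "hi"}],
    },
    stream=True,
)
for line in resp.iter_lines():
    if line.startswith(b"data:") and b"[DONE]" not in line:
        delta = json.loads(line[len(b"data:"):])["choices"][0].get("delta", {})
        print(delta)
```

If the reasoning comes back in a separate `reasoning_content` field, a pipe or filter like the ones in this thread is needed; if it comes back inline but with a missing opening tag, the thinkfix pipe mentioned below applies.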

<!-- gh-comment-id:2683894165 -->

@duanhongyi commented on GitHub (Feb 26, 2025):

I solved this problem with a monkey patch, which lets me keep the original image and make only runtime changes.

## docker-compose.yml

```yaml
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      DATABASE_URL: postgres://xxxx:xxxxx@postgresql:5432/openwebui?sslmode=disable
      OPENAI_API_KEY: sk-aaaa
      OPENAI_API_BASE_URL: https://xxx.xxx.com/v1
    restart: always
    command:
    - /bin/bash
    - -c
    - |
      # Prepend the patch to open_webui/main.py so the monkey patch below
      # runs as soon as the app module is imported.
      cat /app/backend/patch/main.py > /app/backend/open_webui/main.py.tmp
      echo "" >> /app/backend/open_webui/main.py.tmp
      cat /app/backend/open_webui/main.py >> /app/backend/open_webui/main.py.tmp
      mv /app/backend/open_webui/main.py.tmp /app/backend/open_webui/main.py
      /app/backend/start.sh
    volumes:
    - ./openwebui/data:/app/backend/data
    - ./openwebui/patch:/app/backend/patch
```

## ./openwebui/patch/main.py

```python
async def modify_stream_content(original_stream):
    import json

    start_reasoning = True
    end_reasoning = True
    async for chunk in original_stream:
        try:
            # Strip the SSE "data:" prefix explicitly. (str.lstrip("data: ")
            # strips a *set of characters*, not a prefix, and only works here
            # by accident.)
            decoded_chunk = chunk.decode("utf-8").strip()
            if decoded_chunk.startswith("data:"):
                decoded_chunk = decoded_chunk[len("data:"):].strip()
            chunk_data = json.loads(decoded_chunk)
            if "choices" in chunk_data and len(chunk_data["choices"]) > 0:
                delta = chunk_data["choices"][0].get("delta", {})
                # Deltas missing these keys raise KeyError and fall through
                # to the passthrough branch below.
                if delta["content"] is None:
                    delta["content"] = ""
                if "reasoning_content" in delta and delta["reasoning_content"] is None:
                    delta["reasoning_content"] = ""
                if delta["content"] == "":
                    # Reasoning phase: open <think> once, then stream the
                    # reasoning tokens as ordinary content.
                    if start_reasoning:
                        delta["content"] = "<think>" + delta["content"] + delta["reasoning_content"]
                        start_reasoning = False
                    else:
                        delta["content"] = delta["content"] + delta["reasoning_content"]
                else:
                    # First non-empty content chunk closes the think block.
                    if end_reasoning:
                        delta["content"] = "</think>" + delta["content"]
                        end_reasoning = False
            modified_chunk = f"data: {json.dumps(chunk_data)}\n\n".encode("utf-8")
        except (json.JSONDecodeError, KeyError):
            modified_chunk = chunk
        yield modified_chunk


def patch_generate_chat_completion():
    from fastapi.responses import StreamingResponse
    from open_webui.routers import openai

    _generate_chat_completion = openai.generate_chat_completion

    @openai.router.post("/chat/completions")
    async def generate_chat_completion(*args, **kwargs):
        # Wrap the original handler so streamed chunks pass through
        # modify_stream_content before reaching the client.
        response = await _generate_chat_completion(*args, **kwargs)
        if isinstance(response, StreamingResponse):
            response.body_iterator = modify_stream_content(response.body_iterator)
        return response

    openai.generate_chat_completion = generate_chat_completion


patch_generate_chat_completion()
```
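For a quick sanity check outside Open WebUI, the generator can be driven with fabricated chunks (a minimal sketch; the chunk shapes assume DeepSeek's reasoning_content streaming convention):

```python
import asyncio

# Feed two hand-made SSE chunks through modify_stream_content and print the
# rewritten output. Requires modify_stream_content from the patch above.
async def demo():
    async def fake_stream():
        yield b'data: {"choices": [{"delta": {"content": "", "reasoning_content": "Hmm"}}]}\n\n'
        yield b'data: {"choices": [{"delta": {"content": "Answer", "reasoning_content": ""}}]}\n\n'

    async for chunk in modify_stream_content(fake_stream()):
        print(chunk.decode().strip())

asyncio.run(demo())
# The first chunk's content becomes "<think>Hmm";
# the second chunk's content becomes "</think>Answer".
```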
<!-- gh-comment-id:2684127343 -->

@tomly2019m commented on GitHub (Feb 26, 2025):

The method I provided supports linking the knowledge base; see my earlier reply for details. You only need to add one extra function. Screenshot of the result:

Image

<!-- gh-comment-id:2684134779 -->

@Hugh-yw commented on GitHub (Feb 26, 2025):

> There is a solution that somehow fixes the problem:
>
> Image
>
> The solution is to use the pipe function: https://openwebui.com/f/zgccrui/deepseek_r1 (my recommendation is to increase the timeout in the code)
>
> For more details, see the forum: https://linux.do/t/topic/383183; it seems to be related to the openai standard package

Your solution only supports the public cloud API. A DeepSeek R1 deployed locally through the SGLang engine is incompatible with it.

<!-- gh-comment-id:2684238940 -->

@Hugh-yw commented on GitHub (Feb 26, 2025):

> For those who lost the opening tag after the r1 chat template update: the new chat template of deepseek r1 has broken the starting tag. Before the official fix, I modified a pipeline from zgccrui to be compatible with backend inference services without the starting tag. It is currently running well for me, and you can give it a try: https://openwebui.com/f/kinglywayne/deepseek_r1_thinkfix All contributions come from zgccrui; I just added the check logic for the beginning. The original function address of zgccrui is https://openwebui.com/f/zgccrui/deepseek_r1

This solution works for a locally deployed DeepSeek R1; I tested it and it works.

<!-- gh-comment-id:2684246127 -->

@acejsk commented on GitHub (Feb 26, 2025):

Why not use a Filter? Would using filters cause performance issues?

```python
import re
from typing import Optional


class Filter:
    ...

    def outlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # Modify or analyze the response body after processing by the API.
        # This post-processor strips <think> blocks from stored assistant
        # messages before they are re-sent as chat history.
        for message in body["messages"]:
            if message["role"] == "assistant":  # Target model responses
                message["content"] = re.sub(
                    r"<think>.*?</think>", "", message["content"], flags=re.DOTALL
                )
        return body
```

The chat history needs to be modified.
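As a quick illustration, the regex drops the entire think block from a stored assistant message (a minimal sketch):

```python
import re

# The same substitution the outlet above applies to assistant messages.
msg = "<think>\nsome chain of thought...\n</think>\n\nFinal answer."
print(re.sub(r"<think>.*?</think>", "", msg, flags=re.DOTALL).strip())
# -> Final answer.
```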

<!-- gh-comment-id:2684299917 -->

@Hugh-yw commented on GitHub (Feb 27, 2025):

> For those who lost the opening tag after r1 chattemplate update: The new chat template of deepseek r1 has broken the starting tag. Before the official fix, I modified a pipeline from zgccrui to be compatible with the backend inference service without the starting tag. It is currently running well for me, and you can take a try. https://openwebui.com/f/kinglywayne/deepseek_r1_thinkfix All contributions come from zgccrui; I just added the check logic for the beginning. The original function address of zgccrui is https://openwebui.com/f/zgccrui/deepseek_r1

@KinglyWayne How is the online search function implemented? With the current configuration, online search misbehaves: no search results are found.

<!-- gh-comment-id:2686630866 -->

@KinglyWayne commented on GitHub (Feb 27, 2025):

> > For those who lost the opening tag after r1 chattemplate update: The new chat template of deepseek r1 has broken the starting tag. Before the official fix, I modified a pipeline from zgccrui to be compatible with the backend inference service without the starting tag. It is currently running well for me, and you can take a try. https://openwebui.com/f/kinglywayne/deepseek_r1_thinkfix All contributions come from zgccrui; I just added the check logic for the beginning. The original function address of zgccrui is https://openwebui.com/f/zgccrui/deepseek_r1
>
> @KinglyWayne How is the online search function implemented? With the current configuration, online search misbehaves: no search results are found.

Online search is a completely separate feature. As far as I know, if you enable online search you may need to update to the latest version, since a recent release may have broken the default search behavior. In any case, none of this is related to this pipe.

<!-- gh-comment-id:2687063865 -->

@he0119 commented on GitHub (Feb 28, 2025):

Now that v0.5.17 is released, we can use the new stream filter to do this.

There may be some scenarios it doesn't cover yet; I'm waiting for someone more experienced to improve the script.

https://openwebui.com/f/he0119/reasoning

```python
class Filter:
    detect_reasoning_content = {}

    def stream(self, event: dict) -> dict:
        event_id = event.get("id")
        for choice in event.get("choices", []):
            delta = choice.get("delta", {})

            value = delta.get("content", "")
            reasoning_value = delta.get("reasoning_content")
            if self.detect_reasoning_content.get(event_id, False):
                if reasoning_value is None:
                    if event_id in self.detect_reasoning_content:
                        del self.detect_reasoning_content[event_id]
                    delta["content"] = f"</think>\n{value}"
                else:
                    delta["content"] = reasoning_value
            elif reasoning_value is not None:
                self.detect_reasoning_content[event_id] = True
                delta["content"] = f"<think>\n{reasoning_value}"
        return event
```
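To see what this does to a stream, here is a minimal sketch with three fabricated deltas (the field shapes follow the reasoning_content convention discussed in this thread):

```python
# Exercise the Filter above on three successive deltas of one event.
f = Filter()

reasoning_1 = {"id": "evt-1", "choices": [{"delta": {"reasoning_content": "Let me think."}}]}
reasoning_2 = {"id": "evt-1", "choices": [{"delta": {"reasoning_content": " More thinking."}}]}
answer = {"id": "evt-1", "choices": [{"delta": {"content": "Final answer."}}]}

print(f.stream(reasoning_1)["choices"][0]["delta"]["content"])  # <think>\nLet me think.
print(f.stream(reasoning_2)["choices"][0]["delta"]["content"])  #  More thinking.
print(f.stream(answer)["choices"][0]["delta"]["content"])       # </think>\nFinal answer.
```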

<!-- gh-comment-id:2689512553 -->

@duanhongyi commented on GitHub (Feb 28, 2025):

> After v0.5.17 released, we can use the new stream filter to do this.
>
> https://openwebui.com/f/he0119/reasoning
>
> ```python
> class Filter:
>     detect_reasoning_content = {}
>
>     def stream(self, event: dict) -> dict:
>         event_id = event.get("id")
>         for choice in event.get("choices", []):
>             delta = choice.get("delta", {})
>             if delta.get("role") != "assistant":
>                 continue
>
>             reasoning_value = delta.get("reasoning_content")
>             if self.detect_reasoning_content.get(event_id, False):
>                 if reasoning_value is None:
>                     self.detect_reasoning_content[event_id] = False
>                     delta["content"] = f"</think>\n{delta.get('content', '')}"
>                 else:
>                     delta["content"] = reasoning_value
>             elif reasoning_value is not None:
>                 self.detect_reasoning_content[event_id] = True
>                 delta["content"] = f"<think>\n{reasoning_value}"
>
>         return event
> ```

Best answer

<!-- gh-comment-id:2689606232 -->

@i-iooi-i commented on GitHub (Feb 28, 2025):

@he0119 I added this function, and the thinking chain does show up, but the answer content gets folded into it as well.
And after the answer finishes, it keeps showing the "thinking" spinner. 🙈🙈

https://github.com/user-attachments/assets/c3296495-b605-424f-9ad9-75b2f877c77c

<!-- gh-comment-id:2689636642 -->

@he0119 commented on GitHub (Feb 28, 2025):

> @he0119 I added this function, and the thinking chain does show up, but the answer content gets folded into it as well. And after the answer finishes, it keeps showing the "thinking" spinner. 🙈🙈

Maybe there's some special case I haven't considered; let's wait for someone to improve the script. The only issue I've run into is the output having a broken `<think>` tag 🤣

<!-- gh-comment-id:2689640863 -->

@i-iooi-i commented on GitHub (Feb 28, 2025):

@he0119 OK, got it. 😂😂

<!-- gh-comment-id:2689642171 -->

@duanhongyi commented on GitHub (Feb 28, 2025):

@i-iooi-i

Give this one a try; the event id should be deleted once the tag is closed.

```python
class Filter:
    detect_reasoning_content = {}

    def stream(self, event: dict) -> dict:
        event_id = event.get("id")
        for choice in event.get("choices", []):
            delta = choice.get("delta")
            value = delta.get("content", None)
            reasoning_value = delta.get("reasoning_content", None)
            if self.detect_reasoning_content.get(event_id, False):
                if reasoning_value is not None:
                    delta["content"] = reasoning_value
                else:
                    if event_id in self.detect_reasoning_content:
                        del self.detect_reasoning_content[event_id]
                    delta["content"] = f"</think>\n{value}"
            elif reasoning_value is not None:
                self.detect_reasoning_content[event_id] = True
                delta["content"] = f"<think>\n{reasoning_value}"
        return event
```

<!-- gh-comment-id:2689662272 -->

@i-iooi-i commented on GitHub (Feb 28, 2025):

https://github.com/user-attachments/assets/73c42fb9-b3c7-4d3d-b64c-a86dac6413b5

It works now 👍
Enable the function and turn on "Global" in the function settings (with Global off, the thinking chain seems to disappear).
Then ask a question. There may be a delay, possibly on DeepSeek's official servers, but after waiting a bit both the thinking chain and the output appear. 🙇

<!-- gh-comment-id:2689675062 -->

@cdmusic2019 commented on GitHub (Feb 28, 2025):

Tested, it works normally now. Thanks! @duanhongyi @he0119

<!-- gh-comment-id:2689682877 -->

@FFDVDGD commented on GitHub (Feb 28, 2025):

> @i-iooi-i
>
> Give this one a try; the event id should be deleted once the tag is closed.
> [filter code quoted in full from the reply above]

Hi, after applying this script the thinking process displays, but the result does not. I'm using Alibaba's API.

Image

<!-- gh-comment-id:2690599506 -->

@duanhongyi commented on GitHub (Feb 28, 2025):

@FFDVDGD

Alibaba's responses are slightly different, so I tweaked it a bit to be compatible with both modes. Give this a try:

```python
class Filter:
    detect_reasoning_content = {}

    def stream(self, event: dict) -> dict:
        event_id = event.get("id")
        for choice in event.get("choices", []):
            delta = choice.get("delta")
            reasoning_value = delta.get("reasoning_content", "")
            if reasoning_value:
                if self.detect_reasoning_content.get(event_id, False):
                    delta["content"] = reasoning_value
                else:
                    self.detect_reasoning_content[event_id] = True
                    delta["content"] = f"<think>\n{reasoning_value}"
            elif self.detect_reasoning_content.get(event_id, False):
                if event_id in self.detect_reasoning_content:
                    del self.detect_reasoning_content[event_id]
                value = delta.get("content", "")
                delta["content"] = f"</think>\n{value}"
        return event
```
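The difference, roughly, comes down to the delta shapes (a sketch; the exact fields vary by provider):

```python
# During reasoning, both providers populate reasoning_content:
reasoning_delta = {"content": "", "reasoning_content": "thinking..."}

# When the answer starts, DeepSeek-style backends drop reasoning_content
# (so it reads as None), while Alibaba's API keeps sending it as an empty
# string; that broke the earlier `is not None` check. Testing for a
# non-empty string handles both.
deepseek_answer_delta = {"content": "answer..."}
aliyun_answer_delta = {"content": "answer...", "reasoning_content": ""}
```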

<!-- gh-comment-id:2690947539 -->

@qsqfcg commented on GitHub (Mar 12, 2025):

> Using Open WebUI's pipe feature to post-process the data returned by the API can solve the missing think tag problem.
>
> But for some reason the pipe cannot connect to my local vLLM server; it fails with "all connections failed".

If I add the local vLLM server directly in Open WebUI, there is no `<think>` tag, though there is a `</think>` tag. But with the pipe (I followed this article: https://hadb.me/posts/2025/display-deepseek-r1-thinking), asking a question immediately errors out:

```
{"error": "Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
    yield
  ...
  File "/usr/local/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.LocalProtocolError: Illegal header value b'Bearer '

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 109, in pipe
  ...
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.LocalProtocolError: Illegal header value b'Bearer '"}
```
<!-- gh-comment-id:2716979913 -->
Reference: github-starred/open-webui#31055