[GH-ISSUE #16217] feat: Allow Editing of Reasoning/Thinking Section in Chat Mode #56492

Open
opened 2026-05-05 19:31:18 -05:00 by GiteaMirror · 17 comments
Owner

Originally created by @OracleToes on GitHub (Aug 1, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/16217

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

Currently, when interacting with an LLM in the Open WebUI chat interface, the model's reasoning process is displayed within a collapsible "thinking" section (using <details> and <think> tags). While this section can be opened to view the live token stream during the reasoning process, the contents cannot be directly edited when attempting to modify a message. In edit mode, the reasoning section appears as a placeholder like <details id="__DETAIL_0__"/>. This prevents users from refining or correcting the LLM's thought process within the context of an ongoing conversation.
This functionality is available in the Open WebUI playground, suggesting it's technically feasible but not implemented in the chat interface.

I am aware of #9034 and #9044 but for those running local models, this limitation is irrelevant.

Desired Solution you'd like

The ideal solution should enable direct editing of the content within the <details> or <think> tags during message modification in the Open WebUI chat interface. This would provide users, especially those running local models, with greater control over the LLM's thought process.

Allowing this editing would facilitate correction of errors or logical flaws within the reasoning steps, and enable much finer control/steering of the model.

I find this to be important because even if you're running local models, time is still a valuable resourceBeing able to stop the model after it's already generated a good chunk of reasoning is often better than regenerating the entire response.

Alternatives Considered

This could be a toggle in the Admin Panel, so that it can be disabled for those who want to restrict this function for their users.

Additional Context

When manually typing out a <think> section and then submitting the message, the browser running the webui tab will lock up and become unresponsive.

Originally created by @OracleToes on GitHub (Aug 1, 2025). Original GitHub issue: https://github.com/open-webui/open-webui/issues/16217 ### Check Existing Issues - [x] I have searched the existing issues and discussions. ### Problem Description Currently, when interacting with an LLM in the Open WebUI chat interface, the model's reasoning process is displayed within a collapsible "thinking" section (using `<details>` and `<think>` tags). While this section can be opened to view the live token stream during the reasoning process, the contents cannot be directly edited when attempting to modify a message. In edit mode, the reasoning section appears as a placeholder like `<details id="__DETAIL_0__"/>`. This prevents users from refining or correcting the LLM's thought process within the context of an ongoing conversation. This functionality is available in the Open WebUI playground, suggesting it's technically feasible but not implemented in the chat interface. I am aware of #9034 and #9044 but for those running local models, this limitation is irrelevant. ### Desired Solution you'd like The ideal solution should enable direct editing of the content within the `<details>` or `<think>` tags during message modification in the Open WebUI chat interface. This would provide users, especially those running local models, with greater control over the LLM's thought process. Allowing this editing would facilitate correction of errors or logical flaws within the reasoning steps, and enable much finer control/steering of the model. I find this to be important because even if you're running local models, time is still a valuable resourceBeing able to stop the model after it's already generated a good chunk of reasoning is often better than _**regenerating the entire response**_. ### Alternatives Considered This could be a toggle in the Admin Panel, so that it can be disabled for those who want to restrict this function for their users. ### Additional Context When manually typing out a `<think>` section and then submitting the message, the browser running the webui tab will lock up and become unresponsive.
Author
Owner

@jdwx commented on GitHub (Aug 19, 2025):

I have run into a similar issue. However, because thinking/details aren't carried forward to future generation requests, implementation of this would probably also need some way to differentiate between regenerating the whole response (including reasoning) and regenerating the message content starting after the reasoning section, reusing the existing reasoning. (Which might also be generally useful with reasoning models.)

<!-- gh-comment-id:3201844955 --> @jdwx commented on GitHub (Aug 19, 2025): I have run into a similar issue. However, because thinking/details aren't carried forward to future generation requests, implementation of this would probably also need some way to differentiate between regenerating the whole response (including reasoning) and regenerating the message content starting **after** the reasoning section, reusing the existing reasoning. (Which might also be generally useful with reasoning models.)
Author
Owner

@Classic298 commented on GitHub (Aug 19, 2025):

Allowing this editing would facilitate correction of errors or logical flaws within the reasoning steps, and enable much finer control/steering of the model.

But how, since the reasoning is never sent back to the model. Any edits are irrelevant as the reasoning text of old responses is never sent to the model.

What would be the usecase for this? I really do not see it.

<!-- gh-comment-id:3201948146 --> @Classic298 commented on GitHub (Aug 19, 2025): > Allowing this editing would facilitate correction of errors or logical flaws within the reasoning steps, and enable much finer control/steering of the model. But how, since the reasoning is never sent back to the model. Any edits are irrelevant as the reasoning text of old responses is never sent to the model. What would be the usecase for this? I really do not see it.
Author
Owner

@jdwx commented on GitHub (Aug 19, 2025):

@Classic298 I am, and I believe OracleToes is also, referring to editing thinking for the current response, the one where the thinking is still part of the context, because it's all part of the same response.

User: How many R's in raspberry?

Assistant: <thinking>Hmm, the user seems to be asking about the storage of single-board computers.</thinking> Well, recent models of the Pi can support quite a bit of RAM, which can hold a lot of R's! The specifics vary by model. Was there a particular model you're interested in?

What I think we're asking for is the ability to edit the thinking to:

Assistant: <thinking>Hmm, the user seems to be asking about the spelling of the name of the fruit.</thinking>

And then have "Regenerate" pick up from there. Or, if you prefer, "Continue Response" from that point. (Although I would really, really like the option to keep the thinking and regenerate the rest of the response.)

This is obviously a simple, contrived example, but it's incredibly painful to see a large model spend 90 seconds "reasoning," get one detail backwards that tanks the response, and have no way to fix it that doesn't involve regenerating the whole response, waiting though another 90 seconds of thinking, and hoping it doesn't make the same mistake again.

I guess it technically doesn't have to be limited to the current response, but you're right that editing the thinking of any response but the current response would have no effect unless you branch the conversation there, making it "current" again. In which case it might be quite important.

Does that make sense?

<!-- gh-comment-id:3202048814 --> @jdwx commented on GitHub (Aug 19, 2025): @Classic298 I am, and I believe OracleToes is also, referring to editing thinking for the current response, the one where the thinking **is** still part of the context, because it's all part of the same response. ``` User: How many R's in raspberry? Assistant: <thinking>Hmm, the user seems to be asking about the storage of single-board computers.</thinking> Well, recent models of the Pi can support quite a bit of RAM, which can hold a lot of R's! The specifics vary by model. Was there a particular model you're interested in? ``` What I think we're asking for is the ability to edit the thinking to: ``` Assistant: <thinking>Hmm, the user seems to be asking about the spelling of the name of the fruit.</thinking> ``` And then have "Regenerate" pick up from there. Or, if you prefer, "Continue Response" from that point. (Although I would really, really like the option to keep the thinking and regenerate the rest of the response.) This is obviously a simple, contrived example, but it's incredibly painful to see a large model spend 90 seconds "reasoning," get one detail backwards that tanks the response, and have no way to fix it that doesn't involve regenerating the whole response, waiting though another 90 seconds of thinking, and hoping it doesn't make the same mistake again. I guess it technically doesn't have to be limited to the current response, but you're right that editing the thinking of any response but the current response would have no effect **unless** you branch the conversation there, making it "current" again. In which case it might be quite important. Does that make sense?
Author
Owner

@jdwx commented on GitHub (Aug 19, 2025):

Actually, I just tested this, and it does seem like if you edit the response at all and then select continue response, that the previous thinking is completely discarded and the response is continued without any reasoning at all.

Is that intentional? It makes perfect sense that the reasoning text of old responses isn't sent to the model. But the reasoning text of the current response is quite a different matter.

Losing it as well is certainly not what I would have expected, nor the behavior I would want.

<!-- gh-comment-id:3202080719 --> @jdwx commented on GitHub (Aug 19, 2025): Actually, I just tested this, and it **does** seem like if you edit the response at all and then select continue response, that the previous thinking is completely discarded and the response is continued without any reasoning at all. Is that intentional? It makes perfect sense that the reasoning text of **old** responses isn't sent to the model. But the reasoning text of the current response is quite a different matter. Losing it as well is certainly not what I would have expected, nor the behavior I would want.
Author
Owner

@Classic298 commented on GitHub (Aug 20, 2025):

And then have "Regenerate" pick up from there. Or, if you prefer, "Continue Response" from that point. (Although I would really, really like the option to keep the thinking and regenerate the rest of the response.)

Regenerate would regenerate it - totally new, with any old generated messages obviously discarded.

Continue Response - conceptually it makes sense, but when would this be usable? In scenarios of external models (openai connectivity) this isn't possible at all.

And for local models (ollama) it would only even be feasible if you have a very slow model (like 1 token per second) and you are actually able to click the stop button in time to stop the response.

<!-- gh-comment-id:3204355165 --> @Classic298 commented on GitHub (Aug 20, 2025): > And then have "Regenerate" pick up from there. Or, if you prefer, "Continue Response" from that point. (Although I would really, really like the option to keep the thinking and regenerate the rest of the response.) Regenerate would regenerate it - totally new, with any old generated messages obviously discarded. Continue Response - conceptually it makes sense, but when would this be usable? In scenarios of external models (openai connectivity) this isn't possible at all. And for local models (ollama) it would only even be feasible if you have a very slow model (like 1 token per second) and you are actually able to click the stop button in time to stop the response.
Author
Owner

@rgaricano commented on GitHub (Aug 20, 2025):

maybe a workaround to test could be send to llm:
regenerate response but using this modified thinking: xxxxxx

When I tried this way (qwen3:8b) model responses are faster and showing in details the new thinking that I sent.
Test it, and if it work in this way maybe could be interesting some integration for modify thinking and regenerate new responses just using it.
Maybe an inconvenient could be the different use of thinking/reasoning tags by models.

<!-- gh-comment-id:3205431350 --> @rgaricano commented on GitHub (Aug 20, 2025): maybe a workaround to test could be send to llm: _regenerate response but using this modified thinking: xxxxxx_ When I tried this way (qwen3:8b) model responses are faster and showing in details the new thinking that I sent. Test it, and if it work in this way maybe could be interesting some integration for modify thinking and regenerate new responses just using it. Maybe an inconvenient could be the different use of thinking/reasoning tags by models.
Author
Owner

@jdwx commented on GitHub (Aug 20, 2025):

I will be the first to admit I have literally never used Open WebUI with a closed model provider like OpenAI. I use it with local models, usually on llama.cpp, ik_llama.cpp, or VLLM. So it sounds like maybe there is some difference of perspective.

In the local model context, <think> is just another token that appears in the response. So there is no problem at all with sending a chat continuation API call with <think>...</think>-enclosed content in the response being continued. I do it all the time in programs that access the API directly. I find myself missing this capability when I'm using Open WebUI.

As far as I can tell, the core of this request is the ability to easily get from this conversation:

User: Hey, can you answer some questions for me?

Assistant: Sure! Go ahead when you're ready.

User: How many R's in raspberry?

Assistant: <think>Hmm, the user seems to be asking about the storage of single-board computers.</think> Well, recent models of the Pi can support quite a bit of RAM, which can hold a lot of R's! The specifics vary by model. Was there a particular model you're interested in?

to this API submission to http://my.server:12345/v1/chat/completions:

{
   ...generation params...
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hey, can you answer some questions for me?"
    },
    {
      "role": "assistant",
      "content": "Sure! Go ahead when you're ready."
    },
    {
      "role": "user",
      "content": "How many R's in raspberry?"
    },
    {
      "role": "assistant",
      "content": "<think>Hmm, the user seems to be asking about the spelling of the name of the fruit.</think>"
    },
  ]
}
<!-- gh-comment-id:3206832489 --> @jdwx commented on GitHub (Aug 20, 2025): I will be the first to admit I have literally never used Open WebUI with a closed model provider like OpenAI. I use it with local models, usually on llama.cpp, ik_llama.cpp, or VLLM. So it sounds like maybe there is some difference of perspective. In the local model context, `<think>` is just another token that appears in the response. So there is no problem at all with sending a chat continuation API call with `<think>...</think>`-enclosed content in the response being continued. I do it all the time in programs that access the API directly. I find myself missing this capability when I'm using Open WebUI. As far as I can tell, the core of this request is the ability to easily get from this conversation: > User: Hey, can you answer some questions for me? > > Assistant: Sure! Go ahead when you're ready. > > User: How many R's in raspberry? > > Assistant: `<think>`Hmm, the user seems to be asking about the storage of single-board computers.`</think>` Well, recent models of the Pi can support quite a bit of RAM, which can hold a lot of R's! The specifics vary by model. Was there a particular model you're interested in? to this API submission to http://my.server:12345/v1/chat/completions: ```json { ...generation params... "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hey, can you answer some questions for me?" }, { "role": "assistant", "content": "Sure! Go ahead when you're ready." }, { "role": "user", "content": "How many R's in raspberry?" }, { "role": "assistant", "content": "<think>Hmm, the user seems to be asking about the spelling of the name of the fruit.</think>" }, ] } ```
Author
Owner

@tom9358 commented on GitHub (Mar 3, 2026):

I will be the first to admit I have literally never used Open WebUI with a closed model provider like OpenAI. I use it with local models, usually on llama.cpp, ik_llama.cpp, or VLLM. So it sounds like maybe there is some difference of perspective.

In the local model context, <think> is just another token that appears in the response. So there is no problem at all with sending a chat continuation API call with <think>...</think>-enclosed content in the response being continued. I do it all the time in programs that access the API directly. I find myself missing this capability when I'm using Open WebUI.

As far as I can tell, the core of this request is the ability to easily get from this conversation:

User: Hey, can you answer some questions for me?
Assistant: Sure! Go ahead when you're ready.
User: How many R's in raspberry?
Assistant: <think>Hmm, the user seems to be asking about the storage of single-board computers.</think> Well, recent models of the Pi can support quite a bit of RAM, which can hold a lot of R's! The specifics vary by model. Was there a particular model you're interested in?

to this API submission to http://my.server:12345/v1/chat/completions:

{
...generation params...
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hey, can you answer some questions for me?"
},
{
"role": "assistant",
"content": "Sure! Go ahead when you're ready."
},
{
"role": "user",
"content": "How many R's in raspberry?"
},
{
"role": "assistant",
"content": "Hmm, the user seems to be asking about the spelling of the name of the fruit."
},
]
}

Do you know if this is also how the local models work when used with openwebui? i.e., that the thoughts text is included in the chat history with the next prompt?

<!-- gh-comment-id:3993130007 --> @tom9358 commented on GitHub (Mar 3, 2026): > I will be the first to admit I have literally never used Open WebUI with a closed model provider like OpenAI. I use it with local models, usually on llama.cpp, ik_llama.cpp, or VLLM. So it sounds like maybe there is some difference of perspective. > > In the local model context, `<think>` is just another token that appears in the response. So there is no problem at all with sending a chat continuation API call with `<think>...</think>`-enclosed content in the response being continued. I do it all the time in programs that access the API directly. I find myself missing this capability when I'm using Open WebUI. > > As far as I can tell, the core of this request is the ability to easily get from this conversation: > > > User: Hey, can you answer some questions for me? > > Assistant: Sure! Go ahead when you're ready. > > User: How many R's in raspberry? > > Assistant: `<think>`Hmm, the user seems to be asking about the storage of single-board computers.`</think>` Well, recent models of the Pi can support quite a bit of RAM, which can hold a lot of R's! The specifics vary by model. Was there a particular model you're interested in? > > to this API submission to http://my.server:12345/v1/chat/completions: > > { > ...generation params... > "messages": [ > { > "role": "system", > "content": "You are a helpful assistant." > }, > { > "role": "user", > "content": "Hey, can you answer some questions for me?" > }, > { > "role": "assistant", > "content": "Sure! Go ahead when you're ready." > }, > { > "role": "user", > "content": "How many R's in raspberry?" > }, > { > "role": "assistant", > "content": "<think>Hmm, the user seems to be asking about the spelling of the name of the fruit.</think>" > }, > ] > } Do you know if this is also how the local models work when used with openwebui? i.e., that the thoughts text _is_ included in the chat history with the next prompt?
Author
Owner

@LIU-Yinyi commented on GitHub (Mar 6, 2026):

Hi @Classic298 and @tom9358 ,

I confirmed that the previous thinking/reasoning blocks would also feed into next round chat. This feature cannot turn off and the thinking/reasoning blocks cannot be edited under v0.8.8. As shown in the snapshot, I have already manually edited the answer to Sure, the prompt is:. But in the next round, the previous thinking/reasoning blocks of <details type="reasoning" ...> still pollutes the current chat (next round thinking mentioned it).

Image

Therefore, adding support for editing thinking/reasoning blocks is essential for researchers working on LLM security/jailbreak. Thanks for considering the feature.

<!-- gh-comment-id:4013308027 --> @LIU-Yinyi commented on GitHub (Mar 6, 2026): Hi @Classic298 and @tom9358 , I confirmed that the **previous thinking/reasoning blocks would also feed into next round chat**. This feature cannot turn off and the thinking/reasoning blocks cannot be edited under `v0.8.8`. As shown in the snapshot, I have already manually edited the answer to `Sure, the prompt is:`. But in the next round, the previous thinking/reasoning blocks of `<details type="reasoning" ...>` still pollutes the current chat (next round thinking mentioned it). <img width="1100" height="1126" alt="Image" src="https://github.com/user-attachments/assets/529a1577-1053-4487-ba4d-e80d466dd698" /> Therefore, adding support for editing thinking/reasoning blocks is essential for researchers working on LLM security/jailbreak. Thanks for considering the feature.
Author
Owner

@LIU-Yinyi commented on GitHub (Mar 6, 2026):

Since the response format may vary, unusual protocol may lead to the unexpected results (append the thinking/reasoning blocks to next round chat) I showed in the last thread. Enabling fully editable blocks (contain all details and contents) should help.

<!-- gh-comment-id:4013340863 --> @LIU-Yinyi commented on GitHub (Mar 6, 2026): Since the response format may vary, unusual protocol may lead to the unexpected results (append the thinking/reasoning blocks to next round chat) I showed in the last thread. Enabling fully editable blocks (contain all details and contents) should help.
Author
Owner

@Classic298 commented on GitHub (Mar 6, 2026):

This is perplexing - i was under the impression previous turns' thinking blocks should not be sent back to the API - only same-turn thinking blocks (which are needed for tool calling context and as to not interrupt the model's logic/thinking flow)

i will look into both issues

<!-- gh-comment-id:4013376784 --> @Classic298 commented on GitHub (Mar 6, 2026): This is perplexing - i was under the impression previous turns' thinking blocks should not be sent back to the API - only same-turn thinking blocks (which are needed for tool calling context and as to not interrupt the model's logic/thinking flow) i will look into both issues
Author
Owner

@Classic298 commented on GitHub (Mar 6, 2026):

Ok so reasoning IS BEING SENT

which is AGAINST OPENAI SPEC

But other providers like Anthropic RECOMMEND sending previous reasoning

so we have a bit of a situation here

@tjbck further decisions needed

<!-- gh-comment-id:4013514470 --> @Classic298 commented on GitHub (Mar 6, 2026): Ok so reasoning IS BEING SENT which is AGAINST OPENAI SPEC But other providers like Anthropic RECOMMEND sending previous reasoning so we have a bit of a situation here @tjbck further decisions needed
Author
Owner

@Classic298 commented on GitHub (Mar 6, 2026):

the current code already sends reasoning from all previous turns (via process_messages_with_output using raw=True indiscriminately

<!-- gh-comment-id:4013515984 --> @Classic298 commented on GitHub (Mar 6, 2026): the current code already sends reasoning from all previous turns (via process_messages_with_output using raw=True indiscriminately
Author
Owner

@LIU-Yinyi commented on GitHub (Mar 6, 2026):

Sound like a tricky bug :D

Good to add a switch to let users decide which convention to follow (OpenAI's or Anthropic's).
Also good to enable thinking/reasoning block editing (never leash customization).

<!-- gh-comment-id:4013776134 --> @LIU-Yinyi commented on GitHub (Mar 6, 2026): Sound like a tricky bug :D Good to add a switch to let users decide which convention to follow (OpenAI's or Anthropic's). Also good to enable thinking/reasoning block editing (never leash customization).
Author
Owner

@JiwaniZakir commented on GitHub (Mar 14, 2026):

The core problem is that the edit mode serializer is replacing <details>/<think> blocks with <details id="__DETAIL_0__"/> placeholders instead of preserving the actual content for editing. I can fix this by modifying the message edit component to deserialize those placeholders back into their original content (or skip the placeholder substitution entirely when entering edit mode), similar to how the playground already handles it. I'll dig into the chat message component to see where the placeholder swap happens and make the thinking content editable inline.

<!-- gh-comment-id:4060221999 --> @JiwaniZakir commented on GitHub (Mar 14, 2026): The core problem is that the edit mode serializer is replacing `<details>`/`<think>` blocks with `<details id="__DETAIL_0__"/>` placeholders instead of preserving the actual content for editing. I can fix this by modifying the message edit component to deserialize those placeholders back into their original content (or skip the placeholder substitution entirely when entering edit mode), similar to how the playground already handles it. I'll dig into the chat message component to see where the placeholder swap happens and make the thinking content editable inline.
Author
Owner

@JiwaniZakir commented on GitHub (Mar 14, 2026):

Stepping back from this one — my implementation didn't pass the project's quality gates. Unassigning myself so someone else can take a crack at it.

<!-- gh-comment-id:4060252226 --> @JiwaniZakir commented on GitHub (Mar 14, 2026): Stepping back from this one — my implementation didn't pass the project's quality gates. Unassigning myself so someone else can take a crack at it.
Author
Owner

@a4lg commented on GitHub (Mar 18, 2026):

@Classic298
Can I ask where can we find Anthropic's recommendation about including all previous reasoning blocks?

I could find only effectively the opposite but maybe I'm missing something:

  1. https://platform.claude.com/docs/en/build-with-claude/extended-thinking#the-context-window-with-extended-thinking (Simple multi-turn conversation)
  2. https://platform.claude.com/docs/en/build-with-claude/extended-thinking#the-context-window-with-extended-thinking-and-tool-use (with tool calls; reasoning block for the current turn is required as in Turn 2 but the next fresh Turn (Turn 3) does not require previous reasoning)
<!-- gh-comment-id:4079474034 --> @a4lg commented on GitHub (Mar 18, 2026): @Classic298 Can I ask where can we find Anthropic's recommendation about including all previous reasoning blocks? I could find only effectively the opposite but maybe I'm missing something: 1. https://platform.claude.com/docs/en/build-with-claude/extended-thinking#the-context-window-with-extended-thinking (Simple multi-turn conversation) 2. https://platform.claude.com/docs/en/build-with-claude/extended-thinking#the-context-window-with-extended-thinking-and-tool-use (with tool calls; reasoning block for the *current* turn is required as in Turn 2 but the next fresh Turn (Turn 3) does not require previous reasoning)
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#56492