Feat: 'Thinking' text for long output reasoning models like QwQ #2999

Closed
opened 2025-11-11 15:19:23 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @nick-tonjum on GitHub (Dec 11, 2024).

There's no doubt that QwQ is a revolutionary model for its size; its reasoning skills are ABSURD for a 32B model. The only downside to these models is their huge outputs: QwQ can write an entire book of reasoning just to answer a simple question.

I believe we can collapse all of this verbose reasoning output into a 'Thinking...' field, similar to OpenAI's o1. This could be a simple dropdown that occupies a single line of the output and expands to show the full reasoning only when clicked. When generation completes, it could display the generation time, e.g. 'Thought for 12 seconds...', and then provide a summarized answer using another model (or the same model, as long as it can follow instructions NOT to think out loud).

The workflow would be:

  1. User submits a query.

  2. Display 'Thinking...' while the model is generating.

  3. When generation completes, show the elapsed thinking time as 'Thought for xx seconds'.

  4. Have another model (configurable in settings, with a prompt) provide a clean version of the final answer, along with a short summary of the reasoning if needed.
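The steps above could be sketched roughly as follows. This is a minimal illustration, assuming the model wraps its reasoning in `<think>...</think>` tags (the tag name is an assumption; reasoning models vary) and that the summarization step in item 4 happens elsewhere. The function name and shape are hypothetical, not an existing open-webui API.

```typescript
// Hypothetical sketch: split a reasoning model's raw output into a
// collapsible "Thinking" section and the visible answer, and build
// the "Thought for xx seconds" label from the measured elapsed time.

interface ParsedResponse {
  reasoning: string; // hidden behind the dropdown by default
  answer: string;    // shown to the user
  label: string;     // e.g. "Thought for 12 seconds"
}

function parseReasoningOutput(raw: string, elapsedMs: number): ParsedResponse {
  // Non-greedy match so only the first <think>...</think> block is captured.
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  const reasoning = match ? match[1].trim() : "";
  const answer = match ? raw.replace(match[0], "").trim() : raw.trim();
  const seconds = Math.round(elapsedMs / 1000);
  return { reasoning, answer, label: `Thought for ${seconds} seconds` };
}
```

In the UI, `reasoning` would populate the expandable dropdown, `label` the collapsed one-line header, and `answer` the normal chat bubble.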

This would make QwQ and other long-reasoning models much cleaner in the webUI. I truly believe this is the future of language models, and supporting it will be a requirement before long.


Reference: github-starred/open-webui#2999