Feat: 'Thinking' text for long output reasoning models like QwQ #2999

Closed
opened 2025-11-11 15:19:23 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @nick-tonjum on GitHub (Dec 11, 2024).

There's no doubt that QwQ is a revolutionary model for its size; its reasoning skills are ABSURD for a 32B model. The only downside to these models is their huge outputs: QwQ can write an entire book of reasoning just to answer a simple question.

I believe we can collapse all of this verbose reasoning output into a 'Thinking...' field, similar to OpenAI's o1. This could be a simple dropdown that occupies a single line of the output and expands to show the full reasoning only when clicked. When generation completes, it could display the generation time, e.g. 'Thought for 12 seconds...', and then provide a summarized answer using another model (or the same model, as long as it can follow instructions NOT to think out loud).

The workflow would be:

  1. User submits a query.

  2. Display 'Thinking...' while the model is generating.

  3. When generation completes, show the elapsed thinking time as 'Thought for xx seconds'.

  4. Have another model (configurable in settings, with a prompt) provide a clean version of the final answer, along with a short summary of the reasoning if needed.
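The steps above could be sketched roughly as follows. This is a minimal illustration, assuming the model wraps its reasoning in `<think>...</think>` tags (the tag name is an assumption; reasoning models vary) and that the summarization step in item 4 happens elsewhere. The function name and shape are hypothetical, not an existing open-webui API.

```typescript
// Hypothetical sketch: split a reasoning model's raw output into a
// collapsible "Thinking" section and the visible answer, and build
// the "Thought for xx seconds" label from the measured elapsed time.

interface ParsedResponse {
  reasoning: string; // hidden behind the dropdown by default
  answer: string;    // shown to the user
  label: string;     // e.g. "Thought for 12 seconds"
}

function parseReasoningOutput(raw: string, elapsedMs: number): ParsedResponse {
  // Non-greedy match so only the first <think>...</think> block is captured.
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  const reasoning = match ? match[1].trim() : "";
  const answer = match ? raw.replace(match[0], "").trim() : raw.trim();
  const seconds = Math.round(elapsedMs / 1000);
  return { reasoning, answer, label: `Thought for ${seconds} seconds` };
}
```

In the UI, `reasoning` would populate the expandable dropdown, `label` the collapsed one-line header, and `answer` the normal chat bubble.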

This would make QwQ and other long-reasoning models much cleaner in the webUI. I truly believe this is the future of language models, and supporting it will be a requirement before long.


Reference: github-starred/open-webui#2999