[feat] Title Generation with large Models #212

Closed
opened 2025-11-11 14:11:27 -06:00 by GiteaMirror · 4 comments
Owner

Originally created by @davidamacey on GitHub (Jan 19, 2024).

Originally assigned to: @tjbck on GitHub.

Is your feature request related to a problem? Please describe.
This feature request is for Title Generation, related to Issue #206, which has been closed. I would like an additional selection option.

Describe the solution you'd like
When running a large, full-size Mixtral model with title generation set to Default, and Mixtral is not the default model, the following happens:

  • Select Large Mixtral Model
  • Start Chat
  • Get Response
  • Send next chat; this will take a while as it summarizes the response with the 'Default' model (not the running model)
  • User waits twice as long for the second response

Describe alternatives you've considered
I suggest adding a feature in the settings menu to select whether title generation uses the Default, Current, or a Selectable model.

Allowing the 'Current' option reduces the overhead of Ollama having to switch models between prompts; with full-size 16-bit models, the switch takes a couple of minutes.
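To make the three proposed choices concrete, here is a minimal sketch of how such a setting could resolve to a model name. Everything in it is hypothetical (the `TitleModelMode` enum, the `resolve_title_model` function, and the model names); it is not taken from the webui codebase. It also assumes 'Default' means the global default model, which is how this request reads it.

```python
from enum import Enum
from typing import Optional


class TitleModelMode(Enum):
    """Hypothetical setting values for the proposed title-generation option."""
    DEFAULT = "default"    # assumption: the globally configured default model
    CURRENT = "current"    # the model already loaded for the active chat
    SELECTED = "selected"  # a model the user picks just for titles


def resolve_title_model(
    mode: TitleModelMode,
    current_model: str,
    default_model: str,
    selected_model: Optional[str] = None,
) -> str:
    """Return the name of the model that should generate the chat title."""
    if mode is TitleModelMode.CURRENT:
        # Reusing the loaded model avoids a model switch between prompts.
        return current_model
    if mode is TitleModelMode.SELECTED:
        if not selected_model:
            raise ValueError("mode is SELECTED but no title model was chosen")
        return selected_model
    return default_model
```

With `CURRENT`, the title request goes to the same model that produced the chat response, so no second model has to be loaded.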

Depending on user resources, I would run a small model in a second Ollama container just for title generation. This way, the title summary can happen in the background. A second small model would also allow for other features such as summaries, review, auto-created tags, proofreading a response, etc.
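As a rough illustration of the second-container idea, the sketch below sends a title prompt to a separate Ollama instance through Ollama's `/api/generate` endpoint with streaming disabled. The port `11435`, the model name `tinyllama`, the prompt wording, and both function names are assumptions made for the example, not settings from this project.

```python
import json
import urllib.request

# Assumptions for illustration: a second Ollama instance listening on port
# 11435 and a small model already pulled there. Adjust both to your setup.
TITLE_OLLAMA_URL = "http://localhost:11435"
TITLE_MODEL = "tinyllama"


def build_title_request(
    user_message: str,
    base_url: str = TITLE_OLLAMA_URL,
    model: str = TITLE_MODEL,
) -> urllib.request.Request:
    """Build a non-streaming /api/generate request asking for a short title."""
    payload = {
        "model": model,
        # Hypothetical prompt wording, not the webui's actual title prompt.
        "prompt": f"Write a title of five words or fewer for this chat message:\n{user_message}",
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_title(user_message: str) -> str:
    """Send the request to the title instance and return the generated text."""
    request = build_title_request(user_message)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read())
    # With "stream": False, Ollama returns one JSON object whose "response"
    # field holds the full generated text.
    return body["response"].strip()
```

Because the title request goes to a different instance, the large model stays loaded in the main instance and the next chat prompt does not wait on a model swap.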

Additional context
Great work on this project!

GiteaMirror added the core label 2025-11-11 14:11:27 -06:00

@justinh-rahb commented on GitHub (Jan 19, 2024):

This behaviour would explain some issues I've had with titles not getting generated or causing a large thrashing of memory. Agreed on all suggested solutions.


@justinh-rahb commented on GitHub (Jan 22, 2024):

A note I'd add from my experiments: you can run multiple Ollama instances on a single system, and while you do hit diminishing returns pretty fast, two is doable for an unbinned M2 Max 96GB at least, and it allows concurrent generation even with Mixtral (at reduced tok/s).


@tjbck commented on GitHub (Jan 25, 2024):

I'm taking a look at this now, but I'm not entirely sure I'm understanding the feature request correctly. I guess the wording "Default" is causing confusion for all of us, but it's supposed to mean "default option", which would be the current running model. I believe you're asking for one additional option to use the global default model for title generation, correct?


@tjbck commented on GitHub (Jan 25, 2024):

Just updated the wording with #567. I'll close this issue for now, as everything should already be working as per your feature request. If, for some reason, the webui is using the default model instead of the current running model, sharing your console logs and network activity logs from the dev tools would help us troubleshoot. Thanks!

[image: https://github.com/ollama-webui/ollama-webui/assets/25473318/d87ad05b-0be2-4a9e-a958-2ae66fa276bc]
Reference: github-starred/open-webui#212