feat: Support Magistral models (keeping reasoning traces) #5535

Closed
opened 2025-11-11 16:23:46 -06:00 by GiteaMirror · 4 comments
Owner

Originally created by @Tureti on GitHub (Jun 13, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

The new Magistral model seems to want its own reasoning to stay in context, even though with most other reasoning models you should keep only the final answer. I haven't found a setting in Open-WebUI to change this behavior: it always removes the part inside the `<think>` tags from the conversation.
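The stripping described above can be sketched roughly as follows. This is a hypothetical helper for illustration, not actual Open-WebUI code; the function name and message shape are assumptions, and the requested toggle would correspond to the `keep_reasoning` flag:

```python
import re

# Matches a <think>...</think> block (and trailing whitespace) in a message.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def prepare_history(messages, keep_reasoning=False):
    """Return the chat history to resend to the model.

    By default, reasoning inside <think> tags is stripped from
    assistant turns (the current behavior described in this issue);
    keep_reasoning=True would leave it in context, as Magistral
    reportedly expects. Hypothetical sketch, not Open-WebUI code.
    """
    if keep_reasoning:
        return messages
    return [
        {**m, "content": THINK_RE.sub("", m["content"])}
        if m["role"] == "assistant"
        else m
        for m in messages
    ]
```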

Desired Solution you'd like

Add a toggle to choose whether the reasoning should be kept or excluded from the conversation.

Alternatives Considered

No response

Additional Context

No response


@gaby commented on GitHub (Jun 14, 2025):

@Tureti Is this why the model goes into an infinite loop? I can't get it to work with OWUI.

It keeps "thinking" about the same thing in an endless loop.


@Tureti commented on GitHub (Jun 14, 2025):

@gaby I don't think this has anything to do with the looping. If you're using a quantized version of the small model, maybe this will help. It's from the Qwen team about their own reasoning models, but it might apply:

> We recommend setting presence_penalty to 1.5 for quantized models to suppress repetitive outputs. You can adjust the presence_penalty parameter between 0 and 2. A higher value may occasionally lead to language mixing and a slight reduction in model performance.
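For an OpenAI-compatible endpoint (which vLLM exposes), that suggestion translates to passing `presence_penalty` in the request body. A minimal sketch, assuming the standard OpenAI chat-completions parameter names; the value 1.5 is the Qwen team's suggestion quoted above, not a verified fix for Magistral:

```python
# Hypothetical request payload for a /v1/chat/completions endpoint;
# presence_penalty follows the OpenAI API and accepts values in [0, 2].
payload = {
    "model": "mistralai/Magistral-Small-2506",
    "messages": [{"role": "user", "content": "Hello"}],
    "presence_penalty": 1.5,  # suggested to suppress repetitive outputs
}
```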


@gaby commented on GitHub (Jun 14, 2025):

@Tureti I'm not using the GGUF version. I'm serving the model using the official instructions from Mistral with vLLM.

```
vllm serve mistralai/Magistral-Small-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

When prompting it from Open-WebUI it goes into an infinite loop thinking.


EDIT: It might be a vLLM issue; another user reported the same behavior: https://huggingface.co/mistralai/Magistral-Small-2506/discussions/15


@Tureti commented on GitHub (Jun 14, 2025):

@gaby Okay. If it also does this on the first message (the first turn is unaffected by this Open-WebUI behavior), it has nothing to do with this issue. If it only happens after multiple turns it might be related, but I'm honestly not sure whether it could cause looping.


Reference: github-starred/open-webui#5535