[GH-ISSUE #3915] Chat Controls: ollama model parameters override the options payload. #13430

Closed
opened 2026-04-19 20:10:34 -05:00 by GiteaMirror · 1 comment

Originally created by @ProjectMoon on GitHub (Jul 16, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/3915

Bug Report

Description

Bug Summary:
In chat controls, you can set various options; for simplicity, num_ctx serves as the example here, but the problem applies to all parameters. If I set num_ctx in chat controls, but the model also has that parameter defined in OpenWebUI (not in ollama!), then the OpenWebUI model's parameter is the one that gets used.

Steps to Reproduce:

  1. Create an ollama model in OpenWebUI and set its num_ctx parameter to something, say 4096.
  2. Start a chat with that model and set num_ctx to, say, 1024 in the chat controls box.
  3. The API request goes to /ollama/api/chat and carries the correct payload (options.num_ctx = 1024).
  4. Check the ollama logs: num_ctx will still be 4096.

This affects all parameters that OpenWebUI supports.
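
For reference, a request like the following exercises the bug. This is a minimal sketch only: the base URL, API token, and model name are placeholders, not values taken from this report.

    import requests

    # Placeholders -- adjust for your own Open WebUI instance.
    BASE_URL = "http://localhost:8080"
    TOKEN = "sk-..."  # an Open WebUI API key

    payload = {
        "model": "my-model",  # hypothetical model with num_ctx = 4096 set in OpenWebUI
        "messages": [{"role": "user", "content": "Hello"}],
        "options": {"num_ctx": 1024},  # the value chat controls would send
        "stream": False,
    }

    # Open WebUI proxies this to the ollama backend.
    resp = requests.post(
        f"{BASE_URL}/ollama/api/chat",
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()

    # Despite options.num_ctx = 1024 in the request, the ollama log
    # reports n_ctx = 4096, taken from the OpenWebUI model parameters.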

Expected Behavior:
In the example above, I'd expect ollama to spawn with num_ctx = 1024.

Actual Behavior:
In the Ollama chat completion endpoint, OpenWebUI forcibly overwrites any incoming parameters if the model has parameters defined in OpenWebUI:

    
    model_id = form_data.model
    model_info = Models.get_model_by_id(model_id)

    if model_info:
        if model_info.base_model_id:
            payload["model"] = model_info.base_model_id

        model_info.params = model_info.params.model_dump()

        if model_info.params:
            payload["options"] = {}

            if model_info.params.get("mirostat", None):
                payload["options"]["mirostat"] = model_info.params.get("mirostat", None)

            # and so on...

Since the options key is cleared out, chat controls subtly fail for ollama models whenever the model has its own parameters defined in OpenWebUI.
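
One way to fix this would be to treat the OpenWebUI model parameters as defaults and let incoming options win. A sketch only, not the project's actual patch; OLLAMA_OPTION_KEYS is a hypothetical whitelist standing in for the per-parameter checks:

    # Hypothetical whitelist of ollama option names; the real code
    # checks each parameter individually.
    OLLAMA_OPTION_KEYS = ["mirostat", "num_ctx", "temperature"]  # etc.

    if model_info and model_info.params:
        params = model_info.params.model_dump()
        # Keep whatever "options" the client already sent.
        options = payload.setdefault("options", {})
        for key in OLLAMA_OPTION_KEYS:
            value = params.get(key)
            # Model params only fill in gaps; chat control values win.
            if value is not None and key not in options:
                options[key] = value

This keeps per-chat overrides working while still applying model-level defaults.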

Environment

  • Open WebUI Version: 0.3.8

  • Ollama (if applicable): 0.2.5

  • Operating System: Gentoo Linux

  • Browser (if applicable): Firefox

Reproduction Details

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [x] I have included the browser console logs.
  • [x] I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
This is the request sent to the server. I set num_ctx to 8192 in chat controls, but the model has it forcibly defined as 1024 in OpenWebUI.
[Screenshot of the request payload: https://github.com/user-attachments/assets/6cb221ff-d8c0-4f31-bfa7-4578154471f8]

Ollama still shows num_ctx as 1024:


llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  8897.23 MiB
llm_load_tensors:        CPU buffer size =   629.00 MiB
llama_new_context_with_model: n_ctx      = 1024
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    40.00 MiB
llama_new_context_with_model: KV self size  =   40.00 MiB, K (f16):   20.00 MiB, V (f16):   20.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.59 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   304.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    10.01 MiB
llama_new_context_with_model: graph nodes  = 1606
llama_new_context_with_model: graph splits = 2

Installation Method

Docker


@tjbck commented on GitHub (Jul 17, 2024):

Should be fixed on dev, keep me updated!
