issue: preserve context in multi-turn chat #5910

Closed
opened 2025-11-11 16:38:10 -06:00 by GiteaMirror · 6 comments

Originally created by @dariko on GitHub (Jul 30, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v143

Ollama Version (if applicable)

No response

Operating System

debian 12

Browser (if applicable)

firefox 135

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When running a chat with multiple turns, all of the context from the first request should be sent in subsequent requests.

Actual Behavior

When running a chat with multiple turns, the system prompt and the files attached to the first message are dropped from the prompt when submitting subsequent requests.

Steps to Reproduce

  • create a file /tmp/file.txt containing a string ("the cat is on the table")
  • create a new chat in open-webui
  • attach /tmp/file.txt, write first prompt, press ctrl+enter
  • the chat responds
  • submit another message (second prompt) and compare the prompts llama-server received (one way to dump them is sketched below)
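
To compare what llama-server actually received at each step, here is a minimal sketch that dumps the current prompt from its /slots endpoint. The host/port are assumptions for this setup, and the exact field names of the /slots response may differ between llama.cpp versions.

# Hedged sketch: print the raw prompt text llama-server is holding per slot.
# Assumes llama-server listens on localhost:8080 and exposes /slots;
# field names follow the dumps attached to this issue.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/slots") as resp:
    slots = json.load(resp)

for slot in slots:
    print(f"slot {slot.get('id')}:")
    # "prompt" holds the fully rendered chat template the server was given.
    print(repr(slot.get("prompt", ""))[:1000])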

Logs & Screenshots

chat screenshot:

(chat screenshot omitted; original image: https://github.com/user-attachments/assets/0a245fee-c249-46cb-b439-5263316f3bc0)

prompts received by llama-server (dumped using /slots):
first prompt:

<|im_start|>system\n### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.**\n- Do not cite if the <source> tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* \"According to the study, the proposed method increases efficiency by 20% [1].\"\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context.\n\n<context>\n<source id=\"1\" name=\"file.txt\">the cat is on the table\n</source>\n</context>\n\n<user_query>\nfirst prompt\n</user_query>\n<|im_end|>\n<|im_start|>user\nfirst prompt<|im_end|>\n<|im_start|>assistant\n

second prompt:

<|im_start|>user\nfirst prompt<|im_end|>\n<|im_start|>assistant\nThe cat is on the table [1].<|im_end|>\n<|im_start|>user\nsecond prompt<|im_end|>\n<|im_start|>assistant\n

container log

2025-07-30 08:35:49.713 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 127.0.0.1:56906 - "GET / HTTP/1.1" 200 - {}
2025-07-30 08:35:50.555 | DEBUG    | aiocache.base:get:201 - GET open_webui.routers.openaiget_all_models(<starlette.requests.Request object at 0x7f5dbbf7b790>,)[('user', UserModel(id='8376a5ec-9044-4fcf-9c4b-f9fac59a1eb3', name='User', email='admin@localhost', role='admin', profile_image_url='/user.png', last_active_at=1753864535, updated_at=1737372372, created_at=1737372372, api_key=None, settings=UserSettings(ui={'version': '0.6.18', 'directConnections': {'OPENAI_API_BASE_URLS': ['http://localhost:8081/v1', 'https://openrouter.ai/api/v1'], 'OPENAI_API_KEYS': ['aa', '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'], 'OPENAI_API_CONFIGS': {'0': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': [], 'connection_type': 'external'}, '1': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': ['qwen/qwen3-235b-a22b-07-25:free'], 'connection_type': 'external'}}}, 'ctrlEnterToSend': True, 'models': ['Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL'], 'imageCompressionSize': {'width': '', 'height': ''}, 'autoFollowUps': False}), info=None, oauth_sub=None))] False (0.0000)s - {}
2025-07-30 08:35:50.555 | INFO     | open_webui.routers.openai:get_all_models:392 - get_all_models() - {}
2025-07-30 08:35:50.557 | DEBUG    | open_webui.routers.openai:get_all_models_responses:373 - get_all_models:responses() [{'data': [{'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen2.5-14B-Instruct-Q6_K', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen3-14B-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}], 'object': 'list'}, None] - {}
2025-07-30 08:35:50.558 | DEBUG    | open_webui.routers.openai:merge_models_lists:407 - merge_models_lists <map object at 0x7f5dbbf93e80> - {}
2025-07-30 08:35:50.558 | DEBUG    | open_webui.routers.openai:get_all_models:446 - models: {'data': [{'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'openai': {'created': 1753864550, 'id': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'openai': {'created': 1753864550, 'id': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'openai': {'created': 1753864550, 'id': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'openai': {'created': 1753864550, 'id': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'openai': {'created': 1753864550, 'id': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'openai': {'created': 1753864550, 'id': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen2.5-14B-Instruct-Q6_K', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen2.5-14B-Instruct-Q6_K', 'openai': {'created': 1753864550, 'id': 'Qwen2.5-14B-Instruct-Q6_K', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'object': 'model', 
'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'openai': {'created': 1753864550, 'id': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen3-14B-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen3-14B-UD-Q4_K_XL', 'openai': {'created': 1753864550, 'id': 'Qwen3-14B-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}]} - {}
2025-07-30 08:35:50.558 | DEBUG    | aiocache.base:set:280 - SET open_webui.routers.openaiget_all_models(<starlette.requests.Request object at 0x7f5dbbf7b790>,)[('user', UserModel(id='8376a5ec-9044-4fcf-9c4b-f9fac59a1eb3', name='User', email='admin@localhost', role='admin', profile_image_url='/user.png', last_active_at=1753864535, updated_at=1737372372, created_at=1737372372, api_key=None, settings=UserSettings(ui={'version': '0.6.18', 'directConnections': {'OPENAI_API_BASE_URLS': ['http://localhost:8081/v1', 'https://openrouter.ai/api/v1'], 'OPENAI_API_KEYS': ['aa', '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'], 'OPENAI_API_CONFIGS': {'0': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': [], 'connection_type': 'external'}, '1': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': ['qwen/qwen3-235b-a22b-07-25:free'], 'connection_type': 'external'}}}, 'ctrlEnterToSend': True, 'models': ['Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL'], 'imageCompressionSize': {'width': '', 'height': ''}, 'autoFollowUps': False}), info=None, oauth_sub=None))] 1 (0.0000)s - {}
2025-07-30 08:35:50.576 | DEBUG    | open_webui.utils.models:get_all_models:308 - get_all_models() returned 14 models - {}
2025-07-30 08:35:50.576 | DEBUG    | open_webui.main:get_models:1298 - /api/models returned filtered models accessible to the user: ["DeepSeek-R1-Distill-Qwen-14B-Q5_K_L", "DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf", "google_gemma-3-12b-it-Q6_K_L.gguf", "THUDM_GLM-4-32B-0414-IQ4_NL.gguf", "Qwen_Qwen3-14B-IQ4_NL.gguf", "Qwen_Qwen3-30B-A3B-IQ4_NL.gguf", "Qwen3-30B-A3B-UD-Q3_K_XL.gguf", "Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL", "Qwen2.5-14B-Instruct-Q6_K", "DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf", "deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf", "DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf", "Qwen3-14B-UD-Q4_K_XL", "arena-model"] - {}
2025-07-30 08:35:50.578 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 127.0.0.1:56914 - "GET /api/models HTTP/1.1" 200 - {}

Additional Information

Having the system prompt/files removed from subsequent requests makes the model/server:

  • unaware of (and therefore ignoring) the system prompt and attached files in those requests,
  • unable to make optimal use of the KV cache.

I honestly don't know if this is intended behaviour, but it makes using open-webui a lot slower with local models (with limited resources and prompt-processing speed).
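
A minimal illustration (not Open WebUI code) of the KV-cache point: llama-server can only reuse the cached prefix that is identical between consecutive requests, so dropping the system prompt and file context from the front of the second request forces a full re-prefill. The strings below are abbreviated stand-ins for the two /slots dumps above.

# Hedged illustration: how much prefix two consecutive prompts share.
# The prompt strings are shortened stand-ins for the dumps above.
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

first = "<|im_start|>system\n### Task: ... <context>the cat is on the table</context> ...\n<|im_start|>user\nfirst prompt"
second = "<|im_start|>user\nfirst prompt<|im_end|>\n<|im_start|>assistant\nThe cat is on the table [1]."

# Almost nothing is shared, so the server must reprocess the second prompt
# from scratch instead of reusing the cached system prompt and file context.
print(shared_prefix_len(first, second))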

edit: reword/reformat

GiteaMirror added the bug label 2025-11-11 16:38:10 -06:00

@tjbck commented on GitHub (Jul 30, 2025):

This has to do with model context length, related: https://docs.openwebui.com/troubleshooting/rag


@rgaricano commented on GitHub (Jul 30, 2025):

For local models (Ollama), you can also set the num_keep advanced parameter to always keep the first X tokens of the conversation in context.
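
For reference, a minimal sketch of what that parameter looks like when passed directly to Ollama's /api/chat endpoint; the model name and the value 1024 are placeholders, and in Open WebUI the same option is exposed under Chat Controls as noted in the next comment.

# Hedged sketch: pass num_keep in the options of an Ollama /api/chat request.
# Model name and num_keep value are example placeholders.
import json
import urllib.request

payload = {
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "first prompt"}],
    "options": {"num_keep": 1024},  # keep the first 1024 tokens when the context is truncated
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])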


@dariko commented on GitHub (Jul 30, 2025):

Thank you for pointing me to "model context length"!

I raised Chat Controls -> num_keep (Ollama) (which was 24) to a number greater than the token count, and now the behavior is what I expected.

I'm a little confused about why this parameter has Ollama in its name if it is also applied to OpenAI-compatible endpoints.


@rgaricano commented on GitHub (Jul 30, 2025):

Maybe because some OpenAI-API-compatible endpoints don't support it and could return errors (while others do accept it, e.g. Ollama added as an OpenAI-API-compatible connection).

If I'm not mistaken, the officially supported OpenAI API parameters are the following (a minimal request using only these is sketched after the list):

  • model: Specifies the model to use (e.g., gpt-3.5-turbo).
  • prompt: The input text or sequence of prompts.
  • max_tokens: The maximum number of tokens to generate (e.g., 1500).
  • temperature: Controls randomness in the output (e.g., 0.7).
  • top_p: Controls diversity via nucleus sampling (e.g., 0.9).
  • n: Number of completions to generate for each prompt (e.g., 2).
  • stop: A list of tokens where the generation should stop.
  • presence_penalty: Encourages the model to include new topics.
  • frequency_penalty: Reduces repetition by penalizing repeated words
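
As a reference point, a minimal chat-completion request limited to widely supported parameters; the base URL and model name are taken from this setup's logs, the API key is a placeholder, and num_keep is deliberately absent since it is not part of the official parameter set.

# Hedged sketch: an OpenAI-compatible chat request using only standard params.
# Base URL and model name come from this issue's logs; the API key is a placeholder.
import json
import urllib.request

payload = {
    "model": "Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "second prompt"}],
    "max_tokens": 1500,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:8081/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer sk-placeholder"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])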

@dariko commented on GitHub (Jul 30, 2025):

If I understand correctly, num_keep (Ollama) is not sent to the model but is used locally when building the prompt sent to the model.
I think so because I do not see the default value (24) in the llama-server /slots dumps I collected when opening this issue.

slots_first_prompt.json: https://github.com/user-attachments/files/21506368/slots_first_prompt.json

slots_second_prompt.json: https://github.com/user-attachments/files/21506373/slots_second_prompt.json

$ grep -iE '(num|keep)' /tmp/slots_*
/tmp/slots_first_prompt.json:      "n_keep": 0,
/tmp/slots_first_prompt.json:      "min_keep": 0,
/tmp/slots_second_prompt.json:      "n_keep": 0,
/tmp/slots_second_prompt.json:      "min_keep": 0,

edit: add dumps grep


@rgaricano commented on GitHub (Jul 30, 2025):

OpenAI advanced params mapping:
https://github.com/open-webui/open-webui/blob/b8da4a8cd8257d4846f3608e299618a0b4f185ed/backend/open_webui/utils/payload.py#L102-L114

Ollama's advanced params mapping:
https://github.com/open-webui/open-webui/blob/b8da4a8cd8257d4846f3608e299618a0b4f185ed/backend/open_webui/utils/payload.py#L148-L176

But with OpenAI-API-compatible connections you can also add them as custom params if needed.

Reference: github-starred/open-webui#5910