issue: preserve context in multi-turn chat #5910

Closed
opened 2025-11-11 16:38:10 -06:00 by GiteaMirror · 6 comments

Originally created by @dariko on GitHub (Jul 30, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v143

Ollama Version (if applicable)

No response

Operating System

debian 12

Browser (if applicable)

firefox 135

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When running a chat with multiple turns, all of the context from the first request should be sent in subsequent requests.

Actual Behavior

When running a chat with multiple turns, the system prompt and the files attached to the first message are dropped from the prompt when submitting subsequent requests.

Steps to Reproduce

  • create a file /tmp/file.txt containing a string ("the cat is on the table")
  • create a new chat in open-webui
  • attach /tmp/file.txt, write first prompt, press ctrl+enter
  • the chat responds
  • submit another message (second prompt) and compare the prompts llama-server received (one way to dump them is sketched below)
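
To compare what llama-server actually received at each step, here is a minimal sketch that dumps the current prompt from its /slots endpoint. The host/port are assumptions for this setup, and the exact field names of the /slots response may differ between llama.cpp versions.

# Hedged sketch: print the raw prompt text llama-server is holding per slot.
# Assumes llama-server listens on localhost:8080 and exposes /slots;
# field names follow the dumps attached to this issue.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/slots") as resp:
    slots = json.load(resp)

for slot in slots:
    print(f"slot {slot.get('id')}:")
    # "prompt" holds the fully rendered chat template the server was given.
    print(repr(slot.get("prompt", ""))[:1000])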

Logs & Screenshots

chat screenshot:

(chat screenshot omitted; original image: https://github.com/user-attachments/assets/0a245fee-c249-46cb-b439-5263316f3bc0)

prompts received by llama-server (dumped using /slots):
first prompt:

<|im_start|>system\n### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.**\n- Do not cite if the <source> tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* \"According to the study, the proposed method increases efficiency by 20% [1].\"\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context.\n\n<context>\n<source id=\"1\" name=\"file.txt\">the cat is on the table\n</source>\n</context>\n\n<user_query>\nfirst prompt\n</user_query>\n<|im_end|>\n<|im_start|>user\nfirst prompt<|im_end|>\n<|im_start|>assistant\n

second prompt:

<|im_start|>user\nfirst prompt<|im_end|>\n<|im_start|>assistant\nThe cat is on the table [1].<|im_end|>\n<|im_start|>user\nsecond prompt<|im_end|>\n<|im_start|>assistant\n

container log

2025-07-30 08:35:49.713 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 127.0.0.1:56906 - "GET / HTTP/1.1" 200 - {}
2025-07-30 08:35:50.555 | DEBUG    | aiocache.base:get:201 - GET open_webui.routers.openaiget_all_models(<starlette.requests.Request object at 0x7f5dbbf7b790>,)[('user', UserModel(id='8376a5ec-9044-4fcf-9c4b-f9fac59a1eb3', name='User', email='admin@localhost', role='admin', profile_image_url='/user.png', last_active_at=1753864535, updated_at=1737372372, created_at=1737372372, api_key=None, settings=UserSettings(ui={'version': '0.6.18', 'directConnections': {'OPENAI_API_BASE_URLS': ['http://localhost:8081/v1', 'https://openrouter.ai/api/v1'], 'OPENAI_API_KEYS': ['aa', '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'], 'OPENAI_API_CONFIGS': {'0': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': [], 'connection_type': 'external'}, '1': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': ['qwen/qwen3-235b-a22b-07-25:free'], 'connection_type': 'external'}}}, 'ctrlEnterToSend': True, 'models': ['Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL'], 'imageCompressionSize': {'width': '', 'height': ''}, 'autoFollowUps': False}), info=None, oauth_sub=None))] False (0.0000)s - {}
2025-07-30 08:35:50.555 | INFO     | open_webui.routers.openai:get_all_models:392 - get_all_models() - {}
2025-07-30 08:35:50.557 | DEBUG    | open_webui.routers.openai:get_all_models_responses:373 - get_all_models:responses() [{'data': [{'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen2.5-14B-Instruct-Q6_K', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, {'created': 1753864550, 'id': 'Qwen3-14B-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}], 'object': 'list'}, None] - {}
2025-07-30 08:35:50.558 | DEBUG    | open_webui.routers.openai:merge_models_lists:407 - merge_models_lists <map object at 0x7f5dbbf93e80> - {}
2025-07-30 08:35:50.558 | DEBUG    | open_webui.routers.openai:get_all_models:446 - models: {'data': [{'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-14B-Q5_K_L', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'openai': {'created': 1753864550, 'id': 'google_gemma-3-12b-it-Q6_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'openai': {'created': 1753864550, 'id': 'THUDM_GLM-4-32B-0414-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'openai': {'created': 1753864550, 'id': 'Qwen_Qwen3-14B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'openai': {'created': 1753864550, 'id': 'Qwen_Qwen3-30B-A3B-IQ4_NL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'openai': {'created': 1753864550, 'id': 'Qwen3-30B-A3B-UD-Q3_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'openai': {'created': 1753864550, 'id': 'Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen2.5-14B-Instruct-Q6_K', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen2.5-14B-Instruct-Q6_K', 'openai': {'created': 1753864550, 'id': 'Qwen2.5-14B-Instruct-Q6_K', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf', 'object': 'model', 
'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'openai': {'created': 1753864550, 'id': 'deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'openai': {'created': 1753864550, 'id': 'DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}, {'created': 1753864550, 'id': 'Qwen3-14B-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'openai', 'connection_type': 'external', 'name': 'Qwen3-14B-UD-Q4_K_XL', 'openai': {'created': 1753864550, 'id': 'Qwen3-14B-UD-Q4_K_XL', 'object': 'model', 'owned_by': 'llama-swap', 'connection_type': 'external'}, 'urlIdx': 0}]} - {}
2025-07-30 08:35:50.558 | DEBUG    | aiocache.base:set:280 - SET open_webui.routers.openaiget_all_models(<starlette.requests.Request object at 0x7f5dbbf7b790>,)[('user', UserModel(id='8376a5ec-9044-4fcf-9c4b-f9fac59a1eb3', name='User', email='admin@localhost', role='admin', profile_image_url='/user.png', last_active_at=1753864535, updated_at=1737372372, created_at=1737372372, api_key=None, settings=UserSettings(ui={'version': '0.6.18', 'directConnections': {'OPENAI_API_BASE_URLS': ['http://localhost:8081/v1', 'https://openrouter.ai/api/v1'], 'OPENAI_API_KEYS': ['aa', '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'], 'OPENAI_API_CONFIGS': {'0': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': [], 'connection_type': 'external'}, '1': {'enable': True, 'tags': [], 'prefix_id': '', 'model_ids': ['qwen/qwen3-235b-a22b-07-25:free'], 'connection_type': 'external'}}}, 'ctrlEnterToSend': True, 'models': ['Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL'], 'imageCompressionSize': {'width': '', 'height': ''}, 'autoFollowUps': False}), info=None, oauth_sub=None))] 1 (0.0000)s - {}
2025-07-30 08:35:50.576 | DEBUG    | open_webui.utils.models:get_all_models:308 - get_all_models() returned 14 models - {}
2025-07-30 08:35:50.576 | DEBUG    | open_webui.main:get_models:1298 - /api/models returned filtered models accessible to the user: ["DeepSeek-R1-Distill-Qwen-14B-Q5_K_L", "DeepSeek-R1-Distill-Qwen-32B-Q5_K_M.gguf", "google_gemma-3-12b-it-Q6_K_L.gguf", "THUDM_GLM-4-32B-0414-IQ4_NL.gguf", "Qwen_Qwen3-14B-IQ4_NL.gguf", "Qwen_Qwen3-30B-A3B-IQ4_NL.gguf", "Qwen3-30B-A3B-UD-Q3_K_XL.gguf", "Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL", "Qwen2.5-14B-Instruct-Q6_K", "DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf", "deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf", "DeepSeek-R1-0528-Qwen3-8B-UD-Q5_K_XL.gguf", "Qwen3-14B-UD-Q4_K_XL", "arena-model"] - {}
2025-07-30 08:35:50.578 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 127.0.0.1:56914 - "GET /api/models HTTP/1.1" 200 - {}

Additional Information

Having the system prompt/files removed from subsequent requests makes the model/server:

  • unaware of (and therefore ignoring) the system prompt and attached files in those requests,
  • unable to make optimal use of the KV cache.

I honestly don't know if this is intended behaviour, but it makes using open-webui a lot slower with local models (with limited resources and prompt-processing speed).
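
A minimal illustration (not Open WebUI code) of the KV-cache point: llama-server can only reuse the cached prefix that is identical between consecutive requests, so dropping the system prompt and file context from the front of the second request forces a full re-prefill. The strings below are abbreviated stand-ins for the two /slots dumps above.

# Hedged illustration: how much prefix two consecutive prompts share.
# The prompt strings are shortened stand-ins for the dumps above.
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

first = "<|im_start|>system\n### Task: ... <context>the cat is on the table</context> ...\n<|im_start|>user\nfirst prompt"
second = "<|im_start|>user\nfirst prompt<|im_end|>\n<|im_start|>assistant\nThe cat is on the table [1]."

# Almost nothing is shared, so the server must reprocess the second prompt
# from scratch instead of reusing the cached system prompt and file context.
print(shared_prefix_len(first, second))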

edit: reword/reformat

GiteaMirror added the bug label 2025-11-11 16:38:10 -06:00

@tjbck commented on GitHub (Jul 30, 2025):

This has to do with model context length, related: https://docs.openwebui.com/troubleshooting/rag


@rgaricano commented on GitHub (Jul 30, 2025):

For local models (Ollama), you can also set the num_keep advanced parameter to always keep the first X tokens of the conversation in context.
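
For reference, a minimal sketch of what that parameter looks like when passed directly to Ollama's /api/chat endpoint; the model name and the value 1024 are placeholders, and in Open WebUI the same option is exposed under Chat Controls as noted in the next comment.

# Hedged sketch: pass num_keep in the options of an Ollama /api/chat request.
# Model name and num_keep value are example placeholders.
import json
import urllib.request

payload = {
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "first prompt"}],
    "options": {"num_keep": 1024},  # keep the first 1024 tokens when the context is truncated
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])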


@dariko commented on GitHub (Jul 30, 2025):

Thank you for pointing me to "model context length"!

I raised Chat Controls -> num_keep (Ollama) (which was 24) to a number greater than the token count, and now the behavior is what I expected.

I'm a little confused about why this parameter has Ollama in its name if it is also applied to OpenAI-compatible endpoints.


@rgaricano commented on GitHub (Jul 30, 2025):

Maybe because some OpenAI-API-compatible endpoints don't support it and could return errors (while others do accept it, e.g. Ollama added as an OpenAI-API-compatible connection).

If I'm not mistaken, the officially supported OpenAI API parameters are the following (a minimal request using only these is sketched after the list):

  • model: Specifies the model to use (e.g., gpt-3.5-turbo).
  • prompt: The input text or sequence of prompts.
  • max_tokens: The maximum number of tokens to generate (e.g., 1500).
  • temperature: Controls randomness in the output (e.g., 0.7).
  • top_p: Controls diversity via nucleus sampling (e.g., 0.9).
  • n: Number of completions to generate for each prompt (e.g., 2).
  • stop: A list of tokens where the generation should stop.
  • presence_penalty: Encourages the model to include new topics.
  • frequency_penalty: Reduces repetition by penalizing repeated words
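
As a reference point, a minimal chat-completion request limited to widely supported parameters; the base URL and model name are taken from this setup's logs, the API key is a placeholder, and num_keep is deliberately absent since it is not part of the official parameter set.

# Hedged sketch: an OpenAI-compatible chat request using only standard params.
# Base URL and model name come from this issue's logs; the API key is a placeholder.
import json
import urllib.request

payload = {
    "model": "Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "second prompt"}],
    "max_tokens": 1500,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:8081/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer sk-placeholder"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])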

@dariko commented on GitHub (Jul 30, 2025):

If I understand correctly, num_keep (Ollama) is not sent to the model but is used locally when building the prompt sent to the model.
I think so because I do not see the default value (24) in the llama-server /slots dumps I collected when opening this issue.

slots_first_prompt.json: https://github.com/user-attachments/files/21506368/slots_first_prompt.json

slots_second_prompt.json: https://github.com/user-attachments/files/21506373/slots_second_prompt.json

$ grep -iE '(num|keep)' /tmp/slots_*
/tmp/slots_first_prompt.json:      "n_keep": 0,
/tmp/slots_first_prompt.json:      "min_keep": 0,
/tmp/slots_second_prompt.json:      "n_keep": 0,
/tmp/slots_second_prompt.json:      "min_keep": 0,

edit: add dumps grep


@rgaricano commented on GitHub (Jul 30, 2025):

OpenAI advanced params mapping:
https://github.com/open-webui/open-webui/blob/b8da4a8cd8257d4846f3608e299618a0b4f185ed/backend/open_webui/utils/payload.py#L102-L114

Ollama's advanced params mapping:
https://github.com/open-webui/open-webui/blob/b8da4a8cd8257d4846f3608e299618a0b4f185ed/backend/open_webui/utils/payload.py#L148-L176

But with OpenAI-API-compatible connections you can also add them as custom params if needed.

Reference: github-starred/open-webui#5910