DeepSeek 70B/32B's answer is confusing #3754

Closed
opened 2025-11-11 15:39:10 -06:00 by GiteaMirror · 1 comment

Originally created by @aodexiusi1997 on GitHub (Feb 11, 2025).

Bug Report


Installation Method

pip

Environment

  • Open WebUI Version: 0.5.10

  • Ollama (if applicable): 0.5.8-rc11 (https://docker.aityp.com/image/docker.io/ollama/ollama:0.5.8-rc11)

  • Operating System: Ubuntu 20.04

  • Browser (if applicable): Edge

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [ ] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [ ] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

Thank you, my friends. I have tried DeepSeek 32B and DeepSeek 70B, and both run into this issue. What I expect is for the model's output to be displayed correctly, rather than a single answer sometimes being cut off with no result, or the model only answering the previous question after I send the same question again. In addition, the interface does not display the entire CoT correctly; only the latter half is shown.

Actual Behavior:

As I mentioned earlier, there are two or three distinct cases. Here are some screenshots. If there is documentation that already addresses this problem, I may have missed it; I apologize, and I would appreciate a pointer to it.

Image: https://github.com/user-attachments/assets/ce62699c-b2ea-429d-99e5-49824fe9dc05

Image: https://github.com/user-attachments/assets/66e1db14-49e0-4f0f-9530-0cc77a6fb03d

Image: https://github.com/user-attachments/assets/676887ed-f19f-40a0-9d6f-3dcbfde972ce

Description

Bug Summary:

  1. DeepSeek's CoT output is incomplete
  2. Sometimes an answer produces no result at all
  3. Sending the same message multiple times only returns the answer to an earlier question

Reproduction Details

1. Host PC: Ubuntu 20.04
2. Download the Ollama Docker image (0.5.8-rc11); then, inside the container, install Miniforge, create a Python 3.12 environment, and pip install open-webui
3. Get a DeepSeek 32B or 70B model (GGUF) from Hugging Face or ModelScope
4. Run Ollama and Open WebUI (a sketch for isolating the issue follows these steps)
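
To help narrow down whether the truncation happens in Open WebUI or in Ollama itself, one can stream the same prompt directly from Ollama's /api/chat endpoint and compare the raw output against what the UI shows. A minimal Python sketch, assuming Ollama on its default port 11434 and an illustrative model tag deepseek-r1:70b (adjust both to your setup):

```python
import json
import requests

# Stream a chat completion straight from Ollama, bypassing Open WebUI.
resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "deepseek-r1:70b",  # assumed tag; use your local model name
        "messages": [{"role": "user", "content": "What is CoT?"}],
        "stream": True,
    },
    stream=True,
)

parts = []
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)  # Ollama streams one JSON object per line
    parts.append(chunk.get("message", {}).get("content", ""))
    if chunk.get("done"):
        break

# If the full CoT appears here but not in the UI, the issue is in Open WebUI.
print("".join(parts))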

Logs and Screenshots

Browser Console Logs:

None

Docker Container Logs:

time=2025-02-11T15:21:18.146+08:00 level=INFO source=server.go:597 msg="llama runner started in 5.77 seconds"
llama_model_loader: loaded meta data with 39 key-value pairs and 724 tensors from /root/.ollama/models/blobs/sha256-61ed3f9e10925faa36f64996b41bfc0ee9da5f5002b63459d3f918e155fbed88 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 70B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
llama_model_loader: - kv 5: general.size_label str = 70B
llama_model_loader: - kv 6: general.license str = llama3.3
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = DeepSeek R1 Distill Llama 70B
llama_model_loader: - kv 9: general.base_model.0.organization str = Deepseek Ai
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/deepseek-ai/De...
llama_model_loader: - kv 11: general.tags arr[str,6] = ["deepseek", "unsloth", "transformers...
llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 13: llama.block_count u32 = 80
llama_model_loader: - kv 14: llama.context_length u32 = 131072
llama_model_loader: - kv 15: llama.embedding_length u32 = 8192
llama_model_loader: - kv 16: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 17: llama.attention.head_count u32 = 64
llama_model_loader: - kv 18: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 19: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 20: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 21: llama.attention.key_length u32 = 128
llama_model_loader: - kv 22: llama.attention.value_length u32 = 128
llama_model_loader: - kv 23: llama.vocab_size u32 = 128256
llama_model_loader: - kv 24: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 25: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 26: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 27: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 28: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 29: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 31: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 32: tokenizer.ggml.padding_token_id u32 = 128004
llama_model_loader: - kv 33: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 34: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 35: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 36: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 37: general.quantization_version u32 = 2
llama_model_loader: - kv 38: general.file_type u32 = 12
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q3_K: 321 tensors
llama_model_loader: - type q4_K: 155 tensors
llama_model_loader: - type q5_K: 85 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 31.91 GiB (3.88 BPW)
llm_load_print_meta: general.name = DeepSeek R1 Distill Llama 70B
llm_load_print_meta: BOS token = 128000 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 128001 '<|end▁of▁sentence|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: PAD token = 128004 '<|finetune_right_pad_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128001 '<|end▁of▁sentence|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llama_model_load: vocab only - skipping tensors
[GIN] 2025/02/11 - 15:21:20 | 200 | 8.592600692s | 127.0.0.1 | POST "/api/chat"

Screenshots/Screen Recordings (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Additional Information

None

Note

If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!


@michaelmarziani commented on GitHub (Feb 11, 2025):

For the first two screenshots, I was not able to reproduce the issue. I asked "What is CoT?", then tried to submit the question again while it was answering, and the interface would not resubmit the question. Are you using the regular interface or "Call mode"? I noticed something like this when testing call mode, but I was also seeing errors on the console when it happened. Just typing in the questions, I am not able to get a result like your first two screenshots.

For the third screenshot, this is not a bug. For some questions with DeepSeek, the answers are "hardcoded", so to speak, so there is no chain of thought, and the `<think></think>` section will be empty or just a couple of blank lines.

If you could add whatever console output is produced while performing the use case in the first 2 screenshots, I think that would help troubleshooting.
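
For reference, DeepSeek-R1-style output wraps the chain of thought in `<think></think>` tags, and an empty pair is valid, as described above. A small illustrative Python sketch (the function name and regex are just for this example, not part of Open WebUI's code) that splits the two parts:

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style response into (chain_of_thought, answer).

    An empty or missing <think> block is normal for "hardcoded" answers.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

# Example: a response whose think section is just blank lines.
cot, answer = split_think("<think>\n\n</think>\nCoT stands for chain of thought.")
print(repr(cot))  # '' -- empty CoT, matching the third screenshot
print(answer)     # 'CoT stands for chain of thought.'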


Reference: github-starred/open-webui#3754