DeepSeek 70B/32B's answer is confusing #3754

Closed
opened 2025-11-11 15:39:10 -06:00 by GiteaMirror · 1 comment

Originally created by @aodexiusi1997 on GitHub (Feb 11, 2025).

Bug Report


Installation Method

pip

Environment

  • Open WebUI Version: 0.5.10

  • Ollama (if applicable): 0.5.8-rc11 (https://docker.aityp.com/image/docker.io/ollama/ollama:0.5.8-rc11)

  • Operating System: Ubuntu 20.04

  • Browser (if applicable): Edge

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [ ] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [ ] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

Thank you, my friends. I have tried DeepSeek 32B and DeepSeek 70B, and both run into this issue. What I expect is for the model's output to be displayed correctly, rather than a single answer sometimes being cut off with no result, or the model only answering the previous question after I send the same question again. In addition, the interface does not display the entire CoT correctly; only the latter half is shown.

Actual Behavior:

As I mentioned earlier, there are two or three distinct cases. Here are some screenshots. If there is documentation that already addresses this problem, I may have missed it; I apologize, and I would appreciate a pointer to it.

Image: https://github.com/user-attachments/assets/ce62699c-b2ea-429d-99e5-49824fe9dc05

Image: https://github.com/user-attachments/assets/66e1db14-49e0-4f0f-9530-0cc77a6fb03d

Image: https://github.com/user-attachments/assets/676887ed-f19f-40a0-9d6f-3dcbfde972ce

Description

Bug Summary:

  1. DeepSeek's CoT output is incomplete
  2. Sometimes an answer produces no result at all
  3. Sending the same message multiple times only returns the answer to an earlier question

Reproduction Details

1. Host PC: Ubuntu 20.04
2. Download the Ollama Docker image (0.5.8-rc11); then, inside the container, install Miniforge, create a Python 3.12 environment, and pip install open-webui
3. Get a DeepSeek 32B or 70B model (GGUF) from Hugging Face or ModelScope
4. Run Ollama and Open WebUI (a sketch for isolating the issue follows these steps)
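
To help narrow down whether the truncation happens in Open WebUI or in Ollama itself, one can stream the same prompt directly from Ollama's /api/chat endpoint and compare the raw output against what the UI shows. A minimal Python sketch, assuming Ollama on its default port 11434 and an illustrative model tag deepseek-r1:70b (adjust both to your setup):

```python
import json
import requests

# Stream a chat completion straight from Ollama, bypassing Open WebUI.
resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "deepseek-r1:70b",  # assumed tag; use your local model name
        "messages": [{"role": "user", "content": "What is CoT?"}],
        "stream": True,
    },
    stream=True,
)

parts = []
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)  # Ollama streams one JSON object per line
    parts.append(chunk.get("message", {}).get("content", ""))
    if chunk.get("done"):
        break

# If the full CoT appears here but not in the UI, the issue is in Open WebUI.
print("".join(parts))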

Logs and Screenshots

Browser Console Logs:

None

Docker Container Logs:

time=2025-02-11T15:21:18.146+08:00 level=INFO source=server.go:597 msg="llama runner started in 5.77 seconds"
llama_model_loader: loaded meta data with 39 key-value pairs and 724 tensors from /root/.ollama/models/blobs/sha256-61ed3f9e10925faa36f64996b41bfc0ee9da5f5002b63459d3f918e155fbed88 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 70B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
llama_model_loader: - kv 5: general.size_label str = 70B
llama_model_loader: - kv 6: general.license str = llama3.3
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = DeepSeek R1 Distill Llama 70B
llama_model_loader: - kv 9: general.base_model.0.organization str = Deepseek Ai
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/deepseek-ai/De...
llama_model_loader: - kv 11: general.tags arr[str,6] = ["deepseek", "unsloth", "transformers...
llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 13: llama.block_count u32 = 80
llama_model_loader: - kv 14: llama.context_length u32 = 131072
llama_model_loader: - kv 15: llama.embedding_length u32 = 8192
llama_model_loader: - kv 16: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 17: llama.attention.head_count u32 = 64
llama_model_loader: - kv 18: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 19: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 20: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 21: llama.attention.key_length u32 = 128
llama_model_loader: - kv 22: llama.attention.value_length u32 = 128
llama_model_loader: - kv 23: llama.vocab_size u32 = 128256
llama_model_loader: - kv 24: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 25: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 26: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 27: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 28: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 29: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 31: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 32: tokenizer.ggml.padding_token_id u32 = 128004
llama_model_loader: - kv 33: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 34: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 35: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 36: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 37: general.quantization_version u32 = 2
llama_model_loader: - kv 38: general.file_type u32 = 12
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q3_K: 321 tensors
llama_model_loader: - type q4_K: 155 tensors
llama_model_loader: - type q5_K: 85 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 31.91 GiB (3.88 BPW)
llm_load_print_meta: general.name = DeepSeek R1 Distill Llama 70B
llm_load_print_meta: BOS token = 128000 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 128001 '<|end▁of▁sentence|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: PAD token = 128004 '<|finetune_right_pad_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128001 '<|end▁of▁sentence|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llama_model_load: vocab only - skipping tensors
[GIN] 2025/02/11 - 15:21:20 | 200 | 8.592600692s | 127.0.0.1 | POST "/api/chat"

Screenshots/Screen Recordings (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Additional Information

None

Note

If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!


@michaelmarziani commented on GitHub (Feb 11, 2025):

For the first two screenshots, I was not able to reproduce the issue. I asked "What is CoT?", then tried to submit the question again while it was answering, and the interface would not resubmit the question. Are you using the regular interface or "Call mode"? I noticed something like this when testing call mode, but I was also seeing errors on the console when it happened. Just typing in the questions, I am not able to get a result like your first two screenshots.

For the third screenshot, this is not a bug. For some questions with DeepSeek, the answers are "hardcoded", so to speak, so there is no chain of thought, and the `<think></think>` section will be empty or just a couple of blank lines.

If you could add whatever console output is produced while performing the use case in the first 2 screenshots, I think that would help troubleshooting.
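
For reference, DeepSeek-R1-style output wraps the chain of thought in `<think></think>` tags, and an empty pair is valid, as described above. A small illustrative Python sketch (the function name and regex are just for this example, not part of Open WebUI's code) that splits the two parts:

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style response into (chain_of_thought, answer).

    An empty or missing <think> block is normal for "hardcoded" answers.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

# Example: a response whose think section is just blank lines.
cot, answer = split_think("<think>\n\n</think>\nCoT stands for chain of thought.")
print(repr(cot))  # '' -- empty CoT, matching the third screenshot
print(answer)     # 'CoT stands for chain of thought.'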


Reference: github-starred/open-webui#3754