Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 19:38:46 -05:00)
deepseek 70B / 32B's answer is confusing #3754
Originally created by @aodexiusi1997 on GitHub (Feb 11, 2025).
Bug Report
Installation Method
pip
Environment
- **Open WebUI Version:** 0.5.10
- **Ollama (if applicable):** 0.5.8-rc11
- **Operating System:** Ubuntu 20.04
- **Browser (if applicable):** Edge
Confirmation:
Expected Behavior:
Thank you, my friends. I have tried DeepSeek 32B and DeepSeek 70B, and both run into this issue. The behavior I expect is for the model's output to be displayed correctly, rather than a single answer sometimes being cut off with no result, or the model only answering the previous question after I ask the same question again. In addition, the chain-of-thought (CoT) part of the answer is not displayed correctly; only the latter half is shown.
Actual Behavior:
As mentioned above, there are two or three cases; some screenshots are attached. If there are relevant documents that address this problem, I may simply not have seen them. I am very sorry, and I would appreciate it if you could point them out.
Description
Bug Summary:
Reproduction Details
1. Host PC: Ubuntu 20.04
2. Download Ollama (Docker) 0.5.8-rc11; then, inside the container, install Miniforge, create a Python 3.12 environment, and `pip install open-webui`
3. Get the DeepSeek 32B or 70B model (GGUF) from Hugging Face or ModelScope
4. Run Ollama, then run Open WebUI
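The steps above can be sketched as shell commands. This is only a sketch reconstructed from the report: the Docker image tag, container name, conda environment name, and the `deepseek-r1:70b` model tag are assumptions, not values the reporter confirmed.

```shell
# 1) Host: Ubuntu 20.04 (no command needed)

# 2) Run Ollama in Docker (image tag assumed to match the reported 0.5.8-rc11),
#    then install Open WebUI inside the container via Miniforge.
docker run -d --name ollama -p 11434:11434 ollama/ollama:0.5.8-rc11
docker exec -it ollama bash
# inside the container (Miniforge assumed already installed):
#   conda create -n webui python=3.12
#   conda activate webui
#   pip install open-webui

# 3) Pull a DeepSeek R1 distill model (GGUF); tag is an assumption
ollama run deepseek-r1:70b

# 4) Start Open WebUI
open-webui serve
```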
Logs and Screenshots
Browser Console Logs:
None
Docker Container Logs:
time=2025-02-11T15:21:18.146+08:00 level=INFO source=server.go:597 msg="llama runner started in 5.77 seconds"
llama_model_loader: loaded meta data with 39 key-value pairs and 724 tensors from /root/.ollama/models/blobs/sha256-61ed3f9e10925faa36f64996b41bfc0ee9da5f5002b63459d3f918e155fbed88 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Llama 70B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Llama
llama_model_loader: - kv 5: general.size_label str = 70B
llama_model_loader: - kv 6: general.license str = llama3.3
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = DeepSeek R1 Distill Llama 70B
llama_model_loader: - kv 9: general.base_model.0.organization str = Deepseek Ai
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/deepseek-ai/De...
llama_model_loader: - kv 11: general.tags arr[str,6] = ["deepseek", "unsloth", "transformers...
llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 13: llama.block_count u32 = 80
llama_model_loader: - kv 14: llama.context_length u32 = 131072
llama_model_loader: - kv 15: llama.embedding_length u32 = 8192
llama_model_loader: - kv 16: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 17: llama.attention.head_count u32 = 64
llama_model_loader: - kv 18: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 19: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 20: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 21: llama.attention.key_length u32 = 128
llama_model_loader: - kv 22: llama.attention.value_length u32 = 128
llama_model_loader: - kv 23: llama.vocab_size u32 = 128256
llama_model_loader: - kv 24: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 25: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 26: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 27: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 28: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 29: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 31: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 32: tokenizer.ggml.padding_token_id u32 = 128004
llama_model_loader: - kv 33: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 34: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 35: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 36: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 37: general.quantization_version u32 = 2
llama_model_loader: - kv 38: general.file_type u32 = 12
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q3_K: 321 tensors
llama_model_loader: - type q4_K: 155 tensors
llama_model_loader: - type q5_K: 85 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 31.91 GiB (3.88 BPW)
llm_load_print_meta: general.name = DeepSeek R1 Distill Llama 70B
llm_load_print_meta: BOS token = 128000 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 128001 '<|end▁of▁sentence|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: PAD token = 128004 '<|finetune_right_pad_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128001 '<|end▁of▁sentence|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llama_model_load: vocab only - skipping tensors
[GIN] 2025/02/11 - 15:21:20 | 200 | 8.592600692s | 127.0.0.1 | POST "/api/chat"
Screenshots/Screen Recordings (if applicable):
[Attach any relevant screenshots to help illustrate the issue]
Additional Information
None
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!
@michaelmarziani commented on GitHub (Feb 11, 2025):
For the first 2 screenshots, I was not able to reproduce the issue. I asked "What is COT?", then I tried to submit the question again while it's answering, and the interface would not resubmit the question. Are you using the regular interface or the "Call mode"? I noticed something like this when testing call mode, but I was also seeing errors on the console when this happened. For me, just typing in the questions I am not able to get a result like your first 2 screenshots.
For the third screenshot, this is not a bug. For some questions with DeepSeek, the answers are "hardcoded", so to speak, so there is no chain of thought, and the `<think></think>` section will be empty or contain a couple of blank lines.

If you could add whatever console output is produced while performing the use case in the first 2 screenshots, I think that would help troubleshooting.
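To illustrate the point about empty chain-of-thought sections, here is a minimal sketch of detecting an empty `<think></think>` block in a saved response. The `response` variable is a made-up sample, not actual Open WebUI or DeepSeek output.

```shell
# Hypothetical sample response whose <think> block contains only blank lines,
# as described for the third screenshot.
response='<think>

</think>
Paris is the capital of France.'

# Strip all whitespace, then check whether the <think></think> pair is empty.
if printf '%s' "$response" | tr -d '[:space:]' | grep -q '<think></think>'; then
  result="empty chain-of-thought"
else
  result="chain-of-thought present"
fi
echo "$result"
```

In the UI, such a response renders with a collapsed or blank "thinking" section, which is expected behavior rather than a display bug.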