[GH-ISSUE #8967] when using deepseek-r1:1.5b cannot get token usage #67875

Open
opened 2026-05-04 11:57:07 -05:00 by GiteaMirror · 8 comments

Originally created by @fourfireM on GitHub (Feb 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8967

What is the issue?

When I use deepseek-r1:1.5b, the usage returned is 0.

I used this method to call the LLM:

```python
chat_complete_res = [self.infer(input_list=[payload]) for i in range(n)]
```

The `chat_complete_res` looks like:

```python
[[{'id': 'chatcmpl-572', 'choices': [{'finish_reason': None, 'index': 0, 'logprobs': None, 'message': {'content': '{"thoughts":["Because 196 is a perfect square, it has an odd number of divisors."]\n\n}\n\n\n\n\n\n  \n  \n\n  \n  \n  \n  \n \n \n  \n \n \n   \n \n \n  \n \n \n  \n \n \n  \n \n', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': None}}], 'created': 1739090196, 'model': 'deepseek-r1:1.5b', 'object': 'chat.completion', 'service_tier': None, 'system_fingerprint': 'fp_ollama', 'usage': {'completion_tokens': 0, 'prompt_tokens': 0, 'total_tokens': 0, 'completion_tokens_details': None, 'prompt_tokens_details': None}}]]
```

I don't know what went wrong that resulted in no token usage being recorded.
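
For reference, the zero usage can be checked outside the wrapper above. A minimal sketch (not the reporter's actual `infer` code; the local URL, model tag, and prompt are assumptions) using the `openai` Python client against Ollama's OpenAI-compatible endpoint:

```python
# Minimal reproduction sketch: query Ollama's OpenAI-compatible API and
# inspect the usage block. Assumes Ollama is serving on the default port.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # the client requires a key; Ollama ignores it
)

resp = client.chat.completions.create(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Does 196 have an odd number of divisors?"}],
)

# When the problem reproduces, all three counters come back as 0:
print(resp.usage)
```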

Relevant log output

No response
OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-05-04 11:57:07 -05:00

@rick-github commented on GitHub (Feb 9, 2025):

The sequence `\n\n\n\n\n\n  \n  \n\n  \n  \n ` indicates that the model lost coherence and started to ramble, possibly triggering the ["token repeat limit reached"](https://github.com/ollama/ollama/blob/1f766c36fb61f7b1969664645bf38dae93f568a2/llm/server.go#L818) code. Because it's an error, the token usage doesn't get filled in. It looks like you are generating JSON output; have you [told the model](https://github.com/ollama/ollama/blob/main/docs/api.md#json-mode:~:text=generate%20large%20amounts%20whitespace) to do so in the prompt?
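
For reference, the linked JSON-mode docs pair the `format` field with an explicit instruction in the prompt; without that instruction the model may pad its output with whitespace. A hedged sketch against Ollama's native `/api/chat` (the model tag and prompt are illustrative, not from this thread):

```python
# Sketch of the JSON-mode pattern: set "format": "json" AND tell the model
# in the prompt to answer in JSON.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1:1.5b",  # illustrative model tag
        "format": "json",             # constrain output to valid JSON
        "stream": False,
        "messages": [
            {"role": "system", "content": "Respond using JSON."},
            {"role": "user", "content": "Does 196 have an odd number of divisors?"},
        ],
    },
)
print(resp.json()["message"]["content"])
```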


@fourfireM commented on GitHub (Feb 9, 2025):

Yes, I expect to get a JSON-formatted response. I prompted the model to do so, and set `"response_format": json_object` when calling.

In fact, I found that the model really does repeat itself all the time; even when it is not repeating `\n\n`, the JSON it returns will sometimes repeat endlessly.

Does this mean that the calls that get no token usage failed because of this repetition, and that I can treat such a call as a failure?

And how can I fix this repetition, or do you have any better suggestions?


@rick-github commented on GitHub (Feb 9, 2025):

> Does this mean that the calls that get no token usage failed because of this repetition, and that I can treat such a call as a failure?

The model failed to generate an EOS sequence, but if the returned JSON is complete, then it's a judgement call. If I fail to end this sentence with punctuation, is this answer a failure

> And how can I fix this repetition, or do you have any better suggestions?

This is likely a function of the model. You can try modifying the prompt to get better output or playing with parameters (e.g. [`repeat_penalty`](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values:~:text=repeat_last_n%2064-,repeat_penalty,-Sets%20how%20strongly)), but it might be easier to just use a different model.
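
A rough sketch combining both suggestions (the `repeat_penalty` value and the failure heuristic are illustrative assumptions, not a tested fix): raise `repeat_penalty` via `options` on the native API, then treat the call as failed when the content doesn't parse as complete JSON:

```python
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1:1.5b",
        "format": "json",
        "options": {"repeat_penalty": 1.2},  # >1.0 penalizes repetition more strongly
        "stream": False,
        "messages": [{"role": "user", "content": "Respond in JSON: is 196 a perfect square?"}],
    },
).json()

try:
    parsed = json.loads(resp["message"]["content"])  # "is the JSON complete?" check
except (KeyError, json.JSONDecodeError):
    parsed = None  # judgement call: count this generation as a failure and retry

# The native API reports token counts as eval_count / prompt_eval_count.
print(parsed, resp.get("eval_count"), resp.get("prompt_eval_count"))
```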


@teneous commented on GitHub (Feb 10, 2025):

I have the same issue. I deployed `deepseek-r1-14b-q8:latest` with Ollama.
When I send a request to the Ollama API, the response content is just `\n\n\n\n\n`.
It still does not work when I try to increase `max_tokens` to 4096.

This is my request:

```json
{
    "model": "deepseek-r1-14b-q8:latest",
    "temperature": 0.05,
    "frequency_penalty": 0.6,
    "presence_penalty": 0.7,
    "max_tokens": 4000,
    "response_format": {
        "type": "json_object"
    },
    "messages": [....]
}
```

Here is my result:

![Image](https://github.com/user-attachments/assets/75cd0c98-2222-439c-92ab-0a860cc42c1e)

@rick-github commented on GitHub (Feb 10, 2025):

What's in `messages`?


@teneous commented on GitHub (Feb 10, 2025):

> What's in `messages`?

I hid the messages because the data is sensitive. The structure is like:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "\n    # Instructions\n    ## Role\n    You are an NER...., you must return the result in JSON format..."
    },
    {
      "role": "user",
      "content": "Here is my paragraph content: ...."
    }
  ]
}
```

@teneous commented on GitHub (Feb 10, 2025):

I sent the same request to `DeepSeek-R1`, and it worked well. I’m not sure whether the issue is due to the 14B-Q8 quantized model’s lack of support for JSON schema or to something in Ollama. I’m planning to switch to the 7B half-precision model to test it.

deepseek-r1:

![Image](https://github.com/user-attachments/assets/b5c0893d-00f6-4a1d-968c-a334a05dc2c2)

7B half-precision:

![Image](https://github.com/user-attachments/assets/876b09e3-e06e-4fd6-87de-5aa070303453)


@manykarim commented on GitHub (Mar 13, 2025):

It's the

```json
"response_format": {
    "type": "json_object"
}
```

which is causing the problem with DeepSeek on Ollama.
Try sending your request without it.
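
In other words, something like this (a sketch assuming an OpenAI-style client as used earlier in the thread; the hidden messages stay elided):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="deepseek-r1-14b-q8:latest",
    temperature=0.05,
    max_tokens=4000,
    # response_format deliberately omitted; rely on the prompt for JSON output
    messages=[
        {"role": "system", "content": "You must return the result in JSON format."},
        {"role": "user", "content": "...."},  # sensitive content elided, as in the thread
    ],
)
print(resp.usage)
```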
