[GH-ISSUE #9698] Huge server.log when using gemma3 and OLLAMA_DEBUG=1 #32092

Closed
opened 2026-04-22 13:01:06 -05:00 by GiteaMirror · 3 comments

Originally created by @TheMasterFX on GitHub (Mar 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9698

What is the issue?

When using gemma3 with OLLAMA_DEBUG=1, the debug log in server.log grows huge (around 4 MB after some chatting with a few images). I also do not see the generated request, including the system prompt, the way I do with other models.
When using e.g. llama3.2, everything is fine.
I have trimmed the log output below to show what I see.

Relevant log output

time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.18.mlp.fc2.bias shape=[1152] dtype=0 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.18.mlp.fc2.weight shape="[4304 1152]" dtype=1 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_k.bias shape=[1152] dtype=0 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_k.weight shape="[1152 1152]" dtype=1 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_output.bias shape=[1152] dtype=0 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_output.weight shape="[1152 1152]" dtype=1 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_q.bias shape=[1152] dtype=0 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_q.weight shape="[1152 1152]" dtype=1 buffer_type=CUDA0
time=2025-03-12T18:33:39.415+01:00 level=DEBUG source=ggml.go:220 msg="created tensor" name=v.blk.19.attn_v.bias shape=[1152] dtype=0 buffer_type=CUDA0
...
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:167 msg=pair left="{p:-1 n:5 runes:[9601 100 97 110 107]}" right="{p:0 n:5 runes:[]}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:167 msg=pair left="{p:2 n:4 runes:[]}" right="{p:2 n:5 runes:[]}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:192 msg=merges merges="[{p:-1 n:5 runes:[9601 100 97 110 107]} {p:0 n:2 runes:[]} {p:0 n:5 runes:[]} {p:2 n:4 runes:[]} {p:2 n:5 runes:[]} {p:0 n:6 runes:[101]}]"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:132 msg=tokenizer merges="[{p:-1 n:1 runes:[9601]} {p:0 n:2 runes:[109]} {p:1 n:3 runes:[97]} {p:2 n:4 runes:[99]} {p:3 n:5 runes:[104]} {p:4 n:6 runes:[115]} {p:5 n:7 runes:[116]}]"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:159 msg=candidate candidate="&{a:0 b:1 score:-26}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:159 msg=candidate candidate="&{a:5 b:6 score:-46}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:159 msg=candidate candidate="&{a:3 b:4 score:-80}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:159 msg=candidate candidate="&{a:2 b:3 score:-58}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:159 msg=candidate candidate="&{a:1 b:2 score:-347}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:159 msg=candidate candidate="&{a:4 b:5 score:-26321}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:167 msg=pair left="{p:-1 n:1 runes:[9601]}" right="{p:0 n:2 runes:[109]}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:167 msg=pair left="{p:4 n:6 runes:[115]}" right="{p:5 n:7 runes:[116]}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:167 msg=pair left="{p:0 n:3 runes:[97]}" right="{p:2 n:4 runes:[99]}"
time=2025-03-12T18:33:41.445+01:00 level=DEBUG source=process_text_spm.go:167 msg=pair left="{p:2 n:4 runes:[]}" right="{p:2 n:5 runes:[104]}"
...
time=2025-03-12T18:33:41.481+01:00 level=DEBUG source=process_text_spm.go:212 msg="adding bos token to prompt" id=2
time=2025-03-12T18:33:41.481+01:00 level=DEBUG source=cache.go:134 msg="loading cache slot" id=0 cache=0 prompt=160 used=0 remaining=160
time=2025-03-12T18:33:41.721+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[19058] text=Okay
time=2025-03-12T18:33:41.744+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[236764] text=,
time=2025-03-12T18:33:41.761+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[1531] text=" let"
time=2025-03-12T18:33:41.776+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[236858] text=’
time=2025-03-12T18:33:41.789+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[236751] text=s
time=2025-03-12T18:33:41.803+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[78543] text=" brainstorm"
time=2025-03-12T18:33:41.817+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[1070] text=" some"
time=2025-03-12T18:33:41.830+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[6549] text=" ideas"
time=2025-03-12T18:33:41.844+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[573] text=" for"
time=2025-03-12T18:33:41.856+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[822] text=" your"
time=2025-03-12T18:33:41.869+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[9137] text=" evening"
time=2025-03-12T18:33:41.883+01:00 level=DEBUG source=process_text_spm.go:244 msg=decoded ids=[236888] text=!

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.6.0

GiteaMirror added the bug label 2026-04-22 13:01:06 -05:00
@pdevine commented on GitHub (Mar 12, 2025):

Yes, this is expected as gemma3 is running on the new engine w/ a new tokenizer and we needed a lot of debugging information. We'll make this less verbose in an upcoming release (most of the tokenizer debugging can probably go to a new Trace log level)

@TheMasterFX commented on GitHub (Mar 12, 2025):

> Yes, this is expected as gemma3 is running on the new engine w/ a new tokenizer and we needed a lot of debugging information. We'll make this less verbose in an upcoming release (most of the tokenizer debugging can probably go to a new Trace log level)

That's almost what I thought. Maybe introducing different levels of verbosity makes sense?

@dhiltgen commented on GitHub (Jul 5, 2025):

Unless you're trying to troubleshoot a problem, you shouldn't set the debug log level. The volume of logging in debug will fluctuate over time as we add/remove logging calls in areas of the system that are new or maturing.

Reference: github-starred/ollama#32092