[GH-ISSUE #4806] codegemma broken on releases after v0.1.39 #49543

Closed
opened 2026-04-28 12:13:09 -05:00 by GiteaMirror · 23 comments

Originally created by @evertjr on GitHub (Jun 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4806

Originally assigned to: @jmorganca on GitHub.

What is the issue?

I use codegemma with the continue.dev extension in VS Code. It works fine on version 0.1.39, but on the last two releases it doesn't generate completions and behaves very strangely in the terminal.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

No response

GiteaMirror added the bug label 2026-04-28 12:13:09 -05:00

@jmorganca commented on GitHub (Jun 4, 2024):

Sorry this happened. Are you using `codegemma:code`?


@evertjr commented on GitHub (Jun 4, 2024):

> Sorry this happened. Are you using `codegemma:code`?

I use `codegemma:2b-code-v1.1-q8_0` only for tab autocomplete. It's very good, but I noticed it stopped giving suggestions.

I tried reverting to older versions of the Continue extension with no luck, but as soon as I reverted Ollama to the version mentioned above, it worked again.


@robwilkes commented on GitHub (Jun 5, 2024):

For me it seems to give no output.

```
$ ollama run codegemma:7b-code-q6_K
>>> <|fim_prefix|>import datetime
... def calculate_age(birth_year):
...     """Calculates a person's age based on their birth year."""
...     current_year = datetime.date.today().year
...     <|fim_suffix|>
...     return age<|fim_middle|>


>>>
... <|fim_prefix|>import datetime
... def calculate_age(birth_year):
...     """Calculates a person's age based on their birth year."""
...     current_year = datetime.date.today().year
...     <|fim_suffix|>
...     return age<|fim_middle|>
...


>>> Send a message (/? for help)
```

The instruct models work, but the code models seem to generate no output.
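For anyone who wants to script this check instead of pasting into the interactive REPL, here is a minimal sketch (assuming a local Ollama server on the default port 11434 and the `requests` package) that sends the same FIM prompt to the `/api/generate` endpoint with `raw` set, so no prompt template is applied:

```python
# Minimal repro sketch: send the FIM prompt above straight to Ollama's
# /api/generate endpoint. "raw": True skips the model's prompt template,
# so the FIM markers reach the model verbatim.
import requests

prompt = (
    "<|fim_prefix|>import datetime\n"
    "def calculate_age(birth_year):\n"
    '    """Calculates a person\'s age based on their birth year."""\n'
    "    current_year = datetime.date.today().year\n"
    "    <|fim_suffix|>\n"
    "    return age<|fim_middle|>"
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codegemma:7b-code-q6_K",
        "prompt": prompt,
        "raw": True,
        "stream": False,
    },
    timeout=300,
)
# A working build fills in the middle (e.g. "age = current_year - birth_year");
# the broken builds return an empty response here.
print(repr(resp.json().get("response")))
```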


@ashokgelal commented on GitHub (Jun 12, 2024):

Can confirm this is still an issue even with the latest v0.1.43

OS: Apple Silicon
Model: codegemma:latest

This is the error I'm getting:

```
llama runner process has terminated: signal: abort trap error: unsupported op 'FLASH_ATTN_EXT'
GGML_ASSERT: /Users/runner/work/ollama/ollama/llm/llama.cpp/ggml-metal.m:918
```

@evertjr commented on GitHub (Jun 14, 2024):

The problem still persists in the latest update.


@robwilkes commented on GitHub (Jun 20, 2024):

Still problematic in Ollama version 0.1.45-rc4.
I'm using Codestral and DeepSeek V2 fine, but codegemma doesn't work, which is disappointing given that codegemma is ranked top for FIM.


@evertjr commented on GitHub (Jun 20, 2024):

I'm using granite-code for now, which is also good, but I still miss codegemma; it's the best for FIM.


@evertjr commented on GitHub (Jun 25, 2024):

It's still broken on the latest release.


@knilink commented on GitHub (Jul 16, 2024):

My guess is that something is wrong with the tokenizer, which didn't treat `<|fim_prefix|>` etc. as special tokens.

![image](https://github.com/user-attachments/assets/4241411d-3fbf-4d5d-8d98-0432497590fb)

I tried the [google official gguf](https://huggingface.co/google/codegemma-2b-GGUF) with `llama-tokenize` and got the same issue.

However, decoding seems to be working fine.
![image](https://github.com/user-attachments/assets/ae81dc07-0e5e-4e19-b97c-fcce76e23aab)

**Edit:**
The latest llama.cpp seems to have the issue fixed. Reapplying quantization to the [original HuggingFace model](https://huggingface.co/google/codegemma-2b) using the latest llama.cpp (commit `1666f92`) fixes the issue locally.
![image](https://github.com/user-attachments/assets/07abb0fd-cab7-4816-bfd5-32690c1f73b6)

For the official `1.1-2b`, the tokenizer config seems to be incorrect: https://huggingface.co/google/codegemma-1.1-2b/discussions/1
Copying `tokenizer.json` and `tokenizer_config.json` from the `2b` model as a workaround fixed the issue for me.
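If you want to probe a GGUF's tokenizer yourself along the same lines, here is a minimal sketch (assuming `llama-cpp-python` is installed; the model path is hypothetical, so point it at your local file). A healthy tokenizer maps each FIM marker to a single special-token id, while a broken one splits it into many ordinary text tokens:

```python
# Tokenizer probe sketch: check whether the FIM markers are treated as
# single special tokens. The model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./codegemma-2b.gguf", vocab_only=True)

for marker in ("<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"):
    # special=True asks the tokenizer to parse special tokens in the text.
    ids = llm.tokenize(marker.encode("utf-8"), add_bos=False, special=True)
    print(marker, ids, "OK" if len(ids) == 1 else "BROKEN")
```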


@knilink commented on GitHub (Jul 17, 2024):

Did a bit more investigation and can confirm that llama.cpp's update broke the codegemma tokenizer.
I tested `llama.cpp@74f33ad`, which `ollama@v0.1.39` was using, with the old GGUF model, and tokenization worked fine.
So I guess the quantization needs to be reapplied with the latest llama.cpp to get the issue fixed; otherwise, stick with the old Ollama.
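For reference, a rough sketch of that requant path (script and tool names are taken from a mid-2024 llama.cpp checkout and may differ in yours; all paths are hypothetical):

```python
# Requant sketch: convert the original HF checkpoint to GGUF with a recent
# llama.cpp, then quantize. Intended to be run from the llama.cpp repo root.
import subprocess

HF_DIR = "./codegemma-2b"          # hypothetical local HF checkout
F16 = "./codegemma-2b.f16.gguf"    # intermediate full-precision GGUF

# Convert the HF model to GGUF with the fixed tokenizer handling.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_DIR, "--outfile", F16],
    check=True,
)
# Quantize the result (q8_0 here, matching the variant used in this thread).
subprocess.run(
    ["./llama-quantize", F16, "./codegemma-2b.Q8_0.gguf", "q8_0"],
    check=True,
)
```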


@slashedstar commented on GitHub (Aug 5, 2024):

I couldn't get the updated version of codegemma to work, so I uploaded the older one: https://ollama.com/edwardz/codegemmaq6
If you want, you can quantize it to other values with https://huggingface.co/spaces/ggml-org/gguf-my-repo


@evertjr commented on GitHub (Aug 5, 2024):

> For anyone interested I uploaded a fixed version: https://ollama.com/edwardz/codegemmaq6 If you want you can quantize it to other values with https://huggingface.co/spaces/ggml-org/gguf-my-repo

I tried to requantize, but it didn't really fix the issue I was having with code completion in the continue.dev extension, so I ended up spinning up another Ollama server on version 0.1.39 to keep using it, at least until a proper fix. Let's hope Google releases a new codegemma model based on Gemma 2.


@knilink commented on GitHub (Aug 6, 2024):

Hi @evertjr, did you replace `tokenizer.json` and `tokenizer_config.json` before the requant? https://huggingface.co/google/codegemma-1.1-2b/discussions/1
For code-v1.1, those two files need to be fixed as well. I copied those configs from [2b](https://huggingface.co/google/codegemma-2b) and it worked for me.
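In case it helps others reproduce the workaround, a minimal sketch of that file swap (assuming `huggingface_hub` is installed, you have accepted the CodeGemma license on Hugging Face, and your 1.1-2b checkout lives in the hypothetical directory below):

```python
# Workaround sketch: overlay the known-good tokenizer files from
# google/codegemma-2b onto a local codegemma-1.1-2b checkout before
# re-running the GGUF conversion/quantization.
import shutil
from huggingface_hub import hf_hub_download

LOCAL_1_1_2B = "./codegemma-1.1-2b"  # hypothetical path to your checkout

for fname in ("tokenizer.json", "tokenizer_config.json"):
    # Download the file from the 2b repo (returns a cached local path).
    good = hf_hub_download(repo_id="google/codegemma-2b", filename=fname)
    shutil.copy(good, f"{LOCAL_1_1_2B}/{fname}")
    print("replaced", fname)
```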


@evertjr commented on GitHub (Aug 7, 2024):

@knilink I did as you mentioned. It seems to work better, but something feels off: it behaves strangely and still doesn't produce completions in continue.dev for tab autocomplete, whereas on 0.1.39 it works perfectly without any modification.


@slashedstar commented on GitHub (Aug 7, 2024):

I also couldn't get 1.1 to work. I downloaded the tokenizer-fix version, uploaded it, converted it to .gguf, and it behaves just like the one in Ollama's library.


@m0ngr31 commented on GitHub (Sep 11, 2024):

Any updates on this?


@pidgeon777 commented on GitHub (Oct 30, 2024):

I'm also interested.


@jmorganca commented on GitHub (Nov 12, 2024):

Hi there, this should be fixed now. Sorry for all the trouble


@knilink commented on GitHub (Nov 12, 2024):

Hi @jmorganca,
Just tested on `ollama:0.4.1` with `codegemma:2b-code-q4_K_M` (`b593bb191655`, updated 3 months ago), and the test showed it didn't seem to have been fixed, because the issue lies in the model's tokenizer.

Here is the prompt I used for testing in open-webui's playground:

```
<|fim_prefix|>def speak(msg):
    """
    A simple function that 'speaks' a message by printing it with a prefix
    """
    <|fim_suffix|>

def greet(name):
    """
    A greeting function that depends on the speak function to output its message
    """
    message = f"Hello, {name}! Welcome!"<|fim_middle|>
```

The model generated `speak(message)` because `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` weren't tokenized correctly, so the model saw the input as continuing the `greet` function, when it was supposed to complete the `speak` function by generating `print(msg)`.
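To make the expected behavior concrete, a small sketch of how this prefix-suffix-middle prompt is assembled (following the layout of the prompts in this thread); the model's job is to emit only the text that belongs at the cursor, i.e. the body of `speak`:

```python
# Sketch of the FIM prompt layout under test: the cursor sits where
# <|fim_suffix|> is spliced in, and the model should generate the
# "middle" (here, the body of speak, e.g. print(msg)).
prefix = 'def speak(msg):\n    """docstring"""\n    '
suffix = '\n\ndef greet(name):\n    message = f"Hello, {name}! Welcome!"'

prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# Correct tokenization -> the completion continues at the cursor: print(msg)
# Broken tokenization  -> the markers become plain text and the model just
#                         keeps writing after the last visible line of greet.
print(prompt)
```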


@pidgeon777 commented on GitHub (Nov 12, 2024):

Should this issue be re-opened?


@SJ-USAF commented on GitHub (Dec 13, 2024):

Yes, please re-open. I am unable to get codegemma to function.


@jakehlee commented on GitHub (Jun 15, 2025):

@jmorganca `codegemma:code` is indeed still broken, and continues to ignore FIM tokens. Please reopen this issue.


@jakehlee commented on GitHub (Jun 15, 2025):

For others searching for how to get codegemma working with Ollama and continue.dev, follow the instructions at:
https://huggingface.co/sh2nd/codegemma-1.1-2b-fix-Q4_K_M-GGUF/blob/main/README.md
