[GH-ISSUE #496] CodeLlama tokenizer <FILL_ME> token support #62264

Closed
opened 2026-05-03 08:01:51 -05:00 by GiteaMirror · 3 comments

Originally created by @regularfry on GitHub (Sep 8, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/496

It might be that I just can't find the right setting to make this work, but CodeLlama's upstream model docs refer to a [fill_token](https://huggingface.co/docs/transformers/main/model_doc/code_llama#transformers.CodeLlamaTokenizer.fill_token) for splitting the input and constructing the prompt for code infill. I can't seem to make this work on any of the `codellama:7b` variants using that token, whereas the HF-hosted version of 13b seems to support it fine.

They give this example prompt for using `<FILL_ME>`:

```
def remove_non_ascii(s: str) -> str:
    """<FILL_ME>
    return result
```

Here's the ollama output for the online 13b-instruct version:

```
def remove_non_ascii(s: str) -> str:
    """Remove non-ASCII characters from a string."""
    return "".join(i for i in s if ord(i) < 128)
```

Here's the output for local 7b:

Sure! Here's the code to remove non-ASCII characters from a string in Python:
```python
def remove_non_ascii(s):
    # Create a new string with only ASCII characters
    result = ""
    for char in s:
        if ord(char) < 128:
            result += char

    return result
```
This function takes a string as input and returns a new string that contains only ASCII characters. The `ord()` function is used to convert each character to its corresponding Unicode code point, which allows us to check if the character is in the ASCII range. If it is not, then we skip adding it to the result string.

The code is fine (other than that it ignored the multiline docstring prompt); the surrounding commentary and markdown formatting are not.

I know this isn't a direct like-for-like comparison, but I can't run 13b locally, and I can't seem to find 7b hosted online anywhere; it's just too big for HF's free tier.

Am I holding it wrong?

GiteaMirror added the feature request label 2026-05-03 08:01:51 -05:00

@mxyng commented on GitHub (Sep 8, 2023):

`<FILL_ME>` is not a real token as far as I know. It's used as a delimiter for the model runner to split the inputs into the infill prefix and suffix. You can see it in action [here](https://github.com/facebookresearch/codellama/blob/main/example_infilling.py#L62).

For infill with Ollama, you need to split the input into its prefix and suffix and attach the right tokens. This looks like `<PRE> {{ .Prefix }}<SUF> {{ .Suffix }} <MID>` for prefix-suffix-middle and `<PRE> <SUF>{{ .Suffix }} <MID> {{ .Prefix }}` for suffix-prefix-middle. See reference: https://github.com/facebookresearch/codellama/blob/main/llama/generation.py#L380
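
A minimal sketch of doing that split client-side (the `build_infill_prompt` helper is hypothetical, not part of Ollama's API; the special-token layout follows the prefix-suffix-middle template above):

```python
# Emulate <FILL_ME> client-side: split the input at the delimiter and build
# the prefix-suffix-middle (PSM) prompt that CodeLlama base models expect.
# build_infill_prompt is a hypothetical helper, not part of Ollama's API.
def build_infill_prompt(text: str, fill_token: str = "<FILL_ME>") -> str:
    prefix, _, suffix = text.partition(fill_token)
    return f"<PRE> {prefix}<SUF> {suffix} <MID>"

code = 'def remove_non_ascii(s: str) -> str:\n    """<FILL_ME>\n    return result'
print(build_infill_prompt(code))
# The model's completion after <MID> is the infilled middle; splice it back
# in between the prefix and the suffix.
```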


@regularfry commented on GitHub (Sep 8, 2023):

It's a real token in the sense that it's [processed by the codellama tokeniser](https://github.com/huggingface/transformers/blob/18ee1fe76295239335bf1528c744fe1cfba21cc8/src/transformers/models/code_llama/tokenization_code_llama.py#L258) so that you don't have to manually split the prefix and suffix and attach the right tokens, which they say they did because it's more robust. It would be good to see that supported.

It does look like a change from what they published originally for Llama, though - they seem quite proud that infilling is supported out of the box [here](https://huggingface.co/docs/transformers/model_doc/code_llama).
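
For reference, the Hugging Face path being described looks roughly like this (a sketch based on the transformers docs; the model name and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The CodeLlama tokenizer splits the input at its fill_token ("<FILL_ME>")
# and inserts the <PRE>/<SUF>/<MID> special tokens itself.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

prompt = 'def remove_non_ascii(s: str) -> str:\n    """<FILL_ME>\n    return result'
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated middle section.
print(tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True))
```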


@mxyng commented on GitHub (Sep 8, 2023):

Ah yes. That looks like an HF exclusive. While there are currently no plans for model-specific tokenizers, we are looking at other ways to achieve similar results. One example is https://github.com/jmorganca/ollama/pull/466
