[GH-ISSUE #10199] Error: vocabulary is larger than expected '262145' instead of '262144' #6691

Closed
opened 2026-04-12 18:25:27 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @Tobias-DM on GitHub (Apr 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10199

What is the issue?

I get the error shown in the title when I try to load a fine-tuned .safetensors model. However, when I check config.json, the vocabulary size matches 262144. My process so far has been:

  1. Train model -> download model files
  2. Create a Modelfile containing: FROM /path/to/model/dir
  3. Run "ollama create my-model" from the directory where the Modelfile was created

Step 3 is where I run into the error. Is there a known fix for this?
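The kind of mismatch reported here can be sketched with a small check. This is illustrative only: it assumes an HF-style layout where `config.json` declares `vocab_size` and `tokenizer.json` holds the base vocab plus any `added_tokens`; the toy data and the off-by-one scenario are assumptions, not taken from this model.

```python
import json


def check_vocab_match(config: dict, tokenizer: dict) -> int:
    """Return the difference between the token count a converter would
    see in tokenizer.json and the vocab_size declared in config.json.
    0 means they agree; a positive value means extra tokens."""
    declared = config["vocab_size"]
    # Base vocabulary, plus any added tokens whose ids extend past it.
    base = len(tokenizer["model"]["vocab"])
    extra = [t for t in tokenizer.get("added_tokens", []) if t["id"] >= base]
    return (base + len(extra)) - declared


# Toy example: 4-token base vocab plus one appended special token,
# while config.json still declares 4 -- the same off-by-one shape as
# 262145 vs 262144 above.
config = {"vocab_size": 4}
tokenizer = {
    "model": {"vocab": {"a": 0, "b": 1, "c": 2, "d": 3}},
    "added_tokens": [{"id": 4, "content": "<pad>"}],
}
print(check_vocab_match(config, tokenizer))  # prints 1
```

Running such a check on the real model directory (loading the two JSON files) would show whether the converter's count and the declared size actually disagree.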

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.6.2

GiteaMirror added the bug label 2026-04-12 18:25:27 -05:00

@brthom commented on GitHub (Apr 11, 2025):

Ollama is getting 262145 from counting the vocab tokens, and it's getting 262144 from config.json. This suggests to me that the issue isn't that Ollama is reading the vocab size from your config.json incorrectly; it's that your config.json is incorrect. If you change the vocab size in config.json to 262145 (the value Ollama expects), does that fix the issue?


@Tobias-DM commented on GitHub (Apr 11, 2025):

Hi Ben, yes, that fixed the issue; I tried it out earlier. Do you know why the vocab was saved differently from what Ollama expected? I thought the vocab was based on the tokenizer intrinsic to the model, or does Ollama provide/change the tokenizer?
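One common way this drift happens (an assumption on my part, not confirmed anywhere in this thread) is that a fine-tuning script appends a special token, such as a pad token, to the tokenizer without updating `vocab_size` in config.json. A toy sketch of that failure mode, using a dict shaped like tokenizer.json:

```python
def add_special_token(tokenizer: dict, token: str) -> int:
    """Append a special token to an HF-style tokenizer.json structure
    and return its new id. Note that config.json is NOT touched here,
    which is exactly how the two sizes drift apart."""
    base = len(tokenizer["model"]["vocab"])
    added = tokenizer.setdefault("added_tokens", [])
    next_id = base + len(added)
    added.append({"id": next_id, "content": token})
    return next_id


tok = {"model": {"vocab": {"a": 0, "b": 1, "c": 2, "d": 3}}}
print(add_special_token(tok, "<pad>"))  # prints 4 -- one past the declared size
```

If that is what happened during training, the tokenizer genuinely has one more token than config.json claims, and bumping `vocab_size` (as above) is the right fix rather than a hack.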


@brthom commented on GitHub (Apr 11, 2025):

Not entirely sure; I only did a surface-level check. The error check in question is in ./convert/convert.go if you want to take a look.


@Tobias-DM commented on GitHub (Apr 11, 2025):

Thanks! I'm new to this, I'll check it out


Reference: github-starred/ollama#6691