[GH-ISSUE #8238] vocabulary is larger than expected #67316

Closed
opened 2026-05-04 09:55:20 -05:00 by GiteaMirror · 10 comments

Originally created by @lx687 on GitHub (Dec 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8238

What is the issue?

`ollama create model -f ./Modelfile` reports an error: `Error: vocabulary is larger than expected '128257' instead of '128256'`

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.5.4

GiteaMirror added the bug label 2026-05-04 09:55:20 -05:00

@rick-github commented on GitHub (Dec 25, 2024):

The size of the vocabulary in config.json doesn't match the size of the vocabulary in the model file. Where did you get the model you are trying to convert?
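
For anyone hitting this, a quick way to compare the two numbers is a short Python check. This is a sketch, assuming the usual Hugging Face model layout; `model_dir` is a hypothetical path for the directory you pass to `ollama create`:

```python
# Sketch: compare vocab_size in config.json with the tokenizer's actual size.
import json
from transformers import AutoTokenizer

model_dir = "./Llama3-8B-Chinese-Chat"  # hypothetical: your exported model directory

with open(f"{model_dir}/config.json") as f:
    config_vocab = json.load(f)["vocab_size"]

tok = AutoTokenizer.from_pretrained(model_dir)
print("config.json vocab_size:", config_vocab)  # e.g. 128256
print("tokenizer size:", len(tok))              # len() includes added special tokens
```

If the two numbers disagree (here, 128257 vs 128256), this is the mismatch the error reports.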


@lx687 commented on GitHub (Dec 26, 2024):

> The size of the vocabulary in config.json doesn't match the size of the vocabulary in the model file. Where did you get the model you are trying to convert?

The model is from Hugging Face. The problem occurs when importing into ollama after training with LLaMA-Factory: the vocabulary size in config.json remains unchanged after training. How can I solve this?
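
To see which side actually changed during fine-tuning, you can also read the token-embedding shape straight from the exported weights. A sketch, assuming safetensors shards and the tensor name `model.embed_tokens.weight` (the usual one for Llama-style Hugging Face checkpoints):

```python
# Sketch: print the token-embedding shape from the exported safetensors shards.
import glob
from safetensors import safe_open

model_dir = "./Llama3-8B-Chinese-Chat"  # hypothetical: your exported model directory

for path in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(path, framework="pt") as f:
        if "model.embed_tokens.weight" in f.keys():
            shape = f.get_slice("model.embed_tokens.weight").get_shape()
            print(path, shape)  # expect [vocab_rows, hidden_size], e.g. [128256, 4096]
```

If the embedding still has 128256 rows while the tokenizer reports 128257 entries, the tokenizer gained a token that the weights (and config.json) never did.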


@rick-github commented on GitHub (Dec 26, 2024):

What model from HF?


@lx687 commented on GitHub (Dec 26, 2024):

> What model from HF?

shenzhi-wang/Llama3-8B-Chinese-Chat


@rick-github commented on GitHub (Dec 26, 2024):

How are you exporting your model from LLaMA-Factory? LoRA adapter, merged GGUF, etc.?


@pdevine commented on GitHub (Dec 27, 2024):

I wasn't able to reproduce this. Can you update your weights?

```
% git clone https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat
Cloning into 'Llama3-8B-Chinese-Chat'...
remote: Enumerating objects: 159, done.
remote: Counting objects: 100% (154/154), done.
remote: Compressing objects: 100% (154/154), done.
remote: Total 159 (delta 77), reused 0 (delta 0), pack-reused 5 (from 1)
Receiving objects: 100% (159/159), 2.34 MiB | 9.91 MiB/s, done.
Resolving deltas: 100% (77/77), done.
Filtering content: 100% (4/4), 2.95 GiB | 9.26 MiB/s, done.
% cd Llama3-8B-Chinese-Chat
% ollama create test
transferring model data 100%
converting model
using existing layer sha256:287463ff7eee5fc4bf5a1efcb0d141c0c5e919da256f507cfb523531fba22c4c
using existing layer sha256:b21e6799b8a71e0aafad1d38392d77f128894477c5d942701fd7d5b37441e7f8
writing manifest
success
```

I suspect that the original weights had the wrong `vocab_size` value in `config.json`. *NOTE*: you'll need to add the correct `TEMPLATE` in your Modelfile.

I'll go ahead and close the issue.
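
On the `TEMPLATE` note above: a minimal Modelfile sketch for a Llama 3 chat model might look like the following. The template shown is one plausible form of the Llama 3 prompt format, not necessarily the exact one this model expects; check the model card:

```
# Sketch: run `ollama create mymodel -f Modelfile` from the model directory.
FROM .
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|eot_id|>"
```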


@lx687 commented on GitHub (Dec 27, 2024):

> I wasn't able to reproduce this. Can you update your weights? […] I suspect that the original weights had the wrong `vocab_size` value in `config.json`. […] I'll go ahead and close the issue.

I fine-tuned the model with LLaMA-Factory; the unadjusted model (before fine-tuning) can be loaded into ollama.


@lx687 commented on GitHub (Dec 27, 2024):

> `vocab_size`

I manually changed the `vocab_size` value in config.json. The model can then be imported with the create command, but it fails to run:

```
(base) [root@iZ8psj2fjgopvor6vx3rtyZ ~]# ollama run Llama3-8B-Chinese-Chat-new-merged:latest
Error: llama runner process has terminated: error loading model: check_tensor_dims: tensor 'token_embd.weight' has wrong shape; expected 4096, 128257, got 4096, 128256, 1, 1
```


@lynnna-xu commented on GitHub (Feb 24, 2025):

I had the same issue with models trained with LLaMA-Factory. Loading the same model through Hugging Face's AutoModelForCausalLM and AutoTokenizer doesn't have this issue. Any idea how to fix this?


@pdevine commented on GitHub (Mar 11, 2025):

Sorry for the slow response here. Unfortunately, you can't just change the vocab size: the vocabulary has to be the same size as the token embeddings, because that's how the model was actually trained.

I'm not sure what LLaMA-Factory changed, but it seems like it may have added some additional tokens that shouldn't be there? I'm guessing the bug is with LLaMA-Factory?
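
If fine-tuning really did add a token, the usual remedy is to make the checkpoint self-consistent before converting: resize the token embeddings to match the tokenizer and re-export. A hedged sketch (paths are hypothetical; whether growing the embedding or instead removing the spurious token is right depends on what LLaMA-Factory actually added):

```python
# Sketch: re-export a fine-tuned checkpoint with embeddings resized to the tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "./Llama3-8B-Chinese-Chat-finetuned"  # hypothetical: LLaMA-Factory output dir
dst = "./Llama3-8B-Chinese-Chat-fixed"      # hypothetical: directory for ollama create

tok = AutoTokenizer.from_pretrained(src)
model = AutoModelForCausalLM.from_pretrained(src)

# Grow (or shrink) the embedding and lm_head rows to len(tok); this also updates
# config.vocab_size, so config.json, tokenizer, and weights agree at conversion time.
model.resize_token_embeddings(len(tok))

model.save_pretrained(dst)
tok.save_pretrained(dst)
```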
