[GH-ISSUE #5663] Error: llama runner process has terminated: signal: abort trap error:vocab size mismatch. #50043

Closed
opened 2026-04-28 13:55:55 -05:00 by GiteaMirror · 1 comment

Originally created by @asap-blocky on GitHub (Jul 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5663

What is the issue?

While attempting to run my fine-tuned model using the Ollama library, I got this error message: "Error: llama runner process has terminated: signal: abort trap error:vocab size mismatch."

Model and Environment:

  • The model was fine-tuned using the FastLanguageModel from the unsloth library and saved in the GGUF format.
  • The tokenizer was applied using a chat template for formatting inputs.
  • The model and tokenizer were loaded correctly, and the inference process was initiated.

Error Occurrence:

  • The error occurs immediately after issuing the `ollama run model-name` command.
  • The detailed logs indicate a vocabulary size mismatch between the model and the tokenizer.

Model Metadata:

  • The model’s configuration (config.json) indicates a vocabulary size of 32064.
  • The tokenizer configuration (tokenizer.json) and metadata logs show different values, leading to the mismatch.
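A quick way to surface the mismatch described above is to compare the vocabulary size declared in `config.json` with the size implied by `tokenizer.json`. The sketch below assumes the Hugging Face `tokenizers` JSON layout (base vocab under `model.vocab`, extra special tokens in a top-level `added_tokens` list); the helper name is illustrative:

```python
import json

def read_vocab_sizes(config_path, tokenizer_path):
    """Return (model_vocab, tokenizer_vocab) so the two can be compared.

    Assumes the HF tokenizers format: base vocabulary under model.vocab,
    special tokens listed in the top-level added_tokens array.
    """
    with open(config_path) as f:
        model_vocab = json.load(f)["vocab_size"]
    with open(tokenizer_path) as f:
        tok = json.load(f)
    base = len(tok["model"]["vocab"])
    # Added tokens whose ids fall beyond the base vocab extend it.
    extra = sum(1 for t in tok.get("added_tokens", []) if t["id"] >= base)
    return model_vocab, base + extra
```

If the two numbers differ (e.g. 32064 in `config.json` vs. a smaller tokenizer vocab), the GGUF conversion will carry that inconsistency into the file and llama.cpp aborts at load time.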

Steps Taken:

  1. Verified the consistency of vocabulary size across all relevant configuration files.
  2. Attempted to resize token embeddings to match the tokenizer’s vocabulary size during model loading.
  3. Checked for any missing or additional special tokens in the tokenizer configuration.
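Step 2 above is what `transformers` does via `model.resize_token_embeddings(len(tokenizer))` before export. As a minimal pure-Python sketch of the mechanics (the helper name and the random-init scale are illustrative, not the library's internals):

```python
import random

def resize_token_embeddings(weights, new_vocab_size):
    """Illustrative sketch of resizing an embedding matrix: rows for
    existing token ids are kept, rows for newly added ids get a small
    random init, and surplus rows are truncated."""
    dim = len(weights[0])
    resized = [row[:] for row in weights[:new_vocab_size]]
    while len(resized) < new_vocab_size:
        resized.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return resized
```

The key point is that the embedding (and output head) row count must equal the tokenizer's final vocabulary size before converting to GGUF, or the loader rejects the file.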

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

ollama version is 0.2.2
Warning: client version is 0.2.1

GiteaMirror added the bug label 2026-04-28 13:55:55 -05:00

@ramenw4ve commented on GitHub (Aug 4, 2024):

so how did you fix it?

<!-- gh-comment-id:2267437570 -->

Reference: github-starred/ollama#50043