[GH-ISSUE #4629] granite-code:20b-instruct-q8_0 error loading model vocabulary: unknown pre-tokenizer type: 'refact' #2907

Closed
opened 2026-04-12 13:15:40 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @ekolawole on GitHub (May 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4629

Originally assigned to: @BruceMacD on GitHub.

What is the issue?

llama_model_loader: - type f32: 419 tensors
llama_model_loader: - type q8_0: 209 tensors
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'refact'
llama_load_model_from_file: exception loading model
libc++abi: terminating due to uncaught exception of type std::runtime_error: error loading model vocabulary: unknown pre-tokenizer type: 'refact'
time=2024-05-25T03:28:13.829-04:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: abort trap error:error loading model vocabulary: unknown pre-tokenizer type: 'refact'"
[GIN] 2024/05/25 - 03:28:13 | 500 | 446.67725ms | | POST "/v1/chat/completions"

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.1.38

GiteaMirror added the bug label 2026-04-12 13:15:40 -05:00

@nurena24 commented on GitHub (May 26, 2024):

Getting the same error with the granite-code:20b-instruct-q3_K_M model in Docker Desktop running on a 4090 GPU.


@wang-junjian commented on GitHub (May 26, 2024):

Error: llama runner process has terminated: signal: abort trap error:error loading model vocabulary: unknown pre-tokenizer type: 'refact'

macOS 14.5
ollama version is 0.1.38


@wang-junjian commented on GitHub (May 28, 2024):

Install version 0.1.39:
https://github.com/ollama/ollama/releases/tag/v0.1.39


@BruceMacD commented on GitHub (May 30, 2024):

Thanks for the report. As noted above, this model (and other Granite-family models) now works in Ollama v0.1.39+.
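For anyone landing here later, a minimal recovery sketch (assuming `ollama` is on your PATH and you upgrade via the macOS installer or your package manager; the model name is taken from the issue title):

```shell
# The 'refact' pre-tokenizer is only recognized by Ollama >= 0.1.39,
# so confirm the installed version first.
ollama --version

# After upgrading, load the model again. Re-pulling should not be
# required, but it is harmless if the load still fails.
ollama pull granite-code:20b-instruct-q8_0
ollama run granite-code:20b-instruct-q8_0
```

No model re-conversion is needed: the fix is in Ollama's bundled llama.cpp runner, not in the GGUF file itself.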


Reference: github-starred/ollama#2907