[GH-ISSUE #5897] Error: llama3.1 runner process has terminated: signal: aborted #65716

Closed
opened 2026-05-03 22:22:54 -05:00 by GiteaMirror · 9 comments

Originally created by @harnalashok on GitHub (Jul 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5897

What is the issue?

I have downloaded llama3.1:8b using ollama. I am getting the following error while running llama3.1; llama3 runs fine on the same system:

Error: llama runner process has terminated: signal: aborted

OS

Windows 11 (WSL2 Ubuntu)

GPU

GeForce RTX 4070

CPU

No response

Ollama version

ollama version is 0.1.38

Here are the server logs:

```
lines 974-1002/1002 (END)
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 10: llama.attention.head_count u32 = 32
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 11: llama.attention.head_count_kv u32 = 8
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 12: llama.rope.freq_base f32 = 500000.000000
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 13: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 14: general.file_type u32 = 2
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 15: llama.vocab_size u32 = 128256
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 16: llama.rope.dimension_count u32 = 128
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 17: tokenizer.ggml.model str = gpt2
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 18: tokenizer.ggml.pre str = smaug-bpe
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 19: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", ">
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 20: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, >
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 21: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", ".>
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 128000
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 23: tokenizer.ggml.eos_token_id u32 = 128001
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 24: tokenizer.chat_template str = {% set loop_messages = messa>
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - kv 25: general.quantization_version u32 = 2
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - type f32: 65 tensors
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - type q4_0: 225 tensors
Jul 24 07:14:08 ashok ollama[158]: llama_model_loader: - type q6_K: 1 tensors
Jul 24 07:14:08 ashok ollama[158]: time=2024-07-24T07:14:08.359+05:30 level=INFO source=server.go:540 msg="waiting for server to become available" status=">
Jul 24 07:14:08 ashok ollama[158]: llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'smaug-bpe'
Jul 24 07:14:08 ashok ollama[158]: llama_load_model_from_file: exception loading model
Jul 24 07:14:08 ashok ollama[158]: terminate called after throwing an instance of 'std::runtime_error'
Jul 24 07:14:08 ashok ollama[158]: what(): error loading model vocabulary: unknown pre-tokenizer type: 'smaug-bpe'
Jul 24 07:14:08 ashok ollama[158]: time=2024-07-24T07:14:08.610+05:30 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner >
Jul 24 07:14:08 ashok ollama[158]: [GIN] 2024/07/24 - 07:14:08 | 500 | 4.333690088s | 127.0.0.1 | POST "/api/chat"
Jul 24 07:14:15 ashok ollama[158]: time=2024-07-24T07:14:15.636+05:30 level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" sec>
Jul 24 07:14:19 ashok ollama[158]: time=2024-07-24T07:14:19.624+05:30 level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" sec>
Jul 24 07:14:23 ashok ollama[158]: time=2024-07-24T07:14:23.388+05:30 level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" sec>
```

GiteaMirror added the bug label 2026-05-03 22:22:54 -05:00

@rick-github commented on GitHub (Jul 24, 2024):

Server logs will help diagnose the issue.
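
For a systemd-based Linux install (which the standard install script sets up; under WSL2 this assumes systemd is enabled), a minimal sketch for pulling them:

```
# Show the last 200 lines of the ollama server log.
journalctl -u ollama -n 200 --no-pager

# Or follow the log live while reproducing the error:
journalctl -u ollama -f
```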


@Cephra commented on GitHub (Jul 24, 2024):

Are you maybe using a docker container for ollama?

I had a minor hiccup trying to get llama3.1 running.

Turned out it was because I was using an older version of the ollama rocm image.

I'd give that a try! Good luck!


@harnalashok commented on GitHub (Jul 24, 2024):

> Server logs will help diagnose the issue.

I updated my question with server logs; kindly take a look. Thanks.


@harnalashok commented on GitHub (Jul 24, 2024):

> Are you maybe using a docker container for ollama?
>
> I had a minor hiccup trying to get llama3.1 running.
>
> Turned out it was because I was using an older version of the ollama rocm image.
>
> I'd give that a try! Good luck!

I am not using a docker container. I have updated my question with server logs.


@rick-github commented on GitHub (Jul 24, 2024):

```
Jul 24 07:14:08 ashok ollama[158]: what(): error loading model vocabulary: unknown pre-tokenizer type: 'smaug-bpe'
```

You are trying to load a model that is not supported by your version of ollama. Try upgrading, but be aware that llama3.1 has only just been released and there are reported bugs, so problems may persist for the next couple of releases.
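
To double-check what you're actually running after an upgrade, a minimal sketch (this assumes the default local endpoint; `/api/version` is the server's version endpoint):

```
# Version reported by the running server (default port 11434).
curl -s http://localhost:11434/api/version

# Version of the CLI binary, for comparison.
ollama --version
```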


@benhalverson commented on GitHub (Jul 24, 2024):

I also had this issue. I was using an outdated version. Here is what I did to fix it:

`curl -fsSL https://ollama.com/install.sh | sh`
`sudo systemctl stop ollama` stops the service
`sudo systemctl start ollama` starts the service
`sudo systemctl status ollama` shows the status of the running service
`ollama --version` should report version 0.2.8
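
Rolled into one sequence (a sketch assuming the standard Linux systemd install; `systemctl restart` stands in for the stop/start pair above):

```
# Reinstall/upgrade ollama with the official install script.
curl -fsSL https://ollama.com/install.sh | sh

# Restart the service so the new binary is picked up.
sudo systemctl restart ollama

# Verify the service is running and check the installed version.
sudo systemctl status ollama --no-pager
ollama --version
```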


@harnalashok commented on GitHub (Jul 24, 2024):

> I also had this issue. I was using an outdated version. Here is what I did to fix it:
>
> `curl -fsSL https://ollama.com/install.sh | sh`
> `sudo systemctl stop ollama` stops the service
> `sudo systemctl start ollama` starts the service
> `sudo systemctl status ollama` shows the status of the running service
> `ollama --version` should report version 0.2.8

Thanks. I updated my Ollama to 0.2.8. llama3.1 runs now.


@harnalashok commented on GitHub (Jul 24, 2024):

> ```
> Jul 24 07:14:08 ashok ollama[158]: what(): error loading model vocabulary: unknown pre-tokenizer type: 'smaug-bpe'
> ```
>
> You are trying to load a model that is not supported by your version of ollama. Try upgrading, but be aware that llama3.1 has only just been released and there are reported bugs, so problems may persist for the next couple of releases.

Ok. Thanks for the forewarning.


@plattenschieber commented on GitHub (Jul 24, 2024):

> Thanks. I updated my Ollama to 0.2.8. llama3.1 runs now.

Could you close this issue then, @harnalashok?

Reference: github-starred/ollama#65716