[GH-ISSUE #6044] Illegal instruction in ollama_llama_server runner #65820

Closed
opened 2026-05-03 22:48:36 -05:00 by GiteaMirror · 2 comments

Originally created by @SnowyCoder on GitHub (Jul 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6044

What is the issue?

I tried to run the llama3 model with ollama.
Reproduction (with my CPU: AMD Ryzen 7 7735HS):

  1. Start the server: `ollama serve`
  2. Try to run llama3: `ollama run llama3`
     (The same error occurs with llama3.1)

The server will execute a new runner with the following arguments:
/tmp/ollama996131774/runners/cpu/ollama_llama_server --model /home/snowy/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 42857

And the model will fail with the following error:

llama runner process has terminated: signal: illegal instruction (core dumped)

When opening the runner with GDB, the program crashes while trying to execute the following instruction: `vmovupd %zmm0,0x13(%rax)` (ggml_init+775), which requires the AVX512F feature flag.
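For reference, here is a quick way to confirm the mismatch on this machine. This is only a sketch assuming standard Linux tooling; the runner path and arguments are copied from the log below, the core file name depends on your system, and the /tmp/ollama... directory is recreated on every `ollama serve`, so it may differ:

  # The Ryzen 7 7735HS is a Zen 3+ part and should not advertise AVX-512,
  # so this prints nothing:
  grep -o avx512f /proc/cpuinfo | sort -u

  # Disassemble the faulting instruction from the core dump:
  gdb /tmp/ollama996131774/runners/cpu/ollama_llama_server core
  (gdb) x/i $pc        # vmovupd %zmm0,0x13(%rax)  <- AVX-512F store inside ggml_init

  # The runner can also be launched by hand with the same arguments to check
  # whether the SIGILL happens outside ollama's supervision:
  /tmp/ollama996131774/runners/cpu/ollama_llama_server --model /home/snowy/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 42857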

Complete log:

❯ ollama serve
2024/07/29 12:09:49 routes.go:1099: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/snowy/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-29T12:09:49.854+02:00 level=INFO source=images.go:784 msg="total blobs: 10"
time=2024-07-29T12:09:49.855+02:00 level=INFO source=images.go:791 msg="total unused blobs removed: 0"
time=2024-07-29T12:09:49.855+02:00 level=INFO source=routes.go:1146 msg="Listening on 127.0.0.1:11434 (version 0.3.0)"
time=2024-07-29T12:09:49.855+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama996131774/runners
time=2024-07-29T12:09:57.575+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu rocm]"
time=2024-07-29T12:09:57.575+02:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-29T12:09:57.657+02:00 level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-07-29T12:09:57.662+02:00 level=WARN source=amd_linux.go:325 msg="amdgpu is not supported" gpu=0 gpu_type=gfx1035 library=/opt/rocm/lib supported_types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-07-29T12:09:57.662+02:00 level=WARN source=amd_linux.go:327 msg="See https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-07-29T12:09:57.662+02:00 level=INFO source=amd_linux.go:345 msg="no compatible amdgpu devices detected"
time=2024-07-29T12:09:57.662+02:00 level=INFO source=gpu.go:346 msg="no compatible GPUs were discovered"
time=2024-07-29T12:09:57.662+02:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="13.3 GiB" available="9.1 GiB"
[GIN] 2024/07/29 - 12:09:57 | 200 |      66.764µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/07/29 - 12:09:57 | 200 |   41.795654ms |       127.0.0.1 | POST     "/api/show"
time=2024-07-29T12:09:57.790+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[9.1 GiB]" memory.required.full="5.8 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[5.8 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-07-29T12:09:57.792+02:00 level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama996131774/runners/cpu/ollama_llama_server --model /home/snowy/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 42857"
time=2024-07-29T12:09:57.793+02:00 level=INFO source=sched.go:437 msg="loaded runners" count=1
time=2024-07-29T12:09:57.793+02:00 level=INFO source=server.go:583 msg="waiting for llama runner to start responding"
time=2024-07-29T12:09:57.794+02:00 level=INFO source=server.go:617 msg="waiting for server to become available" status="llm server error"
time=2024-07-29T12:09:58.044+02:00 level=ERROR source=sched.go:443 msg="error loading llama server" error="llama runner process has terminated: signal: illegal instruction (core dumped)"
[GIN] 2024/07/29 - 12:09:58 | 500 |  338.214098ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/07/29 - 12:18:18 | 200 |       63.63µs |       127.0.0.1 | GET      "/api/version"

coredump.zip: https://github.com/user-attachments/files/16411350/coredump.zip
cpuinfo.txt: https://github.com/user-attachments/files/16411356/cpuinfo.txt

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.3.0

GiteaMirror added the bug label 2026-05-03 22:48:36 -05:00

@rick-github commented on GitHub (Jul 29, 2024):

https://github.com/ggerganov/llama.cpp/issues/8734


@SnowyCoder commented on GitHub (Jul 29, 2024):

Thanks, I'll add details in that issue

Reference: github-starred/ollama#65820