[GH-ISSUE #7012] Error: no suitable llama servers found - ran out of tmpfs space #4443

Closed
opened 2026-04-12 15:22:28 -05:00 by GiteaMirror · 10 comments

Originally created by @gfkdliucheng on GitHub (Sep 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7012

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

On aarch64 it can pull the model, but running it ultimately fails with 'Error: no suitable llama servers found'. Here is the log:
Sep 28 08:43:15 orangepi5 ollama[2563639]: Couldn't find '/usr/share/ollama/.ollama/id_ed25519'. Generating new privat>
Sep 28 08:43:15 orangepi5 ollama[2563639]: Your new public key is:
Sep 28 08:43:15 orangepi5 ollama[2563639]: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHeo8oQtpkwmLudISuFZEbMoDEUgI6w0aKmAIBw>
Sep 28 08:43:15 orangepi5 ollama[2563639]: 2024/09/28 08:43:15 routes.go:1153: INFO server config env="map[CUDA_VISIBL>
Sep 28 08:43:15 orangepi5 ollama[2563639]: time=2024-09-28T08:43:15.751+08:00 level=INFO source=images.go:753 msg="tot>
Sep 28 08:43:15 orangepi5 ollama[2563639]: time=2024-09-28T08:43:15.751+08:00 level=INFO source=images.go:760 msg="tot>
Sep 28 08:43:15 orangepi5 ollama[2563639]: time=2024-09-28T08:43:15.752+08:00 level=INFO source=routes.go:1200 msg="Li>
Sep 28 08:43:15 orangepi5 ollama[2563639]: time=2024-09-28T08:43:15.753+08:00 level=INFO source=common.go:135 msg="ext>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.282+08:00 level=ERROR source=common.go:214 msg="fa>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.689+08:00 level=INFO source=common.go:49 msg="Dyna>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.689+08:00 level=INFO source=gpu.go:199 msg="lookin>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.689+08:00 level=WARN source=gpu.go:669 msg="unable>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.689+08:00 level=WARN source=gpu.go:669 msg="unable>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.689+08:00 level=WARN source=gpu.go:669 msg="unable>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.693+08:00 level=WARN source=gpu.go:669 msg="unable>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.694+08:00 level=INFO source=gpu.go:347 msg="no com>
Sep 28 08:43:30 orangepi5 ollama[2563639]: time=2024-09-28T08:43:30.694+08:00 level=INFO source=types.go:107 msg="infe>
Sep 28 08:43:30 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:43:30 | 200 | 248.788µs | 127.0.0.1 | HEAD >
Sep 28 08:43:30 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:43:30 | 200 | 413.577µs | 127.0.0.1 | GET >
Sep 28 08:43:50 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:43:50 | 200 | 50.166µs | 127.0.0.1 | HEAD >
Sep 28 08:43:50 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:43:50 | 404 | 348.244µs | 127.0.0.1 | POST >
Sep 28 08:43:57 orangepi5 ollama[2563639]: time=2024-09-28T08:43:57.010+08:00 level=INFO source=download.go:175 msg="d>
Sep 28 08:45:49 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:45:49 | 200 | 1m59s | 127.0.0.1 | POST >
Sep 28 08:45:51 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:45:51 | 200 | 95.665µs | 127.0.0.1 | HEAD >
Sep 28 08:45:51 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:45:51 | 404 | 293.412µs | 127.0.0.1 | POST >
Sep 28 08:45:52 orangepi5 ollama[2563639]: time=2024-09-28T08:45:52.986+08:00 level=INFO source=download.go:175 msg="d>
Sep 28 08:45:59 orangepi5 ollama[2563639]: time=2024-09-28T08:45:59.746+08:00 level=INFO source=download.go:175 msg="d>
Sep 28 08:46:02 orangepi5 ollama[2563639]: time=2024-09-28T08:46:02.516+08:00 level=INFO source=download.go:175 msg="d>
Sep 28 08:46:05 orangepi5 ollama[2563639]: time=2024-09-28T08:46:05.227+08:00 level=INFO source=download.go:175 msg="d>
Sep 28 08:46:08 orangepi5 ollama[2563639]: time=2024-09-28T08:46:08.140+08:00 level=INFO source=download.go:175 msg="d>
Sep 28 08:46:12 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:46:12 | 200 | 21.331892482s | 127.0.0.1 | POST >
Sep 28 08:46:12 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:46:12 | 200 | 82.897373ms | 127.0.0.1 | POST >
Sep 28 08:46:12 orangepi5 ollama[2563639]: time=2024-09-28T08:46:12.855+08:00 level=INFO source=server.go:103 msg="sys>
Sep 28 08:46:12 orangepi5 ollama[2563639]: time=2024-09-28T08:46:12.856+08:00 level=INFO source=memory.go:326 msg="off>
Sep 28 08:46:26 orangepi5 ollama[2563639]: time=2024-09-28T08:46:26.830+08:00 level=ERROR source=common.go:214 msg="fa>
Sep 28 08:46:27 orangepi5 ollama[2563639]: time=2024-09-28T08:46:27.288+08:00 level=INFO source=sched.go:428 msg="NewL>
Sep 28 08:46:27 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:46:27 | 500 | 14.552434001s | 127.0.0.1 | POST >

OS

Linux

GPU

Other

CPU

Other

Ollama version

0.3.12

GiteaMirror added the question and linux labels 2026-04-12 15:22:28 -05:00

@rick-github commented on GitHub (Sep 28, 2024):

This log is truncated; use journalctl -u ollama --no-pager to see the full lines.


@gfkdliucheng commented on GitHub (Sep 28, 2024):

Sep 28 08:46:12 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:46:12 | 200 | 82.897373ms | 127.0.0.1 | POST "/api/show"
Sep 28 08:46:12 orangepi5 ollama[2563639]: time=2024-09-28T08:46:12.855+08:00 level=INFO source=server.go:103 msg="system memory" total="3.8 GiB" free="3.2 GiB" free_swap="1.7 GiB"
Sep 28 08:46:12 orangepi5 ollama[2563639]: time=2024-09-28T08:46:12.856+08:00 level=INFO source=memory.go:326 msg="offload to cpu" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[3.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.5 GiB" memory.required.partial="0 B" memory.required.kv="224.0 MiB" memory.required.allocations="[1.5 GiB]" memory.weights.total="976.1 MiB" memory.weights.repeating="793.5 MiB" memory.weights.nonrepeating="182.6 MiB" memory.graph.full="299.8 MiB" memory.graph.partial="482.3 MiB"
Sep 28 08:46:26 orangepi5 ollama[2563639]: time=2024-09-28T08:46:26.830+08:00 level=ERROR source=common.go:214 msg="failed to extract files" error="copy payload linux/arm64/cuda_v12/libggml.so: write /tmp/ollama2127264499/runners/cuda_v12/libggml.so: no space left on device"
Sep 28 08:46:27 orangepi5 ollama[2563639]: time=2024-09-28T08:46:27.288+08:00 level=INFO source=sched.go:428 msg="NewLlamaServer failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-183715c435899236895da3869489cc30ac241476b4971a20285b1a462818a5b4 error="no suitable llama servers found"
Sep 28 08:46:27 orangepi5 ollama[2563639]: [GIN] 2024/09/28 - 08:46:27 | 500 | 14.552434001s | 127.0.0.1 | POST "/api/generate"


@rick-github commented on GitHub (Sep 28, 2024):

no space left on device

Your /tmp directory has insufficient free space. You can use OLLAMA_TMPDIR to tell ollama to use a different filesystem to unpack the files.
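
For reference, on a systemd install the workaround can be sketched roughly like this (the /var/lib/ollama-tmp path is illustrative, and the chown assumes the default ollama service user; any directory the service can write on a filesystem with enough free space will do):

df -h /tmp                                  # confirm the tmpfs is what's running out of space
sudo mkdir -p /var/lib/ollama-tmp           # pick a directory on a larger filesystem
sudo chown ollama:ollama /var/lib/ollama-tmp
sudo systemctl edit ollama.service          # in the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_TMPDIR=/var/lib/ollama-tmp"
sudo systemctl restart ollama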


@0x164C9DFC commented on GitHub (Sep 29, 2024):

I have a similar error trying to run llama3.2 using Termux on a Samsung S24 Ultra.

~/ollama $ ./ollama serve &
[1] 14098
~/ollama $ 2024/09/29 09:32:43 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/data/com.termux/files/home/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-09-29T09:32:43.979Z level=INFO source=images.go:753 msg="total blobs: 5"
time=2024-09-29T09:32:43.979Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-09-29T09:32:43.979Z level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-09-29T09:32:43.981Z level=INFO source=common.go:135 msg="extracting embedded files" dir=/data/data/com.termux/files/usr/tmp/ollama1255798484/runners
time=2024-09-29T09:32:43.984Z level=ERROR source=common.go:214 msg="failed to extract files" error="decompress payload linux/arm64/cpu/ollama_llama_server.gz: EOF"
time=2024-09-29T09:32:43.984Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners=[]
time=2024-09-29T09:32:43.984Z level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-09-29T09:32:43.985Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2024-09-29T09:32:43.985Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2024-09-29T09:32:43.985Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2024-09-29T09:32:43.985Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2024-09-29T09:32:43.985Z level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
time=2024-09-29T09:32:43.985Z level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant="no vector extensions" compute="" driver=0.0 name="" total="10.8 GiB" available="4.3 GiB"

It looks as if there is an issue with one of the files it tries to extract.

When trying to run the model, it shows:

~/ollama $ ./ollama run llama3.2:1b-instruct-q5_K_M
[GIN] 2024/09/29 - 09:39:06 | 200 | 38.021µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/09/29 - 09:39:06 | 200 | 26.716615ms | 127.0.0.1 | POST "/api/show"
time=2024-09-29T09:39:06.267Z level=INFO source=server.go:103 msg="system memory" total="10.8 GiB" free="4.7 GiB" free_swap="6.1 GiB"
⠋ time=2024-09-29T09:39:06.274Z level=INFO source=memory.go:326 msg="offload to cpu" layers.requested=-1 layers.model=17 layers.offload=0 layers.split="" memory.available="[4.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.7 GiB" memory.required.partial="0 B" memory.required.kv="256.0 MiB" memory.required.allocations="[1.7 GiB]" memory.weights.total="912.3 MiB" memory.weights.repeating="706.8 MiB" memory.weights.nonrepeating="205.5 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="554.3 MiB"
time=2024-09-29T09:39:06.275Z level=ERROR source=common.go:214 msg="failed to extract files" error="decompress payload linux/arm64/cpu/ollama_llama_server.gz: EOF"
time=2024-09-29T09:39:06.275Z level=INFO source=sched.go:428 msg="NewLlamaServer failed" model=/data/data/com.termux/files/home/.ollama/models/blobs/sha256-a1d443469dbae8c1aa29fc52db681e65d58bd521872c00d410a70670f852191b error="no suitable llama servers found"
[GIN] 2024/09/29 - 09:39:06 | 500 | 102.73526ms | 127.0.0.1 | POST "/api/generate"
Error: no suitable llama servers found

Ollama was compiled from source earlier this morning.


@rick-github commented on GitHub (Sep 29, 2024):

Your /data/data/com.termux/files/usr/tmp directory has insufficient free space. You can use OLLAMA_TMPDIR to tell ollama to use a different filesystem to unpack the files.

It may also be the case that Termux ran out of disk space when compiling and the file linux/arm64/cpu/ollama_llama_server.gz was truncated.
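
In Termux that might look something like this (the directory name is illustrative):

mkdir -p ~/ollama-tmp                       # any directory on storage with free space
OLLAMA_TMPDIR=~/ollama-tmp ./ollama serve   # unpack the runners there instead of the default tmp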


@0x164C9DFC commented on GitHub (Sep 29, 2024):

The phone has 512GB of storage. It would be suspicious if it ran out of space.
/data/data/com.termux/files/usr/tmp has 425GB available.
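
For reference, that figure can be checked with:

df -h /data/data/com.termux/files/usr/tmp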


@dhiltgen commented on GitHub (Sep 30, 2024):

@0x2E16CF0F you may have run out of space during the go generate ./... step. You can try tar tzf ./build/linux/arm64/cpu/ollama_llama_server.gz (in the ollama source directory) to see if the file was truncated during the build.
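
If tar has trouble with the file, a plain gzip integrity test on the same path also catches truncation:

gzip -t ./build/linux/arm64/cpu/ollama_llama_server.gz && echo intact || echo truncated or corrupt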


@0x164C9DFC commented on GitHub (Oct 4, 2024):

@dhiltgen Yes, the file has been truncated.
When I tried installing it with curl -fsSL https://ollama.com/ollama.sh | sh, the same thing happens.
However, I am tempted to believe it has more to do with Termux, or a combination of Termux with a specific phone model (S24 Ultra), than it does with Ollama.
I tried the default installer after deploying openSUSE Tumbleweed using proot-distro and the problem no longer occurs.
image: https://github.com/user-attachments/assets/4768b7d3-145e-410e-a1d3-ddc8e167d729
image: https://github.com/user-attachments/assets/d9d2e966-4bcd-470a-bdc7-c188716be039


@dhiltgen commented on GitHub (Oct 17, 2024):

@0x2E16CF0F if the build or download files are being truncated due to insufficient filesystem space, Ollama isn't going to work properly. Mobile devices aren't officially supported, so the install script might make mistakes in how it's trying to set things up. I'd suggest you may have better luck using the manual install instructions (https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install), and setting OLLAMA_TMPDIR and OLLAMA_MODELS to filesystem paths that you know have sufficient space.
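
Under the manual install that could look roughly like this (both paths are illustrative):

export OLLAMA_TMPDIR=/path/with/space/tmp       # where the runners get unpacked
export OLLAMA_MODELS=/path/with/space/models    # where model blobs get stored
mkdir -p "$OLLAMA_TMPDIR" "$OLLAMA_MODELS"
./ollama serve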


@tristan-k commented on GitHub (Oct 26, 2024):

I experienced the same issue on a Samsung S23. For me it was probably related to the Termux Play Store version. After downgrading to the F-Droid version 0.118.1 (1000) and specifying the ollama version with git clone --branch v0.3.14 --depth 1 https://github.com/ollama/ollama.git, the issue went away.

Reference: github-starred/ollama#4443