[GH-ISSUE #3035] AMD / ROCM failure on 0.1.29 #48376

Closed
opened 2026-04-28 07:58:29 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @stephensrmmartin on GitHub (Mar 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3035

Originally assigned to: @dhiltgen on GitHub.

Config

Environment: Arch Linux; no docker
GPU: Radeon 6700xt
Kernel: 6.6.21-1-lts
Env vars: "HSA_OVERRIDE_GFX_VERSION=10.3.0" "OLLAMA_DEBUG=1"

Summary

ollama run mistral

Error: write payload llama.cpp/build/linux/x86_64/rocm/lib/libext_server.so.gz: open /tmp/ollama93856906/rocm/libext_server.so: permission denied
## journalctl -b --unit ollama

Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.789-08:00 level=INFO source=images.go:806 msg="total blobs: 33"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.790-08:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Mar 09 21:00:13 hwkiller-desktop ollama[98459]:  - using env:        export GIN_MODE=release
Mar 09 21:00:13 hwkiller-desktop ollama[98459]:  - using code:        gin.SetMode(gin.ReleaseMode)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingsHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] POST   /v1/chat/completions      --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: [GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)

Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.790-08:00 level=INFO source=routes.go:1082 msg="Listening on [::]:11434 (version 0.0.0)"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.790-08:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama1019928670 ..."
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.889-08:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu rocm cpu_avx2 cpu_avx]"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.889-08:00 level=DEBUG source=payload_common.go:140 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.889-08:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.889-08:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.889-08:00 level=DEBUG source=gpu.go:209 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_6>
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat >
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1031]"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/ollama1019928670/rocm"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /var/lib/ollama"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.894-08:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.896-08:00 level=DEBUG source=amd_linux.go:268 msg="host rocm linked /opt/rocm/lib => /tmp/ollama1019928670/rocm"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.896-08:00 level=DEBUG source=amd_linux.go:120 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=10.3.0"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.897-08:00 level=DEBUG source=amd_linux.go:168 msg="discovering amdgpu devices [1]"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.897-08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 12868124672"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.897-08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  12868124672"
Mar 09 21:00:13 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:13.897-08:00 level=DEBUG source=gpu.go:180 msg="rocm detected 1 devices with 11044M available memory"

Mar 09 21:00:15 hwkiller-desktop ollama[98459]: [GIN] 2024/03/09 - 21:00:15 | 200 |       25.83µs |       127.0.0.1 | HEAD     "/"
Mar 09 21:00:15 hwkiller-desktop ollama[98459]: [GIN] 2024/03/09 - 21:00:15 | 200 |     299.559µs |       127.0.0.1 | POST     "/api/show"
Mar 09 21:00:15 hwkiller-desktop ollama[98459]: [GIN] 2024/03/09 - 21:00:15 | 200 |     163.959µs |       127.0.0.1 | POST     "/api/show"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat >
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1031]"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/ollama1019928670/rocm"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=amd_linux.go:120 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=10.3.0"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=amd_linux.go:168 msg="discovering amdgpu devices [1]"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 12868124672"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  12868124672"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=gpu.go:180 msg="rocm detected 1 devices with 11044M available memory"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat >
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1031]"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/ollama1019928670/rocm"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=amd_linux.go:120 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=10.3.0"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=DEBUG source=amd_linux.go:168 msg="discovering amdgpu devices [1]"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 12868124672"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.156-08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  12868124672"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.157-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.157-08:00 level=DEBUG source=payload_common.go:93 msg="ordered list of LLM libraries to try [/tmp/ollama1019928670/rocm/libext_server.so /tmp/ollama1019928670/cpu_avx2/libext_server.so]"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.157-08:00 level=INFO source=llm.go:157 msg="/tmp/ollama1019928670/rocm/libext_server.so has disappeared, reloading libraries"
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: time=2024-03-09T21:00:16.157-08:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama1019928670 ..."
Mar 09 21:00:16 hwkiller-desktop ollama[98459]: [GIN] 2024/03/09 - 21:00:16 | 500 |  237.588294ms |       127.0.0.1 | POST     "/api/chat"

Notes

Interestingly, I had this exact same problem on 0.1.27 and 0.1.28 as well, which is what triggered me to look into this. Then v0.1.29 came out, and in the process of bisecting and cloning all-new repos, v0.1.27/28 started working (I am legitimately confused about why, since I was getting a clean clone each time).
I notice that my /tmp has several build objects; perhaps the builds were, at some point, using these [presumably wrong] objects?
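Not part of the original report, just a sketch for checking that theory: a small POSIX-shell helper (hypothetical name `list_ollama_payloads`) that lists leftover `ollama<random>` payload directories under a tmp root, so stale extracted objects can be inspected or removed before a rebuild.

```shell
# Hypothetical helper: list leftover ollama payload directories
# (named ollama<random>) under a given tmp root, so stale extracted
# libraries can be inspected or removed before a rebuild.
list_ollama_payloads() {
  root="${1:-/tmp}"
  found=0
  for d in "$root"/ollama*; do
    [ -e "$d" ] || continue   # glob stays literal when nothing matches
    found=1
    printf '%s\n' "$d"
  done
  [ "$found" -eq 0 ] && echo "none"
  return 0
}

# Usage: list_ollama_payloads           # scan the real /tmp
#        list_ollama_payloads /var/tmp  # or another root
```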

image: https://github.com/ollama/ollama/assets/2332684/a8968681-df88-4a0d-9e3f-1b5148b9c41e


@dhiltgen commented on GitHub (Mar 10, 2024):

The failure to write out coupled with the msg="/tmp/ollama1019928670/rocm/libext_server.so has disappeared, reloading libraries" is unexpected.

Could you do some ls -l checks in /tmp/ollama* to see if we can try to piece together how it got into this state? Did the ownership change, permissions change, unexpected umask, sticky bit... something else? Are you using our install script or is this packaged via some OS specific packaging/install process that maybe got out of sync with recent changes we've made? If you're building from source, maybe things did get out of sync on incremental builds?
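For anyone following along, the kind of permission check being asked for might look like this (illustrative paths, not the actual ollama payload directory):

```shell
# Illustrative permission check: create a file the way the server would
# and inspect its mode, owner, and the umask in effect. A surprising
# umask or ownership here would explain an EACCES during extraction.
umask 022                      # typical default; new files become 0644
d=$(mktemp -d)                 # stands in for /tmp/ollamaXXXX/rocm
touch "$d/libext_server.so"
ls -l "$d"
stat -c '%a %U:%G %n' "$d/libext_server.so"
```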

<!-- gh-comment-id:1987307439 -->

@stephensrmmartin commented on GitHub (Mar 10, 2024):

This is arch linux, not using the install script. Below are the exact steps I used, which are the same steps used for ollama v0.1.{26, 27, 28} (with the minor versions changed, obviously).

git clone https://github.com/ollama/ollama.git ./ollama-29
cd ./ollama-29
git checkout v0.1.29
go generate ./...
go build ./
sudo cp ./ollama /usr/bin/ollama
sudo systemctl start ollama

Here are some supporting files (originally from the ollama package in arch, with one override).
ollama.service:

# /usr/lib/systemd/system/ollama.service
[Unit]
Description=Ollama Service
Wants=network-online.target
After=network.target network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
WorkingDirectory=/var/lib/ollama
Environment="HOME=/var/lib/ollama" "GIN_MODE=release"
User=ollama
Group=ollama
Restart=on-failure
RestartSec=3
Type=simple
PrivateTmp=yes
ProtectSystem=full
ProtectHome=yes

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" "OLLAMA_HOST=0.0.0.0" "OLLAMA_DEBUG=1"

/usr/lib/sysusers.d/ollama.conf

g ollama - -
u ollama - "ollama user" /var/lib/ollama

/usr/lib/tmpfiles.d/ollama.conf

Q /var/lib/ollama 0755 ollama ollama

As I copied/pasted these things, I just realized: I wonder if PrivateTmp=yes is causing the issue. I will test that.

I am currently rebuilding 29 from scratch again, and will get the /tmp info for you, as well as test the PrivateTmp=no config.
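If PrivateTmp turns out to be the culprit, a minimal drop-in to test that theory (hypothetical file name, same drop-in mechanism as the override.conf already shown above) would be:

```
# /etc/systemd/system/ollama.service.d/private-tmp.conf (hypothetical)
[Service]
PrivateTmp=no
```

followed by `sudo systemctl daemon-reload && sudo systemctl restart ollama` to pick up the change.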

<!-- gh-comment-id:1987363420 -->

@stephensrmmartin commented on GitHub (Mar 10, 2024):

/tmp:

adb.1000.log
dumps
plasma-csd-generator.zQrCSZ
pressure-vessel-libs-MM3DK2
sddm-auth-70a21b01-8761-4a81-907d-5770cfdfdf83
sddm--cujpng
steam_chrome_shmem_uid1000_spid1855
steamfcBPnJ
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-bluetooth.service-kokY4D
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-dbus-broker.service-KTLxCo
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-ollama.service-GYObBM
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-polkit.service-uF3UMY
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-systemd-logind.service-FSYN0r
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-systemd-resolved.service-Uq2NMn
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-systemd-timesyncd.service-CZzndE
systemd-private-ae7bcdda1fbc498397766aed0bb0897f-upower.service-ah8LuD
Temp-7066129c-6351-4ca8-9d3d-41ed38389844
tmux-1000

Note that there's a private tmp for ollama. Inside it:

# cd /tmp/systemd-private-ae7bcdda1fbc498397766aed0bb0897f-ollama.service-bBsAHH
# ls -l ./tmp/ollama1074375394/
total 0
drwxr-xr-x 2 ollama ollama 60 Mar 10 14:13 cpu
drwxr-xr-x 2 ollama ollama 60 Mar 10 14:13 cpu_avx
drwxr-xr-x 2 ollama ollama 60 Mar 10 14:13 cpu_avx2
lrwxrwxrwx 1 ollama ollama 13 Mar 10 14:13 rocm -> /opt/rocm/lib

I then ran it with PrivateTmp=no:

# ls -l /tmp
-rw-r----- 1 hwkiller hwkiller 901 Mar 10 13:58 adb.1000.log
drwxr-x--T 6 hwkiller hwkiller 140 Mar 10 13:58 dumps
drwx------ 5 ollama   ollama   120 Mar 10 14:17 ollama3429579114
...
...

However, the error remains:

Error: write payload llama.cpp/build/linux/x86_64/rocm/lib/libext_server.so.gz: open /tmp/ollama3429579114/rocm/libext_server.so: permission denied

Given that /tmp/ollama*/rocm points to /opt/rocm/lib, I checked that out, and there is no libext_server.so:

```
(/opt/rocm/lib) $ ls
cmake                     libhiprand.so.1                  libhsa-runtime64.so         librocalution_hip.so.1.0.0  librocsolver.so.0
libamd_comgr.so           libhiprand.so.1.1                libhsa-runtime64.so.1       librocalution.so            librocsolver.so.0.1
libamd_comgr.so.2         libhiprtc-builtins.so            libhsa-runtime64.so.1.12.0  librocalution.so.1          librocsparse.so
libamd_comgr.so.2.6       libhiprtc-builtins.so.6          libMIOpen.so                librocalution.so.1.0        librocsparse.so.1
libamdhip64.so            libhiprtc-builtins.so.6.0.32830  libMIOpen.so.1              librocblas.so               librocsparse.so.1.0.0
libamdhip64.so.6          libhiprtc.so                     libMIOpen.so.1.0            librocblas.so.4             libroctracer64.so
libamdhip64.so.6.0.32830  libhiprtc.so.6                   liboam.so                   librocblas.so.4.0           libroctracer64.so.4
libamdocl64.so            libhiprtc.so.6.0.32830           liboam.so.1                 librocfft.so                libroctracer64.so.4.1.0
libcltrace.so             libhipsolver.so                  liboam.so.1.0               librocfft.so.0              libroctx64.so
libdevice_operations.a    libhipsolver.so.0                libOpenCL.so                librocfft.so.0.1            libroctx64.so.4
libhipblas.so             libhipsolver.so.0.1              libOpenCL.so.1              librocm_smi64.so            libroctx64.so.4.1.0
libhipblas.so.2           libhipsparse.so                  libOpenCL.so.1.2            librocm_smi64.so.1          libutility.a
libhipblas.so.2.0         libhipsparse.so.1                librccl.so                  librocm_smi64.so.1.0        llvm
libhipfft.so              libhipsparse.so.1.0.0            librccl.so.1                librocrand.so               pkgconfig
libhipfft.so.0            libhsakmt.so                     librccl.so.1.0              librocrand.so.1             rocblas
libhipfft.so.0.1          libhsakmt.so.1                   librocalution_hip.so        librocrand.so.1.1           rocfft
libhiprand.so             libhsakmt.so.1.0.6               librocalution_hip.so.1      librocsolver.so             roctracer
```

And further, no package provides it:

```
(/opt/rocm/lib) $ sudo pacman -Fx "librocsolver"
extra/rocsolver 6.0.0-1 [installed]
    opt/rocm/lib/librocsolver.so
    opt/rocm/lib/librocsolver.so.0
    opt/rocm/lib/librocsolver.so.0.1
    opt/rocm/rocsolver/lib/librocsolver.so
(/opt/rocm/lib) $ sudo pacman -Fx "libext_server"
(/opt/rocm/lib) $
```

@stephensrmmartin commented on GitHub (Mar 10, 2024):

Is `libext_server.so` something built by ollama, which ollama then tries to extract into `/opt/rocm/lib/`? If so, that seems like a bad idea, but perhaps I'm wrong?

@stephensrmmartin commented on GitHub (Mar 11, 2024):

A bisect led me here:

```
6c5ccb11f993ccc88c4761b8c31e0fefcbc1900f is the first bad commit
commit 6c5ccb11f993ccc88c4761b8c31e0fefcbc1900f
Author: Daniel Hiltgen <daniel@ollama.com>
Date:   Thu Feb 15 17:15:09 2024 -0800

    Revamp ROCm support

    This refines where we extract the LLM libraries to by adding a new
    OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
    idempotenent, so this should speed up startups after the first time a
    new release is deployed.  It also cleans up after itself.

    We now build only a single ROCm version (latest major) on both windows
    and linux.  Given the large size of ROCms tensor files, we split the
    dependency out.  It's bundled into the installer on windows, and a
    separate download on windows.  The linux install script is now smart and
    detects the presence of AMD GPUs and looks to see if rocm v6 is already
    present, and if not, then downloads our dependency tar file.

    For Linux discovery, we now use sysfs and check each GPU against what
    ROCm supports so we can degrade to CPU gracefully instead of having
    llama.cpp+rocm assert/crash on us.  For Windows, we now use go's windows
    dynamic library loading logic to access the amdhip64.dll APIs to query
    the GPU information.

 .github/workflows/test.yaml  |  24 ++-
 Dockerfile                   |  27 ++-
 app/ollama.iss               |   8 +
 docs/development.md          |   4 +-
 docs/linux.md                |   8 +
 docs/troubleshooting.md      |  37 ++++
 docs/windows.md              |   3 +-
 gpu/amd.go                   | 101 -----------
 gpu/amd_common.go            |  58 ++++++
 gpu/amd_hip_windows.go       | 141 +++++++++++++++
 gpu/amd_linux.go             | 411 +++++++++++++++++++++++++++++++++++++++++++
 gpu/amd_windows.go           | 190 ++++++++++++++++++++
 gpu/assets.go                |  60 +++++++
 gpu/gpu.go                   | 110 +-----------
 gpu/gpu_info.h               |   1 -
 gpu/gpu_info_cuda.c          |  10 +-
 gpu/gpu_info_rocm.c          | 198 ---------------------
 gpu/gpu_info_rocm.h          |  59 -------
 llm/dyn_ext_server.c         |  19 +-
 llm/dyn_ext_server.go        |  27 +--
 llm/generate/gen_linux.sh    |  14 +-
 llm/generate/gen_windows.ps1 |  90 ++++++++--
 llm/llm.go                   |  12 +-
 llm/payload_common.go        |  58 +++---
 llm/payload_linux.go         |   2 +-
 scripts/build_linux.sh       |   1 +
 server/routes.go             |   6 +-
 27 files changed, 1091 insertions(+), 588 deletions(-)
 delete mode 100644 gpu/amd.go
 create mode 100644 gpu/amd_common.go
 create mode 100644 gpu/amd_hip_windows.go
 create mode 100644 gpu/amd_linux.go
 create mode 100644 gpu/amd_windows.go
 create mode 100644 gpu/assets.go
 delete mode 100644 gpu/gpu_info_rocm.c
 delete mode 100644 gpu/gpu_info_rocm.h
```
@dhiltgen commented on GitHub (Mar 11, 2024):

Thanks for the details - we've been moving some paths around as we prep for the release and didn't notice this potential conflict. I've got a PR up that should resolve it.

@stephensrmmartin commented on GitHub (Mar 11, 2024):

Just commenting to confirm that your fix worked for me. Thanks!

Reference: github-starred/ollama#48376