[GH-ISSUE #738] AMD GPU & ROCm support #62384

Closed · opened 2026-05-03 08:34:17 -05:00 by GiteaMirror · 323 comments

Originally created by @deadmeu on GitHub (Oct 8, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/738

Originally assigned to: @dhiltgen on GitHub.

I have a 7900 XT and would definitely love to have ROCm support. It seems like it might be coming with https://github.com/jmorganca/ollama/pull/667?

I couldn't find a dedicated issue for this so I'm creating this one to track it.

Edit: For those interested in this feature, follow https://github.com/jmorganca/ollama/pull/814.

GiteaMirror added the amd, feature request labels 2026-05-03 08:34:18 -05:00

@jmorganca commented on GitHub (Oct 8, 2023):

Thanks for creating an issue. Keep an eye on https://github.com/jmorganca/ollama/pull/667


@65a commented on GitHub (Oct 8, 2023):

It works if you apply that patch locally and follow the updated readme/build instructions. My w7900 unfortunately had to go back to AMD for replacement because it liked to hang in VBIOS during some boots, but I'd love to hear if you can patch locally and run it successfully. I'm currently testing with an RX 6950 and an Instinct MI60. It should be as easy as this (assuming you have ROCm and CLBlast installed):

```
git clone --recursive https://github.com/65a/ollama
cd ollama
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```

And that should give you a ROCm-compatible ollama binary in the current directory.

Some notes: if ROCm fails, it will fall back to CPU, so look carefully at the logs. Let me know if you see that happen (the symptom would of course also include low tokens/s). For reference, I just tested these instructions and get about 11 tok/s on gfx906 (MI60) with a 25 GB q8_0 model; a 7900 (gfx1100) should do much better.
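A minimal way to confirm which path was taken, assuming the default `/opt/rocm` install; the grep strings match the llama.cpp log lines quoted later in this thread:

```
# Capture the server log, then look for GPU-offload markers after a test prompt
./ollama serve 2>&1 | tee /tmp/ollama-serve.log

# In another terminal, after sending a prompt:
grep -E 'using ROCm for GPU acceleration|offloaded .* layers to GPU|BLAS = ' /tmp/ollama-serve.log
/opt/rocm/bin/rocm-smi   # VRAM usage and GPU% should climb while generating
```

`BLAS = 0` in the system-info line means the runner was built without GPU BLAS; `BLAS = 1` plus the offload lines means ROCm is actually in use.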


@deadmeu commented on GitHub (Oct 9, 2023):

@65a thanks for the quick help. Unfortunately, it does seem to be falling back to the CPU. Here's what I did:

  1. I cloned the repo using `git clone --recursive https://github.com/65a/ollama`
  2. I cd'd inside the repo and installed the required dependencies for my distro (Arch): `sudo pacman -S rocm-hip-sdk rocm-opencl-sdk clblast go`
  3. I ran `ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...`
  4. When the tags were generated, I ran `go build -tags rocm`
  5. This gave me a binary which I then ran twice: once to `./ollama serve`, and then in another terminal `./ollama run codellama:34b`
  6. Sending a test message resulted in the CPU being used to generate a response.

Here's what I'm getting in the output of the server:

```
2023/10/09 14:58:15 images.go:995: total blobs: 21
2023/10/09 14:58:15 images.go:1002: total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
2023/10/09 14:58:15 routes.go:579: Listening on 127.0.0.1:11434
2023/10/09 14:58:15 accelerator_rocm.go:32: warning: ROCM_PATH is not set. Trying a likely fallback path, but it is recommended to set this variable in the environment.
2023/10/09 14:58:15 accelerator_rocm.go:71: ROCm presenting 18512 MiB of available VRAM on device "card0"
[GIN] 2023/10/09 - 14:58:20 | 200 |        44.1µs |       127.0.0.1 | HEAD     "/"
[GIN] 2023/10/09 - 14:58:20 | 200 |     408.951µs |       127.0.0.1 | GET      "/api/tags"
2023/10/09 14:58:20 accelerator_rocm.go:32: warning: ROCM_PATH is not set. Trying a likely fallback path, but it is recommended to set this variable in the environment.
2023/10/09 14:58:20 accelerator_rocm.go:71: ROCm presenting 18523 MiB of available VRAM on device "card0"
2023/10/09 14:58:20 llama.go:219: filesize is 18169 MiB, with 48 layers. Assuming 378 MiB per layer
2023/10/09 14:58:20 llama.go:220: 18523 MiB VRAM available, loading up to 48 GPU layers
2023/10/09 14:58:20 llama.go:283: llama runner not found: stat /tmp/ollama1555236189/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory
2023/10/09 14:58:20 llama.go:300: starting llama runner
2023/10/09 14:58:20 llama.go:337: waiting for llama runner to start responding
{"timestamp":1696827500,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}
{"timestamp":1696827500,"level":"INFO","function":"main","line":1190,"message":"build info","build":1,"commit":"9e232f0"}
{"timestamp":1696827500,"level":"INFO","function":"main","line":1192,"message":"system info","n_threads":12,"total_threads":24,"system_info":"AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | "}
llama.cpp: loading model from /home/alex/.ollama/models/blobs/sha256:bcc2734eb66318d6bbbc677681b3165817a5fc15fb68b490829a119a9d97cab4
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_head_kv  = 8
llama_model_load_internal: n_layer    = 48
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 8
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: freq_base  = 100000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 34B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 18168.87 MB (+  384.00 MB per state)
llama_new_context_with_model: kv self size  =  384.00 MB
llama_new_context_with_model: compute buffer total size =  305.35 MB

llama server listening at http://127.0.0.1:57581

{"timestamp":1696827501,"level":"INFO","function":"main","line":1443,"message":"HTTP server listening","hostname":"127.0.0.1","port":57581}
{"timestamp":1696827502,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":59080,"status":200,"method":"HEAD","path":"/","params":{}}
2023/10/09 14:58:22 llama.go:353: llama runner started in 1.400578 seconds
{"timestamp":1696827502,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":59080,"status":200,"method":"POST","path":"/tokenize","params":{}}
{"timestamp":1696827502,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":59080,"status":200,"method":"POST","path":"/tokenize","params":{}}
[GIN] 2023/10/09 - 14:58:22 | 200 |  1.562367232s |       127.0.0.1 | POST     "/api/generate"
{"timestamp":1696827507,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50906,"status":200,"method":"HEAD","path":"/","params":{}}

llama_print_timings:        load time =  1480.12 ms
llama_print_timings:      sample time =     6.27 ms /    10 runs   (    0.63 ms per token,  1594.90 tokens per second)
llama_print_timings: prompt eval time =  1479.95 ms /    11 tokens (  134.54 ms per token,     7.43 tokens per second)
llama_print_timings:        eval time =  4226.63 ms /     9 runs   (  469.63 ms per token,     2.13 tokens per second)
llama_print_timings:       total time =  5715.33 ms
{"timestamp":1696827513,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50906,"status":200,"method":"POST","path":"/completion","params":{}}
{"timestamp":1696827513,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50918,"status":200,"method":"POST","path":"/tokenize","params":{}}
[GIN] 2023/10/09 - 14:58:33 | 200 |  5.717758645s |       127.0.0.1 | POST     "/api/generate"
```

Two lines that stand out to me are:

```
2023/10/09 14:58:20 llama.go:283: llama runner not found: stat /tmp/ollama1555236189/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory
{"timestamp":1696827500,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}
```

As a side note, I found that go was building single-threaded. Is there an easy way to add multi-threading to the `go generate` and `go build` commands?
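A hedged aside on the threading question: `go build` already compiles packages in parallel, and the serial part is usually the CMake/make step driven by `go generate`. Assuming that step drives a make-based generator, exporting `MAKEFLAGS` may parallelize it:

```
# Assumption: the generate step shells out to make (via CMake), which reads MAKEFLAGS
MAKEFLAGS="-j$(nproc)" ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast \
  go generate -tags rocm ./...
go build -tags rocm   # Go package compilation is parallel by default
```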


@65a commented on GitHub (Oct 9, 2023):

Looks like a build problem; see the "not found" error. I actually did the same thing; make sure `-tags rocm` is provided for both `go generate` and `go build`.


@deadmeu commented on GitHub (Oct 9, 2023):

~~Ok, I rebuilt (and ran) the binary with the `ROCM_PATH` and `CLBlast_DIR` env vars included and am getting this new warning: `2023/10/09 15:11:40 routes.go:599: Warning: GPU support may not enabled, check you have installed install GPU drivers: rocm-smi command failed`~~ Edit: Never mind, I had the wrong paths set.

This might be an issue with the installed ROCm packages on my system. I have this strange issue where, even though I have packages like `rocm-smi-lib` and `rocminfo` installed, I cannot run them:

```
> rocminfo
zsh: command not found: rocminfo
```

@65a commented on GitHub (Oct 9, 2023):

Where is your `rocminfo` binary? Set `ROCM_PATH` when running the binary... for me it's at `/opt/rocm/bin/rocminfo`.


@65a commented on GitHub (Oct 9, 2023):

Note also that it wants `rocm-smi`, so make sure it is located at `ROCM_PATH/bin/rocm-smi`.
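A quick sanity check that both tools are where the code will look for them (paths assume the defaults discussed above):

```
# Confirm the ROCm utilities exist under the expected prefix
ls -l "${ROCM_PATH:-/opt/rocm}/bin/rocm-smi" "${ROCM_PATH:-/opt/rocm}/bin/rocminfo"

# Arch does not put /opt/rocm/bin on PATH by default; add it for interactive use
export PATH="$PATH:/opt/rocm/bin"
```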


@65a commented on GitHub (Oct 9, 2023):

It's working? Good to hear. At least on Arch, `/opt/rocm/bin` is not on `PATH`, so it only works if you add it to `PATH` or call it directly with the full path to the binary, which is the approach the code uses (also generally safer). Is your SDK also installed there? If not, I can test against multiple distro-specific fallbacks if you're on a fairly mainstream distro.


@deadmeu commented on GitHub (Oct 9, 2023):

Oops, I had the wrong paths set earlier. I regenerated the tags and rebuilt with the correct paths (`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast`), but I still get the message:

```
{"timestamp":1696829167,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}
```

I can confirm that `/opt/rocm/bin/rocm-smi` is where my `rocm-smi` binary is.


@65a commented on GitHub (Oct 9, 2023):

Kill the binary, do a clean run, and attach the log here. Usually you will see two llama.cpp logs: first the ROCm one, which will have some error; then it will start again with a CPU one.


@65a commented on GitHub (Oct 9, 2023):

I think I saw it correctly read the VRAM in your first log though.


@deadmeu commented on GitHub (Oct 9, 2023):

Are there any other log files written other than what's being printed in the terminal?


@65a commented on GitHub (Oct 9, 2023):

Just the terminal one is good. I'd expect to see:

```
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  =  166.20 MB (+ 4960.00 MB per state)
llm_load_tensors: offloading 62 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 65/65 layers to GPU
llm_load_tensors: VRAM used: 25056 MB
....................................................................................................
llama_new_context_with_model: kv self size  = 4960.00 MB
llama_new_context_with_model: compute buffer total size =  351.47 MB
llama_new_context_with_model: VRAM scratch buffer: 350.00 MB

llama server listening at http://127.0.0.1:56549
```

But instead you'd see some error after the offload attempt in the first part of the log, and the "not compiled with GPU offload support" error only after that run failed.


@deadmeu commented on GitHub (Oct 9, 2023):

```
> ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast ./ollama serve
2023/10/09 15:35:52 images.go:995: total blobs: 21
2023/10/09 15:35:52 images.go:1002: total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
2023/10/09 15:35:52 routes.go:579: Listening on 127.0.0.1:11434
2023/10/09 15:35:52 accelerator_rocm.go:71: ROCm presenting 18114 MiB of available VRAM on device "card0"
```

And then when I run `ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast ./ollama run codellama:34b` in another terminal, I get the following printed in the server's terminal:

```
[GIN] 2023/10/09 - 15:36:39 | 200 |      27.801µs |       127.0.0.1 | HEAD     "/"
[GIN] 2023/10/09 - 15:36:39 | 200 |     519.394µs |       127.0.0.1 | GET      "/api/tags"
2023/10/09 15:36:39 accelerator_rocm.go:71: ROCm presenting 18092 MiB of available VRAM on device "card0"
2023/10/09 15:36:39 llama.go:219: filesize is 18169 MiB, with 48 layers. Assuming 378 MiB per layer
2023/10/09 15:36:39 llama.go:220: 18092 MiB VRAM available, loading up to 47 GPU layers
2023/10/09 15:36:39 llama.go:283: llama runner not found: stat /tmp/ollama1663762207/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory
2023/10/09 15:36:39 llama.go:300: starting llama runner
2023/10/09 15:36:39 llama.go:337: waiting for llama runner to start responding
{"timestamp":1696829799,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}
{"timestamp":1696829799,"level":"INFO","function":"main","line":1190,"message":"build info","build":1,"commit":"9e232f0"}
{"timestamp":1696829799,"level":"INFO","function":"main","line":1192,"message":"system info","n_threads":12,"total_threads":24,"system_info":"AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | "}
llama.cpp: loading model from /home/alex/.ollama/models/blobs/sha256:bcc2734eb66318d6bbbc677681b3165817a5fc15fb68b490829a119a9d97cab4
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_head_kv  = 8
llama_model_load_internal: n_layer    = 48
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 8
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: freq_base  = 100000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 34B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 18168.87 MB (+  384.00 MB per state)
llama_new_context_with_model: kv self size  =  384.00 MB
llama_new_context_with_model: compute buffer total size =  305.35 MB

llama server listening at http://127.0.0.1:61687

{"timestamp":1696829800,"level":"INFO","function":"main","line":1443,"message":"HTTP server listening","hostname":"127.0.0.1","port":61687}
{"timestamp":1696829800,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":44708,"status":200,"method":"HEAD","path":"/","params":{}}
2023/10/09 15:36:40 llama.go:353: llama runner started in 1.400692 seconds
{"timestamp":1696829800,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":44708,"status":200,"method":"POST","path":"/tokenize","params":{}}
{"timestamp":1696829800,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":44708,"status":200,"method":"POST","path":"/tokenize","params":{}}
[GIN] 2023/10/09 - 15:36:40 | 200 |  1.559558557s |       127.0.0.1 | POST     "/api/generate"
```

I don't see any `llm_load_tensors` messages.


@65a commented on GitHub (Oct 9, 2023):

This is your problem:

```
llama runner not found: stat /tmp/ollama1663762207/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory
```

For some reason the binary you are running doesn't have the rocm runner embedded in it. Can you try deleting ollama and generating/building again, making sure `-tags rocm` is set? Also, `ls llm/llama.cpp/gguf/build/rocm/bin/` should show a file `ollama-runner`.


@65a commented on GitHub (Oct 9, 2023):

If the file isn't there, something went wrong with `go generate`. If it is there, something went wrong with `go build`.


@65a commented on GitHub (Oct 9, 2023):

Actually, since we see the VRAM check work, I think `go generate` either didn't have `-tags rocm` or something weird happened there... did it look like it maybe failed or something?


@65a commented on GitHub (Oct 9, 2023):

Here's what I run locally; your paths may vary (note the `./...` after `go generate`):

```
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
```

There should be a bunch of CMake output, first for CPU and then a second run that talks about HIPBLAS. Then:

```
go build -tags rocm
```

That should result in a binary with the correct runners on a clean checkout.


@deadmeu commented on GitHub (Oct 9, 2023):

Sorry, I'm not very familiar with Go development. Is there a clean command that will clean up everything from both the `go build` and `go generate` steps? Otherwise I may just start over from a new directory 😛

Also, I can confirm `llm/llama.cpp/gguf/build/rocm/bin/ollama-runner` exists.


@65a commented on GitHub (Oct 9, 2023):

You can just run them again; at least that works for me. You should see the word hipblas go by during the generate step. A new directory isn't a terrible idea either, because CMake stashes environment variables everywhere, so that might be part of the issue here if rebuilding with the commands above is still missing the runner. I'll also do a clean checkout and see if I am missing something.
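For a clean rebuild without re-cloning, something along these lines should work; note the submodule pass, since the llama.cpp build trees (and their CMake caches) live inside submodules:

```
# Remove all untracked and ignored files, including stale CMake caches
git clean -xdf
git submodule foreach --recursive git clean -xdf

# Then regenerate and rebuild
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```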


@deadmeu commented on GitHub (Oct 9, 2023):

It looks like hipBLAS is working: `-- HIP and hipBLAS found`.
Here's the complete output for the `go generate` command:
[generate-log.log](https://github.com/jmorganca/ollama/files/12843100/generate-log.log)


@deadmeu commented on GitHub (Oct 9, 2023):

Do you know why the `/tmp/ollama1663762207/` path is being prefixed when trying to look up `/llama.cpp/ggml/build/rocm/bin/ollama-runner`?


@65a commented on GitHub (Oct 9, 2023):

~~I am noticing something weird on my clean checkout, I may have borked a merge or something, let me poke at it a bit. Thanks again for testing. I can do ROCm-assisted inference with the result, but it looks like it ignored my build tags or something...~~ Wrong code directory. Human error is real.

The path is actually an embedded binary, or should be, inside `ollama` using go-embed, I think.


@65a commented on GitHub (Oct 9, 2023):

Generate output looks similar to my working output; I suspect if you ran the `ollama-runner` directly you would get GPU inference from it. So we need to figure out why `go build -tags rocm` isn't embedding your binary... I'm starting my clean checkout over, so I'll try to reproduce as well.


@deadmeu commented on GitHub (Oct 9, 2023):

Are there any other dependencies required which I may be missing? Maybe something required for go embeds to work?


@65a commented on GitHub (Oct 9, 2023):

I don't think so. Here's my log starting with `go build -tags rocm` and running inference successfully: https://pastebin.com/p0ZpqFEE

Are you running ollama by typing `./ollama serve`?


@deadmeu commented on GitHub (Oct 9, 2023):

Yes, with environment variables included: `ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast ./ollama serve`

It might be worth mentioning I also have ollama installed [from the Arch repo](https://archlinux.org/packages/extra/x86_64/ollama/), but for this ROCm debugging session I've been running the locally built binary.


@65a commented on GitHub (Oct 9, 2023):

The embed is right here: https://github.com/jmorganca/ollama/blob/ab0668293cbfc2188b736d4c2b7dc0b7b997f5bf/llm/llama.go#L28. Given you have a binary that matches that glob, I would expect `go build -tags rocm` to contain it.
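One hedged way to check what would actually be embedded under a given tag set is `go list`'s `EmbedFiles` field (`./llm` matches the `llm/llama.go` file linked above):

```
# List the files the go:embed directives resolve to when the rocm tag is enabled
go list -tags rocm -f '{{.EmbedFiles}}' ./llm
```

If the rocm `ollama-runner` doesn't appear in that list, the glob isn't matching, and the problem is in the generate output or the build tag rather than the embed itself.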


@65a commented on GitHub (Oct 9, 2023):

Were you able to try again with a rebuilt binary? I don't think the packaged version also being installed matters, so long as you invoke the right one.


@deadmeu commented on GitHub (Oct 9, 2023):

I've started fresh with a new directory, cloning your repo down and building again, and still nothing 😢

It looks like I'm on commit `87b11be`; is that what you're using too?


@65a commented on GitHub (Oct 9, 2023):

Yes. On Arch, I have the following packages. Also make sure that you did `git clone --recursive`, though I think you did. It might be interesting to try manually running `./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner`; it has a little web UI and such, and you can use `--help` to see the flags (see the sketch after the package list). That will at least show whether the runner is working.
Arch packages I have installed that start with rocm:

```
rocm-clang-ocl 5.6.1-1
rocm-cmake 5.6.1-1
rocm-core 5.6.1-1
rocm-device-libs 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1
rocm-language-runtime 5.6.1-1
rocm-llvm 5.6.1-1
rocm-opencl-runtime 5.6.1-1
rocm-opencl-sdk 5.6.1-1
rocm-smi-lib 5.6.1-1
rocminfo 5.6.1-1
```
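As a concrete version of the manual-run suggestion above (the model path and layer count are placeholders; `-m`, `--n-gpu-layers`, and `--port` are standard llama.cpp server flags):

```
# Run the ROCm runner directly, bypassing ollama's embedded-runner logic
./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner --help
./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner \
  -m /path/to/model.gguf --n-gpu-layers 48 --port 8080
```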

@65a commented on GitHub (Oct 9, 2023):

I'm wondering if running the command is failing because of a missing runtime dep, and it is actually embedded correctly...


@deadmeu commented on GitHub (Oct 9, 2023):

Are you running an AMD CPU? `rocminfo` lists my 3900X as the first "agent", so maybe it's defaulting to that?


@65a commented on GitHub (Oct 9, 2023):

`rocminfo` lists my CPU as the first agent too; it's smart enough not to use it. I have both AMD and Intel CPUs; integrated graphics might be an open problem, though.
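For reference, a hedged way to see the agent ordering (`Name:` and `Device Type:` are standard `rocminfo` output fields):

```
# List ROCm agents; the GPU should appear with "Device Type: GPU"
/opt/rocm/bin/rocminfo | grep -E 'Name:|Device Type:'
```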


@deadmeu commented on GitHub (Oct 9, 2023):

What's the difference between ggml and gguf? Could the model I'm using be the problem?


@65a commented on GitHub (Oct 9, 2023):

Are you using a .gguf file? or a .ggml?


@deadmeu commented on GitHub (Oct 9, 2023):

> Are you using a .gguf file? or a .ggml?

I've tried running `llama2` and `codellama:34b`, so whatever they are. I think they might be .ggml.


@65a commented on GitHub (Oct 9, 2023):

Aha. I don't have any GGMLs around, and GGML uses CLBlast only. Let me grab one and try it. GGUF is the newer format; if you want to grab one off Hugging Face and try that, we might finally be on to something!
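For reference, a direct download of a small GGUF quant to test with; this is the same repo and file used later in the thread, and `resolve/main` is Hugging Face's standard direct-download path:

```
# Fetch a small Q2_K GGUF quant of CodeLlama 7B for a quick test
curl -L -o codellama-7b.Q2_K.gguf \
  "https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/resolve/main/codellama-7b.Q2_K.gguf"
```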


@65a commented on GitHub (Oct 9, 2023):

Thanks again for testing stuff. I will double-check all the ggml paths and `go generate` code for an obvious bug or something while I download this q2_k 7B model, which is probably going to actually drool :)


@deadmeu commented on GitHub (Oct 9, 2023):

I'm very new to this space, so thanks for your patience and help with getting set up. What would be the easiest way to run a GGUF model with ollama? I've been relying on `ollama run` to automatically import and set everything up for me. From what I've seen on Hugging Face, many models are provided in some multi-part .bin format.


@65a commented on GitHub (Oct 9, 2023):

Dude, I am sorry! You have found a bug in my PR. Stand by, I'll push a fix.


@65a commented on GitHub (Oct 9, 2023):

tl;dr: when they renamed the runner binary from `server` to `ollama-runner`, I made that change for gguf but not ggml. I just force-pushed a fix; let me stare at it again and make sure I got it all.


@65a commented on GitHub (Oct 9, 2023):

Fix pushed. I would use a new directory just to be sure; you want `22d1439328b30ddace69503339f2ce6043fc3e7d`.


@65a commented on GitHub (Oct 9, 2023):

Note that GGUF will be faster, since it will use ROCm and not CLBlast. You shouldn't need any environment variables for `./ollama serve`, since `/opt/rocm` will be used by default and `CLBlast_DIR` is just a build-time thing.


@deadmeu commented on GitHub (Oct 9, 2023):

Ok, I deleted the old directory, re-cloned (with `--recursive`), and checked out [22d1439328](https://github.com/jmorganca/ollama/commit/22d1439328b30ddace69503339f2ce6043fc3e7d), but it's still only running on the CPU with the llama2 GGML model 🤔

Edit:

```
2023/10/09 17:10:15 llama.go:283: llama runner not found: stat /tmp/ollama3703397108/llama.cpp/ggml/build/rocm/bin/ollama-runner: no such file or directory
```

I've downloaded a GGUF model, so I'll try that now as well.


@65a commented on GitHub (Oct 9, 2023):

Ok, let me try locally and see what I can find with the GGML issues; I didn't test GGML much (as we unfortunately found out).


@65a commented on GitHub (Oct 9, 2023):

Ok, yeah, my fix didn't take for some reason; I don't see it in a clean checkout. Stand by, I'll make sure I actually committed the right things before I share a commit 😄 I expect GGUF may already work, though.


@deadmeu commented on GitHub (Oct 9, 2023):

I managed to import a GGUF model (codellama-7b.Q2_K.gguf from https://huggingface.co/TheBloke/CodeLlama-7B-GGUF) but when I run it I still see a large amount of CPU utilisation ☹️


@65a commented on GitHub (Oct 9, 2023):

You might need `PARAMETER num_gpu 50` or similar in the modelfile, depending on how you are running it. Logs would help: did it find the runner for GGUF, at least?
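
For example, a minimal sketch of what that looks like end to end (the model name and the `ollama create` step here are illustrative, not something from this thread):

```
# Modelfile that pushes layers onto the GPU, using the value suggested above
cat > Modelfile <<'EOF'
FROM ./codellama-7b.Q2_K.gguf
PARAMETER num_gpu 50
EOF

# import the model and run it with the locally built binary
./ollama create codellama-q2k -f Modelfile
./ollama run codellama-q2k
```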


@deadmeu commented on GitHub (Oct 9, 2023):

Sorry, there is definitely something interesting happening in the logs:
[serve-my-model.log](https://github.com/jmorganca/ollama/files/12843475/serve-my-model.log)

```
CUDA error 98 at /home/alex/.tools/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:6246: invalid device function
current device: 0
2023/10/09 17:26:41 llama.go:310: llama runner exited with error: exit status 1
2023/10/09 17:26:41 llama.go:317: error starting llama runner: llama runner process has terminated
```

@deadmeu commented on GitHub (Oct 9, 2023):

Could be related to https://github.com/ggerganov/llama.cpp/issues/3320?


@65a commented on GitHub (Oct 9, 2023):

I verified in the GitHub UI that caae3cbab8136350da09c6d6e02c240c3a4db659 has my fix; it also has a CUDA fix of the same nature. You can try that one for GGML.

For the error, try adding `AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100` in front of the go generate, though I'm not sure how to do that (or if it's necessary) for CLBlast/GGML. It should help with GGUF.


@65a commented on GitHub (Oct 9, 2023):

Re #3320, it's that error. Try `AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...`; it will probably help.
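
Spelled out as a full build sequence (the same commands quoted above and earlier in the thread, just split across lines):

```
AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100 ROCM_PATH=/opt/rocm \
  CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```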


@deadmeu commented on GitHub (Oct 9, 2023):

It works!

`AMDGPU_TARGETS=gfx1100 GPU_TARGETS=gfx1100` fixed it.

It's screaming now:

```
llama_print_timings:        load time =  1347.70 ms
llama_print_timings:      sample time =   308.01 ms /   460 runs   (    0.67 ms per token,  1493.47 tokens per second)
llama_print_timings: prompt eval time =   179.28 ms /     8 tokens (   22.41 ms per token,    44.62 tokens per second)
llama_print_timings:        eval time =  5349.34 ms /   459 runs   (   11.65 ms per token,    85.80 tokens per second)
llama_print_timings:       total time =  5855.91 ms
```

I'll give your new commit a try for GGML models.


@65a commented on GitHub (Oct 9, 2023):

Thanks again for testing; I almost never use GGML, so this is great feedback.


@65a commented on GitHub (Oct 9, 2023):

It is a q2k of a 7B, but 85 tok/s is pretty nice to see!


@65a commented on GitHub (Oct 9, 2023):

I'm going to try the q2k 7b ggml on a gfx1030 (RX6950XT) in a second, and see if I can get that working.


@deadmeu commented on GitHub (Oct 9, 2023):

Ok testing GGML on caae3cbab8 is interesting. It seems to be using some of the GPU (~30%) but still plenty of CPU. Is this expected, given what you said about the format?


@65a commented on GitHub (Oct 9, 2023):

GGML also works at caae3cbab8136350da09c6d6e02c240c3a4db659 for me, though as you can see it's much slower than ROCm (and gfx1100 is a beast compared to gfx1030):

```
llama_print_timings:        load time =  1889.26 ms
llama_print_timings:      sample time =    35.83 ms /    64 runs   (    0.56 ms per token,  1786.01 tokens per second)
llama_print_timings: prompt eval time =  1888.78 ms /    22 tokens (   85.85 ms per token,    11.65 tokens per second)
llama_print_timings:        eval time =  1994.60 ms /    63 runs   (   31.66 ms per token,    31.59 tokens per second)
llama_print_timings:       total time =  3938.16 ms
```

Also, if you wondered how smart a q2k 7b is, here it is:

```
If Sarah has five brothers, and each has two sisters, how many sisters does Sarah have?

Answer: Sarah has 5 sisters.

Explanation: Each brother has two sisters, so the total number of sisters is 2 x 5 = 10. Since Sarah has 5 brothers, she has 5 x 2 = 10 sisters.
```

@65a commented on GitHub (Oct 9, 2023):

Thanks again @deadmeu for testing and debugging with me; I think we got it to a good point! I might put a top-level build script somewhere, since there are a ton of env vars to manage for ROCm/CLBlast.


@deadmeu commented on GitHub (Oct 9, 2023):

No problem - thanks for sticking with me and seeing it through. Glad to have this working, and thanks a ton for your contributions to get it to where it is now. I'm looking forward to seeing how ollama and this space in general grows and am really keen to see ROCm mature more and get the love it deserves from AMD!


@65a commented on GitHub (Oct 9, 2023):

Re: GGML/OpenCL performance, I think it's less optimized, and it's built from an older copy of the llama.cpp codebase, so it lags further behind. However, there are a variety of OpenCL drivers, including Mesa's and ROCm's OpenCL drivers, as well as some others that might use the CPU. You can poke around with clinfo and the OpenCL docs, but if it's using some GPU it's probably working... you can imagine why I mainly use GGUF now, given the performance delta.
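
For example, a quick way to see which OpenCL platform and device are actually visible (assuming `clinfo` is installed; a CPU-only driver showing up here would explain slow GGML runs):

```
# list the OpenCL platforms/devices the ICD loader can see
clinfo | grep -iE 'platform name|device name'
```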


@deadmeu commented on GitHub (Oct 9, 2023):

It seems like all the models in https://ollama.ai/library are GGML, which is how I ended up using them. Could we maybe have those swapped over to GGUF as the preferred default?


@mchiang0610 commented on GitHub (Oct 11, 2023):

@deadmeu all the recent models have been uploaded using GGUF. The ones uploaded before the switch are in GGML (for example, llama2). We will be uploading GGUF versions of those models to the library soon, too.


@scd31 commented on GitHub (Oct 24, 2023):

I'm running into issues getting this working on my RX580 on Arch Linux.

Here's my output:

```
2023/10/24 17:42:11 images.go:822: total blobs: 7
2023/10/24 17:42:11 images.go:829: total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
2023/10/24 17:42:11 routes.go:662: Listening on 127.0.0.1:11434 (version 0.0.0)
2023/10/24 17:42:11 routes.go:682: Warning: GPU support may not enabled, check you have installed install GPU drivers: nvidia-smi command failed
[GIN] 2023/10/24 - 17:48:31 | 200 |    1.081824ms |       127.0.0.1 | HEAD     "/"
[GIN] 2023/10/24 - 17:48:31 | 200 |    2.403875ms |       127.0.0.1 | GET      "/api/tags"
2023/10/24 17:48:31 llama.go:363: starting llama runner
2023/10/24 17:48:31 llama.go:421: waiting for llama runner to start responding
{"timestamp":1698180511,"level":"WARNING","function":"server_params_parse","line":845,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":0}
{"timestamp":1698180511,"level":"INFO","function":"main","line":1190,"message":"build info","build":1009,"commit":"9e232f0"}
{"timestamp":1698180511,"level":"INFO","function":"main","line":1192,"message":"system info","n_threads":8,"total_threads":16,"system_info":"AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | "}
llama.cpp: loading model from /home/stephen/.ollama/models/blobs/sha256:b5749cc827d33b7cb4c8869cede7b296a0a28d9e5d1982705c2ba4c603258159
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 3615.73 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size =  153.35 MB

llama server listening at http://127.0.0.1:60613

{"timestamp":1698180517,"level":"INFO","function":"main","line":1443,"message":"HTTP server listening","hostname":"127.0.0.1","port":60613}
{"timestamp":1698180517,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":55198,"status":200,"method":"HEAD","path":"/","params":{}}
2023/10/24 17:48:37 llama.go:435: llama runner started in 5.601383 seconds
[GIN] 2023/10/24 - 17:48:37 | 200 |  5.611333824s |       127.0.0.1 | POST     "/api/generate"
{"timestamp":1698180539,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":52956,"status":200,"method":"HEAD","path":"/","params":{}}
{"timestamp":1698180545,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":52956,"status":200,"method":"POST","path":"/completion","params":{}}

llama_print_timings:        load time =  1738.84 ms
llama_print_timings:      sample time =    14.75 ms /    19 runs   (    0.78 ms per token,  1288.31 tokens per second)
llama_print_timings: prompt eval time =  1738.72 ms /    19 tokens (   91.51 ms per token,    10.93 tokens per second)
llama_print_timings:        eval time =  3784.08 ms /    18 runs   (  210.23 ms per token,     4.76 tokens per second)
llama_print_timings:       total time =  5543.13 ms
{"timestamp":1698180545,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":54840,"status":200,"method":"POST","path":"/tokenize","params":{}}
[GIN] 2023/10/24 - 17:49:05 | 200 |  5.548686201s |       127.0.0.1 | POST     "/api/generate"
```

And my modelfile:

```
FROM llama2-uncensored

PARAMETER num_gpu 50
```

As far as I can tell it all looks fine, but my CPU is getting hit hard while my GPU remains untouched.


@65a commented on GitHub (Oct 25, 2023):

@scd31 I am not sure if ROCm still supports Polaris cards, but it's worth a try. The logs you posted are from a binary compiled without ROCm support, or from the fallback after ROCm failed. You should see "Using ROCm" near the failure.
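
For example, one way to check (a sketch; the exact log wording may vary by build):

```
# capture the server log and look for ROCm-related lines
./ollama serve 2>&1 | tee serve.log
grep -i rocm serve.log
```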


@scd31 commented on GitHub (Oct 25, 2023):

@65a Thanks for your response. A silly question, but where should I see that?

`ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` has lots of output but no "using rocm" as far as I can tell (checked with grep).

`go build -tags rocm` doesn't return any output.


@65a commented on GitHub (Oct 25, 2023):

@scd31 try running `./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner` directly; it's the same as the llama.cpp server (if you're not familiar with it, try `--help`). It needs at least `-ngl 50 -model /path/to/model`, I think, and has a basic web UI.
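
Something like this, as a sketch (the flag names above are from memory, so check `--help` for the exact spelling):

```
# inspect the runner's flags, then point it at a GGUF model with full offload
./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner --help
./llm/llama.cpp/gguf/build/rocm/bin/ollama-runner -ngl 50 -model /path/to/model.gguf
```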


@65a commented on GitHub (Oct 25, 2023):

I have a theory that ROCm decided not to build on an older card, but you could try adjusting the AMDGPU_TARGETS and GPU_TARGETS to include your card and see what happens. Please do try the above command and see if it errors or is accelerated.


@scd31 commented on GitHub (Oct 26, 2023):

The command errors - I have no `rocm` folder in `llm/llama.cpp/gguf/build`, just `cpu` and `cuda`. I tried building with `AMDGPU_TARGETS=gfx803 GPU_TARGETS=gfx803 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` and `go build -tags rocm` but unfortunately that folder still doesn't exist. Nothing in the output logs when I grep for `rocm`. Anything else I can try?


@65a commented on GitHub (Oct 28, 2023):

@scd31 I'm assuming you have ROCm and CLBlast installed; can you pastebin the output of `go generate` or something?


@65a commented on GitHub (Oct 28, 2023):

If you don't see `-- HIP and hipBLAS found` in the generate output, I suspect you don't have it installed, or it's installed at a different path.
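
A quick way to check for that line (a sketch):

```
# save the generate output, then confirm HIP/hipBLAS were detected
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast \
  go generate -tags rocm ./... 2>&1 | tee generate.log
grep "HIP and hipBLAS found" generate.log
```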


@scd31 commented on GitHub (Nov 4, 2023):

Sorry about the delay! I got busy.

Here's the pastebin output: https://pastebin.com/fEL3Rksi

I see no mention of HIP or hipBLAS so I suspect that's where the issue is.

Here's my /opt/rocm contents:

```
amdgcn  hipblas  hiprand    hsa      libexec  oam     rocalution  rocm_smi  rocsolver  roctracer
bin     hipcub   hipsolver  include  llvm     opencl  rocblas     rocprim   rocsparse  share
hip     hipfft   hipsparse  lib      miopen   rccl    rocfft      rocrand   rocthrust  test
```

And here's /usr/lib/cmake/CLBlast:

```
CLBlastConfig.cmake  CLBlastConfig-noconfig.cmake
```

Anything there look awry? Apologies for the silly questions, I'm super inexperienced with GPGPU in general.


@65a commented on GitHub (Nov 9, 2023):

It looks fine to me. Ensure you are running `ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./... && go build -tags rocm ./...` on a fresh checkout. Can you try building ggerganov/llama.cpp with ROCm support directly? Does that work? Fundamentally, this is really just adding a build of that to ollama.
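
For reference, a minimal sketch of a direct llama.cpp ROCm build from that era (assuming the `LLAMA_HIPBLAS` make flag of the time; the model path is illustrative):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_HIPBLAS=1
# offload all layers and run a quick prompt
./main -m /path/to/model.gguf -ngl 50 -p "hello"
```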


@65a commented on GitHub (Nov 9, 2023):

@scd31 `-- Build files have been written to: /home/stephen/src/ollama/llm/llama.cpp/ggml/build/cuda` is highly suspect: you are running a CUDA build. Try a fresh checkout of the pull request code (not ollama HEAD), and make sure you run only the commands listed above. I suspect you are operating on the upstream codebase, given that line.


@scd31 commented on GitHub (Nov 9, 2023):

@65a That fixed it - thank you so much! No idea how I pulled down the wrong branch in the first place but it's working great now and I can see it maxing out my GPU. Thanks again for all the help!


@luantak commented on GitHub (Nov 10, 2023):

@65a Here are my logs for a Vega 56 running the PR #814 branch: https://gist.github.com/lu4p/fbad0b502c070af8295f2b4b0761a888

It indeed seems like it's not offloading to the GPU. If you need any additional context, feel free to ask.


@65a commented on GitHub (Nov 10, 2023):

Yeah, it's not seeing the card for some reason:

```
{"timestamp":1699588706,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
```

Can you run `/opt/rocm/bin/rocm-smi --showmeminfo VRAM --csv`? It looks like it's getting some lines but not parsing them or something.


@luantak commented on GitHub (Nov 10, 2023):

```
/opt/rocm/bin/rocm-smi --showmeminfo VRAM --csv
Unable to load the rocm_smi library.
Set LD_LIBRARY_PATH to the folder containing librocm_smi64.
Please refer to https://github.com/RadeonOpenCompute/rocm_smi_lib for the installation guide.
```

This is located in:

```
/opt/rocm/lib/librocm_smi64.so
/opt/rocm/lib/librocm_smi64.so.5
/opt/rocm/lib/librocm_smi64.so.5.0
```

Running the following produces the same output

```
LD_LIBRARY_PATH="/opt/rocm/rocm_smi/lib/" /opt/rocm/bin/rocm-smi --showmeminfo VRAM --csv
```

I installed https://archlinux.org/packages/extra/x86_64/rocm-smi-lib/
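
One thing that stands out from the listing above: the library lives under /opt/rocm/lib, not /opt/rocm/rocm_smi/lib, so pointing the loader there might help (a guess based on the paths shown, not something verified in this thread):

```
# use the directory that actually contains librocm_smi64, per the listing above
LD_LIBRARY_PATH=/opt/rocm/lib /opt/rocm/bin/rocm-smi --showmeminfo VRAM --csv
```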


@65a commented on GitHub (Nov 10, 2023):

That looks like a broken ROCm install to me. Uninstall that package and install `rocm-hip-sdk` and its dependencies (if they're not already installed). The fact that it can't find libraries means this is a ROCm installation issue, but that package should do it.


@65a commented on GitHub (Nov 10, 2023):

Can you also share `pacman -Q rocm-hip-sdk` if it's already installed? It's possibly an upstream packaging problem if you are using Arch's `testing` repo; I haven't tried 5.7 from there yet.


@luantak commented on GitHub (Nov 10, 2023):

```
error: package 'rocm-hip-sdk' was not found
```

Should I install it?


@65a commented on GitHub (Nov 10, 2023):

Yes, that should pull in everything necessary; please see https://wiki.archlinux.org/title/GPGPU#ROCm


@luantak commented on GitHub (Nov 10, 2023):

Okay, done. New problem: I checked out the latest PR and am met with

```
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...         (base) 06:19:44
fatal: remote error: upload-pack: not our ref 9e232f0234073358e7031c1b8d7aa45020469a3b
fatal: Fetched in submodule path 'ggml', but it did not contain 9e232f0234073358e7031c1b8d7aa45020469a3b. Direct fetching of that commit failed.
llm/llama.cpp/generate_linux.go:5: running "git": exit status 128
```

@65a commented on GitHub (Nov 10, 2023):

You might need to refetch; I pushed over the original commit with better diagnostics when you had the error. I'm not sure if you also need `hip-runtime-amd`; you may want to check that that package and `miopen-hip` are installed. I first set up Python ML deps on this machine... I've been meaning to figure out the minimum installed packages for Arch or Debian so I can write a Dockerfile...


@65a commented on GitHub (Nov 10, 2023):

I might have a bad merge of the generate_linux*, checking with a clean build...


@65a commented on GitHub (Nov 10, 2023):

There was a copy-paste error in the cmake directory (fix pushed), but I think your problem was just needing a clean(er) checkout; you can do something like:

```
git clone --recursive https://github.com/65a/ollama ollama-rocm
cd ollama-rocm
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```

This produces a binary, at least here.


@luantak commented on GitHub (Nov 10, 2023):

Seems to be building now


@luantak commented on GitHub (Nov 10, 2023):

Still the same:

```
2023/11/10 06:57:37 accelerator_rocm.go:32: warning: ROCM_PATH is not set. Trying a likely fallback path, but it is recommended to set this variable in the environment.
2023/11/10 06:57:37 accelerator_rocm.go:73: found ROCm GPU but failed to parse free VRAM!
2023/11/10 06:57:37 routes.go:716: Warning: GPU support may not be enabled, check you have installed GPU drivers: rocm-smi command failed
```

Wouldn't it be easier/more stable to link against the library directly instead of calling the rocm-smi command?
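
(For the first warning, a sketch of setting the variable the log asks for when starting the server:)

```
# export ROCM_PATH explicitly, as the warning recommends
ROCM_PATH=/opt/rocm ./ollama serve
```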


@65a commented on GitHub (Nov 10, 2023):

If `rocm-smi` isn't working, there's something wrong with your ROCm installation.


@65a commented on GitHub (Nov 10, 2023):

Regarding linking, Ollama doesn't really link directly against llama.cpp or any GPU stuff; that's more similar to the LocalAI / go-llama.cpp approach (https://github.com/go-skynet/go-llama.cpp; I also sent them a ROCm PR, which is merged). I'd generally expect that not to work either if the library loader is having issues running rocm-smi... something weird is happening, because rocm-smi *is* directly linked to the AMD libraries. Did you install the AMD "drivers" directly from AMD (or anywhere else other than the Arch repositories) by any chance? That would overwrite package files on Arch and cause issues; if so, you may need to reinstall all of the Arch packages, and I'm not sure what else. Also check `env` for any overrides of LD_* variables; that would cause weird issues too, depending.


@luantak commented on GitHub (Nov 10, 2023):

No, normal Mesa drivers from the Arch repos. I uninstalled everything ROCm/HIP related (because I probably installed them in a weird order) and then reinstalled by just installing `rocm-hip-sdk`; still no luck.

No overrides in env.

Maybe a broken package?


@luantak commented on GitHub (Nov 11, 2023):

Got it working on Ubuntu now; ollama is actually fun to use now that it's fast.

```
llama_print_timings:        load time =   11563.39 ms
llama_print_timings:      sample time =      24.53 ms /    89 runs   (    0.28 ms per token,  3628.21 tokens per second)
llama_print_timings: prompt eval time =     427.48 ms /    21 tokens (   20.36 ms per token,    49.13 tokens per second)
llama_print_timings:        eval time =    2915.77 ms /    88 runs   (   33.13 ms per token,    30.18 tokens per second)
llama_print_timings:       total time =    3372.64 ms
```

@65a commented on GitHub (Nov 11, 2023):

@lu4p if you ever figure out what was wrong in the Arch env, mention it here; I'm sure it could help someone else too (or file bug reports upstream). I was going to guess that maybe ROCM_PATH or ROCM_HOME conflicted or were wrong or something, but I'm really glad to see you got it working somewhere.


@luantak commented on GitHub (Nov 11, 2023):

Tried again, I'm convinced that most of rocm is just broken on arch.

Fedora is taking months to try to package rocm, because they probably test if things are actually working.
https://fedoraproject.org/wiki/SIGs/HC

I'm now a boring Ubuntu LTS user, as Ubuntu 23.10 crashes randomly every 30 minutes for me.


@UltraRabbit commented on GitHub (Nov 11, 2023):

@65a I'm using a rocm/pytorch docker image as the base container, trying to build the ROCm ollama. As this docker image is based on Ubuntu 20.04, I managed to install libclblast-dev from a PPA repository. However, it seems there's no /usr/lib/cmake/CLBlast, which is what CLBlast_DIR is supposed to point at. Could I just omit this env setting and have `go generate` detect it automatically?
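
One way to find where the cmake package config actually landed, and then point CLBlast_DIR at that directory (a sketch):

```
# locate the CLBlast cmake config installed by the package
find /usr -name 'CLBlastConfig.cmake' 2>/dev/null
```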


@paulie-g commented on GitHub (Nov 11, 2023):

> Tried again, I'm convinced that most of rocm is just broken on arch.

This is not accurate. I am using nearly all of rocm successfully on Arch, including ollama. The only problem on Arch that I'm aware of is that the maintainers of the extras/python-pytorch multi-package broke rocm variants 2 months ago and have no intention of working on fixing it, hoping that an update to rocm 5.7 fixes it at some point in the future.


@65a commented on GitHub (Nov 11, 2023):

@lu4p It's actually useful that you tested Ubuntu though, because I hadn't tested that (or Debian Sid) yet, but I plan to so I can throw this stuff into k8s with OCI containers. If you want to keep debugging your Arch setup, it might be interesting to see `env` (redacted, where appropriate), and make sure your user is in the groups that own the drm devices and /dev/kfd... probably `video` and `render`.

@paulie-g I developed the patch on Arch and tested across a few different installs, so it definitely works on Arch, but who knows if, say, a different value for LC_COLLATE than I use triggers a bug in rocm-smi or something. I also actually filed that bug for pytorch :)


@paulie-g commented on GitHub (Nov 11, 2023):

> @lu4p If you want to keep debugging your Arch setup, it might be interesting to see `env` (redacted, where appropriate), and make sure your user is in the groups that own the drm devices and /dev/kfd... probably `video` and `render`.

Yes, on Arch egid in `video` and `render` is a must. Also, env has to have a `PATH` that includes `/opt/rocm/bin` and `ROCM_PATH=/opt/rocm`. None of this is done for you on Arch (you are supposed to have sufficient clue, in line with Arch principles ;).
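
A sketch of that setup, assuming the default /opt/rocm layout:

```
# one-time group membership (log out and back in afterwards)
sudo usermod -aG video,render "$USER"

# per-session environment, e.g. in ~/.profile
export PATH="$PATH:/opt/rocm/bin"
export ROCM_PATH=/opt/rocm
```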

> @paulie-g I developed the patch on Arch and tested across a few different installs, so it definitely works on Arch

Embarrassingly, I wasted a huge amount of time trying to get it to work before I realised I'd missed the fact that your patch wasn't committed to the main repo and I needed to pull from 65a/ollama.

> but who knows if, say, a different value for LC_COLLATE than I use triggers a bug in rocm-smi or something.

Unlikely, but the environment not being right is. We might want to add an env.sh that people can source.

> I also actually filed that bug for pytorch :)

Yeah, I noticed ;) Didn't get them to do anything, though. It's especially infuriating because I originally chose pytorch as a way to test my rocm install, and took the segfault to mean that my whole set-up was broken. Turns out the maintainers are broken ;) I've honestly never seen a significant package in the main Arch repos that is a) broken in a way that passes testing, and b) just left unfixed in the hope that something in the future automagically unclobbers it. I don't want to press the issue any further because alienating the maintainers won't help. The bug has very few votes because very few users have the wherewithal to debug the coredump, which is the only way to find the bug listing, so they just let it ride.


@luantak commented on GitHub (Nov 11, 2023):

@paulie-g I know, but usually when that's the case there is a nice Arch wiki page explaining how to configure the software.

The closest I found is https://github.com/rocm-arch/rocm-arch, which isn't working. I also knew about the video and render groups from the AMD docs. I think while troubleshooting I also added the /opt/rocm/bin directory to my PATH. And ROCM_PATH had been set.

Is there anywhere else I should've looked for this info?

Can you provide me a list of commands to get rocm working on arch from scratch?


@paulie-g commented on GitHub (Nov 11, 2023):

> Can you provide me a list of commands to get rocm working on arch from scratch?

No, I don't recall doing anything other than installing all the packages and ensuring all the env vars and groups are set. That repo, iirc, is from before Arch had the packages in the official repos. If you installed anything from there, it's probably a good idea to remove it and install from the official repos. It's mostly useful for edge cases: trying to get Polaris support working for old cards, some non-standard software for Mi cards, etc.

My point is that I, @65a, and others are successfully using rocm on Arch. It is therefore unlikely that 'most of rocm is just broken on arch'. Does your rocm-bandwidth-test work, for example? If it does, and shows your card talking to the rest of the system in a sane way, then rocm works for a baseline definition of 'works'.

Checking some of your problems earlier in the thread, one problem was that you didn't have `/opt/rocm/bin` in your `PATH` (and probably no `ROCM_PATH` either, but I'm not sure that's necessary other than for builds). This isn't Windows; you don't just 'add' it to your env once. That works at best in the one shell process you do it in and its children (not even in the next tab you open in your terminal emulator). It has to be set for your login session, which depends on how you log in. I think at least one of the rocm packages might do it for you if you're not doing anything exotic, but then you have to log out and log back in before it takes effect. I am doing something exotic, so I can't tell you if it works in a normal DE.


@paulie-g commented on GitHub (Nov 11, 2023):

Incidentally: a) try installing `llama-cpp-rocm-git` from the AUR and see if it works, and b) some packages, like various rocm things, tell you to do extra things as they install; this needs to be read and done (to your question, this is likely where people read about adding things to their env, groups, etc.).

@iDeNoh commented on GitHub (Nov 13, 2023):

I'm at a loss. I've tried everything I can think of to get this to generate and build properly for ROCm, but so far it just doesn't seem to want to work. I can only assume I'm doing something wrong in the process, or maybe my system environment isn't set up correctly, because it only generates for CPU. I have rocm 5.7.1 installed, and rocm-smi and rocminfo are working fine. I've got CLBlast installed, but it's not in the location I would have expected; mine for some reason shows up under /usr/local/lib/cmake/CLBlast. I can do the install once more and drop logs if anyone is willing to give me some guidance.
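
For what it's worth, a CLBlast built from source usually installs under /usr/local, so the generate step may just need CLBlast_DIR pointed at the directory containing the CLBlastConfig.cmake files -- a sketch, assuming the path above:

```
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/local/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```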

@paulie-g commented on GitHub (Nov 13, 2023):

Are you checking out from the correct repo?

@iDeNoh commented on GitHub (Nov 13, 2023):

I am, yes. My first run used the command from the ollama website, and I noticed it didn't see that there was a GPU at all; on the second run and every attempt after that, it registered my GPU as Nvidia, stating that nvidia-smi failed.

@65a commented on GitHub (Nov 13, 2023):

The PR isn't merged yet; you need to be compiling from https://github.com/65a/ollama using the instructions at the top of jmorganca/ollama#814.

@iDeNoh commented on GitHub (Nov 13, 2023):

Understood, and I did follow the steps that are included in this thread [here](https://github.com/jmorganca/ollama/issues/738#issuecomment-1805131775).

@iDeNoh commented on GitHub (Nov 13, 2023):

I should note that I got an error when attempting to run go generate; perhaps this is related? I followed its instructions, but I'm not sure if they are correct or not:

```
go: github.com/gin-gonic/gin@v1.9.1 requires
    golang.org/x/net@v0.10.0: missing go.sum entry; to add it:
        go mod download golang.org/x/net
```

@luantak commented on GitHub (Nov 13, 2023):

Which distro and version?

Which go version are you using?
You should use a current go version (1.21); if your package manager ships an older one, you can easily install it as a snap:
https://snapcraft.io/go
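
For reference, a typical snap-based install (the go snap normally needs classic confinement -- verify on your system), followed by a check of what actually ends up on PATH:

```
sudo snap install go --classic
go version   # should report go1.21.x
```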

@65a commented on GitHub (Nov 14, 2023):

@iDeNoh what card and distro? And please link logs of a build from a clean checkout.

@GZGavinZhao commented on GitHub (Nov 14, 2023):

I'd also like to note: please verify locally that the git commits of your git submodules (located in `llm/llama.cpp/{ggml,gguf}`) align with the commits shown in the `main` tree [here](https://github.com/jmorganca/ollama/tree/main/llm/llama.cpp). Sometimes submodule state can get messed up, causing you to build an earlier or older version of `ggml`/`gguf` that may be causing trouble only for you.
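
A quick sketch of that check, run from the repo root:

```
# a leading '+' means the checked-out commit differs from the one recorded in the tree
git submodule status llm/llama.cpp/ggml llm/llama.cpp/gguf

# re-sync if they differ
git submodule update --init --recursive
```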

@65a commented on GitHub (Nov 14, 2023):

@GZGavinZhao there's also sometimes a problem I haven't quite pinned down where inference is really, really slow. This is usually cured by a completely fresh build on a fresh recursive git clone if possible... the cmake build process isn't that clean, which can probably be fixed in a different PR by ensuring things are fresh each time `go generate` is run...
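
Short of a full re-clone, a sketch of forcing a clean regenerate (build paths assumed from the logs later in this thread; adjust CLBlast_DIR for your system):

```
rm -rf llm/llama.cpp/ggml/build llm/llama.cpp/gguf/build
git submodule update --init --recursive
ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
go build -tags rocm
```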

@GZGavinZhao commented on GitHub (Nov 14, 2023):

@65a In this case, if they could run the ollama server with the environment variable AMD_LOG_LEVEL set to 1 (see the [summary of HIP environment variables](https://docs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html#summary-of-environment-variables-in-hip)), maybe we can figure out something from the output.

Edit: I originally suggested 7, but I just tried it and the output from 7 is too much. AMD_LOG_LEVEL=1 ./ollama serve should give enough logs.

@TeddyDD commented on GitHub (Nov 14, 2023):

@65a in regard to your question in jmorganca/ollama#814:

> completely clean build exhibits that very slow inference behavior

No, it seems that a clean build fixed the issue.

I still can't run vicuna:13b-v1.5-16k-q5_K_M due to an OOM error. The same thing happens when I use llama.cpp, but there I can set how many layers to offload. It's weird because I have 16GB of VRAM; this model should fit right in.

  • 41/43 layers: total VRAM used: 10036.21 MB (model: 8694.21 MB, context: 1342.00 MB)
  • 42 layers OOM

This might be an upstream issue, though. Logs just in case.

llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  =  107.54 MB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 8694.21 MB
...................................................................................................
llama_new_context_with_model: n_ctx      = 16384
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 0.25
:1:rocdevice.cpp            :3253: 85642407608 us: 24403: [tid:0x7fb048288500] hsa_amd_pointer_info() failed
llama_kv_cache_init: offloading v cache to GPU
:1:rocdevice.cpp            :2143: 85646306161 us: 24403: [tid:0x7fb048288500] Fail allocation local memory
:1:rocdevice.cpp            :1897: 85646306168 us: 24403: [tid:0x7fb048288500] Failed creating memory
:1:memory.cpp               :347 : 85646306171 us: 24403: [tid:0x7fb048288500] Video memory allocation failed!
:1:memory.cpp               :308 : 85646306173 us: 24403: [tid:0x7fb048288500] Can't allocate memory size - 0x90000000 bytes!
:1:rocdevice.cpp            :2334: 85646306175 us: 24403: [tid:0x7fb048288500] failed to create a svm hidden buffer!
:1:memory.cpp               :1501: 85646306178 us: 24403: [tid:0x7fb048288500] Unable to allocate aligned memory
:1:hip_memory.cpp           :303 : 85646306183 us: 24403: [tid:0x7fb048288500] Allocation failed : Device memory : required :6710886400 | free :515899392 | total :17163091968


CUDA error 2 at /home/teddy/src/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:7358: out of memory
current device: 0
2023/11/14 12:09:20 llama.go:387: 2 at /home/teddy/src/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:7358: out of memory
current device: 0

@65a commented on GitHub (Nov 14, 2023):

@TeddyDD any chance you have an AMD iGPU as well? If so I've found I need to use HIP_VISIBLE_DEVICES=0 ollama serve. Setting AMD_LOG_LEVEL=1 and sharing the full log might be interesting as well.
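
Combined, that debug run might look like the following (HIP_VISIBLE_DEVICES only matters when more than one GPU is visible; the log file name is arbitrary):

```
AMD_LOG_LEVEL=1 HIP_VISIBLE_DEVICES=0 ./ollama serve 2>&1 | tee ollama-rocm.log
```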

@TeddyDD commented on GitHub (Nov 14, 2023):

@65a No, dedicated GPU only. Here is the full log with `AMD_LOG_LEVEL=1`: [log.txt](https://github.com/jmorganca/ollama/files/13350901/log.txt)

@65a commented on GitHub (Nov 14, 2023):

@TeddyDD I do see `Device memory : required :6710886400 | free :515899392 | total :17163091968`; does it run if you reduce the layers some more? It seems like something else is already using a lot of your VRAM, I guess?

@GZGavinZhao commented on GitHub (Nov 14, 2023):

@TeddyDD You may want to install `amdgpu_top` and run it in a separate window alongside `./ollama serve` to monitor the programs that are taking up VRAM.
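
For example, in two terminals (how you install amdgpu_top varies by distro; cargo install amdgpu_top is one option where no package exists, but treat that as an assumption to verify):

```
# terminal 1: watch per-process VRAM usage
amdgpu_top

# terminal 2: start the server, then send a prompt
./ollama serve
```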

@TeddyDD commented on GitHub (Nov 14, 2023):

I run nvtop; my system uses ~500MB of VRAM at idle (the browser takes ~200MB by itself). The rest is free to be used by ollama. Perhaps 16k models require more than 16GB?

> does it run if you reduce the layers some more?

When running original llama.cpp it works with <= 41 layers; I can't control layers with Ollama AFAIK.

@65a commented on GitHub (Nov 15, 2023):

@TeddyDD look at the Modelfile docs for `num_gpu`; it lets you override the number of layers offloaded.
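
A minimal sketch of capping offload at the 41 layers that worked in llama.cpp above (the target name vicuna-41 is made up here):

```
# Modelfile
FROM vicuna:13b-v1.5-16k-q5_K_M
PARAMETER num_gpu 41
```

Then ollama create vicuna-41 -f Modelfile and ollama run vicuna-41 as usual.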

@jacobkuzmits commented on GitHub (Nov 15, 2023):

I am trying to get ROCm working and I have an error running go generate on Ubuntu 22.04 with a 6950XT. The command I ran was ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/local/lib/cmake/CLBlast go generate -tags rocm ./.... I changed CLBlast_DIR to /usr/local/lib/cmake/CLBlast because /usr/lib/cmake/CLBlast didn't exist, but I'm not sure if that's even the right dir. The only things in that directory are CLBlastConfig.cmake and CLBlastConfig-noconfig.cmake. Regardless, I had the same error trying it both ways.

error log
-- The CXX compiler identification is Clang 17.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - broken
CMake Error at /usr/share/cmake-3.22/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/opt/rocm/llvm/bin/clang++"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/jakeku/Downloads/ollama/llm/llama.cpp/gguf/build/rocm/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_a80fd/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_a80fd.dir/build.make CMakeFiles/cmTC_a80fd.dir/build
    gmake[1]: Entering directory '/home/jakeku/Downloads/ollama/llm/llama.cpp/gguf/build/rocm/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_a80fd.dir/testCXXCompiler.cxx.o
    /opt/rocm/llvm/bin/clang++    -MD -MT CMakeFiles/cmTC_a80fd.dir/testCXXCompiler.cxx.o -MF CMakeFiles/cmTC_a80fd.dir/testCXXCompiler.cxx.o.d -o CMakeFiles/cmTC_a80fd.dir/testCXXCompiler.cxx.o -c /home/jakeku/Downloads/ollama/llm/llama.cpp/gguf/build/rocm/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    Linking CXX executable cmTC_a80fd
    /usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_a80fd.dir/link.txt --verbose=1
    /opt/rocm/llvm/bin/clang++ CMakeFiles/cmTC_a80fd.dir/testCXXCompiler.cxx.o -o cmTC_a80fd 
    ld.lld: error: unable to find library -lstdc++
    clang++: error: linker command failed with exit code 1 (use -v to see invocation)
    gmake[1]: *** [CMakeFiles/cmTC_a80fd.dir/build.make:100: cmTC_a80fd] Error 1
    gmake[1]: Leaving directory '/home/jakeku/Downloads/ollama/llm/llama.cpp/gguf/build/rocm/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_a80fd/fast] Error 2
    
    

  

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:2 (project)

I am not sure what to do to fix this. apt list --installed shows that I have:

libstdc++-11-dev/jammy-updates,jammy-security,now 11.4.0-1ubuntu1~22.04 amd64 [installed]
libstdc++6/jammy-updates,jammy-security,now 12.3.0-1ubuntu1~22.04 amd64 [installed,automatic]
libstdc++6/jammy-updates,jammy-security,now 12.3.0-1ubuntu1~22.04 i386 [installed,automatic]

I don't know what I'm doing so I just ignored the error and ran go build anyway. That built without error and ollama was able to run and detect my VRAM, but it was slow so I assume it had to be falling back to CPU. I'll include those logs even though the issue is probably the problem above.

serve/run logs
> ollama serve

2023/11/15 10:33:48 images.go:824: total blobs: 4
2023/11/15 10:33:48 images.go:831: total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
2023/11/15 10:33:48 routes.go:696: Listening on 127.0.0.1:11434 (version 0.0.0)
2023/11/15 10:33:48 accelerator_rocm.go:39: warning: ROCM_PATH is not set. Trying a likely fallback path, but it is recommended to set this variable in the environment.
2023/11/15 10:33:48 accelerator_rocm.go:73: ROCm found 15217 MiB of available VRAM on device "card0"
2023/11/15 10:33:48 accelerator_rocm.go:83: ROCm selecting device "card0"
> ollama run mistral

2023/11/15 10:41:16 accelerator_rocm.go:39: warning: ROCM_PATH is not set. Trying a likely fallback path, but it is recommended to set this variable in the environment.
2023/11/15 10:41:16 accelerator_rocm.go:73: ROCm found 15277 MiB of available VRAM on device "card0"
2023/11/15 10:41:16 accelerator_rocm.go:83: ROCm selecting device "card0"
2023/11/15 10:41:16 llama.go:247: 15277 MB VRAM available, loading up to 93 GPU layers
2023/11/15 10:41:16 llama.go:346: llama runner not found: stat /tmp/ollama3875790972/llama.cpp/gguf/build/rocm/bin/ollama-runner: no such file or directory
2023/11/15 10:41:16 llama.go:372: starting llama runner
2023/11/15 10:41:16 llama.go:430: waiting for llama runner to start responding
{"timestamp":1700062876,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
{"timestamp":1700062876,"level":"INFO","function":"main","line":1323,"message":"build info","build":1,"commit":"9e70cc0"}
{"timestamp":1700062876,"level":"INFO","function":"main","line":1325,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":16,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /home/jakeku/.ollama/models/blobs/sha256:6ae28029995007a3ee8d0b8556d50f3b59b831074cf19c84de87acf51fb54054 (version GGUF V2 (latest))
llama_model_loader: spam x1000 removed since i don't think it matters
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q4_0
llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW) 
llm_load_print_meta: general.name   = mistralai
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MB
llm_load_tensors: mem required  = 3917.96 MB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size = 162.13 MB

llama server listening at http://127.0.0.1:49618

{"timestamp":1700062876,"level":"INFO","function":"main","line":1746,"message":"HTTP server listening","hostname":"127.0.0.1","port":49618}
{"timestamp":1700062876,"level":"INFO","function":"log_server_request","line":1233,"message":"request","remote_addr":"127.0.0.1","remote_port":57018,"status":200,"method":"HEAD","path":"/","params":{}}
2023/11/15 10:41:16 llama.go:444: llama runner started in 0.401371 seconds

Sending a prompt results in

llama_print_timings:        load time =     340.94 ms
llama_print_timings:      sample time =       2.60 ms /    16 runs   (    0.16 ms per token,  6163.33 tokens per second)
llama_print_timings: prompt eval time =    1187.05 ms /    13 tokens (   91.31 ms per token,    10.95 tokens per second)
llama_print_timings:        eval time =    1945.57 ms /    15 runs   (  129.70 ms per token,     7.71 tokens per second)

@65a commented on GitHub (Nov 16, 2023):

@jacobkuzmits I don't use Ubuntu, but the error you have is the classic "missing -dev package" problem on Ubuntu. By any chance, do you have rocm-dev installed (or whatever the equivalent -dev package for the HIP clang++ is)?

@65a commented on GitHub (Nov 16, 2023):

@lu4p was using Ubuntu as well, and may know how to do this better than I do on Ubuntu.

@luantak commented on GitHub (Nov 16, 2023):

@jacobkuzmits

Ubuntu 22.04 instructions:

  1. https://rocm.docs.amd.com/en/latest/deploy/linux/prerequisites.html
  2. https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/install.html
  3. sudo snap install go
     sudo apt install libclblast-dev
  4. ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/x86_64-linux-gnu/cmake/CLBlast go generate -tags rocm ./...
     go build -tags rocm

@makenwaves commented on GitHub (Nov 17, 2023):

Hey @jacobkuzmits, after a bit of digging on the same problem you were having, I found that:
`sudo apt-get install libstdc++-12-dev`
seemed to do the trick. The go compilation still throws a ton of warnings, but it compiled, and it is flying! I'm using rocm 5.7 and running a 7900xtx. I'm kinda new to posting, so I hope this is welcome :)

@iDeNoh commented on GitHub (Nov 17, 2023):

> @iDeNoh what card, distro and please links logs of a build from a clean checkout

MSI 6700xt, Ubuntu 22.04, rocm 5.7, and here's [install logs.txt](https://github.com/jmorganca/ollama/files/13393703/install.logs.txt)

@65a commented on GitHub (Nov 17, 2023):

@iDeNoh a couple of things: you have some go errors I don't expect, which may be related to needing to update go on your machine. I think that may be related to the snap thing @lu4p mentioned, but I don't get go mod errors locally. Your go build command doesn't look like it succeeds for anything but CPU, and the go:build errors make me think your go version doesn't understand them properly (for ollama or tonic, it looks like?), so it only builds the CPU runner. And at the very end, you are running ollama serve instead of ./ollama serve (the latter would be the result of the build, the former a system-wide install).

@iDeNoh commented on GitHub (Nov 17, 2023):

> you have some go errors I don't expect, this may be related to needing to update go on your machine

Well, I'm not sure; according to snap I have go v1.21.4 installed. I'm wondering if I've got something wrong with my Ubuntu install.

@iDeNoh commented on GitHub (Nov 17, 2023):

Ah, found the culprit. I had an old version of go installed (1.16.5) that was shadowing the snap version of go. I deleted that and reinstalled go via snap; go generate then ran without complaining, and go build completed successfully. @lu4p, you were right, it was absolutely my version of go. FYI for anyone else in my shoes: don't rely on snap alone. go version will show if you have an older version installed (in my case it was a preinstalled version that came with Ubuntu and never got updated/replaced).
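
A quick sketch for anyone checking for the same shadowing problem:

```
which -a go   # every go binary on PATH, in resolution order
go version    # reports whichever one comes first
```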

@GZGavinZhao commented on GitHub (Nov 18, 2023):

@shamb0 I don't know where you installed rocBLAS from, but I checked that as of ROCm 5.7.2 the official rocblas package from AMD has gfx1010 support. If you install ROCm using [AMD's instructions](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html), the rocBLAS errors should go away.

@shamb0 commented on GitHub (Nov 19, 2023):

> @shamb0 I don't know where you installed rocBLAS from, but I checked that as of ROCm 5.7.2 the official rocblas package from AMD has gfx1010 support. If you install ROCm using AMD's instructions, the rocBLAS errors should go away.

Thanks @GZGavinZhao

  • Can I have a pointer to the ROCm 5.7.2 release?

  • I installed rocBLAS using the command below:

https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Linux_Install_Guide.html#building-and-installing-rocblas

sudo amdgpu-install --usecase=rocmdev

rocm-dev is already the newest version (5.7.0.50700-63~22.04).
amdgpu-dkms is already the newest version (1:6.2.4.50700-1666569.22.04).
linux-headers-6.2.0-36-generic is already the newest version (6.2.0-36.37).
  • I learned from the threads below that ROCm support for Navi 10 (gfx1010) is not supported or enabled:

https://github.com/ROCmSoftwarePlatform/Tensile/issues/1165

https://github.com/RadeonOpenCompute/ROCm/issues/1714

  • Can I have a pointer to where you got this good news?

@GZGavinZhao commented on GitHub (Nov 19, 2023):

@shamb0 ~~If you [add the repository](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html#add-repositories), you should be able to just run sudo apt-get install rocblas rocm-hip-libraries and install rocblas and all of its dependencies. You shouldn't need to build from source.~~ Sorry, I confused myself. gfx1010 is already enabled by default, but lazy library loading seems to be unsupported for gfx1010 (which is the cause of the error you linked [here](https://github.com/jmorganca/ollama/pull/814#issuecomment-1817577177)). To fix this, you either have to build from source with -DTensile_LAZY_LIBRARY_LOADING=OFF, or, on Debian 13/Ubuntu 23.10, install the OS-provided librocblas-dev package. Your distribution may be providing pre-built packages that have the correct configuration, so if you're on Debian 13/Ubuntu 23.10 you shouldn't need to download anything from AMD. ([source](https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1339#issuecomment-1682846493))

I know that rocBLAS has gfx1010 support because of two reasons:

  1. [This comment](https://github.com/ROCmSoftwarePlatform/Tensile/issues/1165#issuecomment-1094556880) from the same thread you linked to me.
  2. I manually inspected the rocblas 5.7.0 packages and saw that files like /opt/rocm-5.7.0/lib/rocblas/library/Kernels.so-000-gfx1010.hsaco exist (see the check below).
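
One way to run the same check on your own install, assuming ROCm lives under /opt/rocm*:

```
# gfx1010 .hsaco/.dat files here mean the rocBLAS kernels were built for Navi 10
ls /opt/rocm*/lib/rocblas/library/ | grep gfx1010
```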

@shamb0 commented on GitHub (Nov 19, 2023):

> Sorry, I confused myself. gfx1010 is already enabled by default, but lazy library loading seems to be unsupported for gfx1010 [...] To fix this, you either have to build from source with -DTensile_LAZY_LIBRARY_LOADING=OFF, or, on Debian 13/Ubuntu 23.10, install the OS-provided librocblas-dev package.

Thanks a lot for the very detailed analysis @GZGavinZhao; I'll try the suggestions and get back ASAP.

@james-luther commented on GitHub (Nov 24, 2023):

> Thanks a lot for the very detailed analysis @GZGavinZhao; I'll try the suggestions and get back ASAP.

When you are running ollama, make sure the user you are running as is a member of both the video and render groups; if not, you will get rocblas errors. My regular user is a member of these groups, and when running manually things worked great, but I had failed to add the ollama user to these groups, so the service was hitting the error you mentioned.
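
A sketch of that fix for a systemd-style install, assuming the service runs as a dedicated ollama user under a unit of the same name:

```
sudo usermod -aG video,render ollama
sudo systemctl restart ollama
```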

@wijjj commented on GitHub (Nov 27, 2023):

Sorry, just checking in: did anybody manage to run ollama using rocm? :)

@james-luther commented on GitHub (Nov 27, 2023):

> Sorry, just checking in: did anybody manage to run ollama using rocm? :)

I am, and I'm currently running it on Ubuntu 23.10 Server with a 7900XTX.

Follow the instructions to install rocm for your distro. With Ubuntu there are some additional groups you need to make sure are set up, but that's it.

@bergutman commented on GitHub (Nov 28, 2023):

Howdy! For anyone not running one of AMD's supported distributions (most of us), I have created a docker image containing a copy of CLBlast and @65a's fork of ollama with ROCm support. The image exposes the API on port 11434, but you can also bash into the container if you'd like to access ollama from a terminal. The README explains how to get everything up and running. Cheers! 🎉

https://hub.docker.com/r/bergutman/ollama-rocm
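
For a quick test without compose, a minimal sketch (the device and group flags are assumptions based on the typical ROCm container setup; the image's README is authoritative):

```
docker run -d --name ollama \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  -p 11434:11434 \
  bergutman/ollama-rocm
```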

@ignacio82 commented on GitHub (Nov 28, 2023):

Thanks @bergutman. I think I'm doing something wrong. When I transcode a movie with jellyfin I can see clear activity on my GPU using radeontop:

[screenshot: radeontop showing GPU activity during a jellyfin transcode]

On the other hand, when I make a query to llama2 I don't see any activity:

[screenshot: radeontop showing the GPU idle during an ollama query]

This is what I have in my docker-compose file:

version: "3.9"
services:
    ollama:
        image: bergutman/ollama-rocm
        container_name: ollama
        restart: unless-stopped
        ports:
          - "11434:11434"
        devices:  # this is the same that I have for jellyfin
          - "/dev/dri/renderD128:/dev/dri/renderD128"
          - "/dev/kfd:/dev/kfd"
          - "/dev/dri/card0:/dev/dri/card0"        
        group_add:
          - video
        stdin_open: true
        tty: true       
        volumes:
          - nfs-ollama:/usr/share/ollama

Am I missing something?

<!-- gh-comment-id:1829149833 --> @ignacio82 commented on GitHub (Nov 28, 2023): Thanks @bergutman . I think I'm doing something wrong. When I transcode a movie with jellyfin I can see clear activity on my gpu using `radeontop`: ![image](https://github.com/jmorganca/ollama/assets/1833309/5c2c684f-3956-49f0-8ee7-b0e8488742d4) On the other hand when I make a query to llama2 I don't see any activity: ![image](https://github.com/jmorganca/ollama/assets/1833309/b5a7b00c-5dd5-4fdc-a83f-0ec100a73dfc) This is what I have in my docker-compose file: ``` version: "3.9" services: ollama: image: bergutman/ollama-rocm container_name: ollama restart: unless-stopped ports: - "11434:11434" devices: # this is the same that I have for jellyfin - "/dev/dri/renderD128:/dev/dri/renderD128" - "/dev/kfd:/dev/kfd" - "/dev/dri/card0:/dev/dri/card0" group_add: - video stdin_open: true tty: true volumes: - nfs-ollama:/usr/share/ollama ``` Am I missing something?
Author
Owner

@james-luther commented on GitHub (Nov 28, 2023):

Thanks @bergutman . I think I'm doing something wrong. When I transcode a movie with jellyfin I can see clear activity on my gpu using radeontop:

image

On the other hand when I make a query to llama2 I don't see any activity:

image

This is what I have in my docker-compose file:

version: "3.9"
services:
    ollama:
        image: bergutman/ollama-rocm
        container_name: ollama
        restart: unless-stopped
        ports:
          - "11434:11434"
        devices:  # this is the same that I have for jellyfin
          - "/dev/dri/renderD128:/dev/dri/renderD128"
          - "/dev/kfd:/dev/kfd"
          - "/dev/dri/card0:/dev/dri/card0"        
        group_add:
          - video
        stdin_open: true
        tty: true       
        volumes:
          - nfs-ollama:/usr/share/ollama

Am I missing something?

Make sure when you run Radeontop you select the correct PCI bus. When I tested it initially I didn't select a bus and it attached to the GPU integrated into my processor (7950X).

When I run lspci -nn | grep -E 'VGA|Display'

03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8)
59:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c1)

From here I run radeontop with radeontop -b 03:00.0 and I see ollama using the card.

<!-- gh-comment-id:1830159821 --> @james-luther commented on GitHub (Nov 28, 2023): > Thanks @bergutman . I think I'm doing something wrong. When I transcode a movie with jellyfin I can see clear activity on my gpu using `radeontop`: > > ![image](https://private-user-images.githubusercontent.com/1833309/286144747-5c2c684f-3956-49f0-8ee7-b0e8488742d4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDExODc0MTIsIm5iZiI6MTcwMTE4NzExMiwicGF0aCI6Ii8xODMzMzA5LzI4NjE0NDc0Ny01YzJjNjg0Zi0zOTU2LTQ5ZjAtOGVlNy1iMGU4NDg4NzQyZDQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQUlXTkpZQVg0Q1NWRUg1M0ElMkYyMDIzMTEyOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyMzExMjhUMTU1ODMyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZWFmNzdmMWY3M2JmNjVmMjg5NThmYjc4NGU5NDQ4YTIyNDAzYTc5YmQ5OWRjNGU3OWE4ZGNmY2VkNGZmNWI2YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.sWxuMC2BlpVQu7j9whClCBMm8MktAJLC3-LsmN2OcLk) > > On the other hand when I make a query to llama2 I don't see any activity: > > ![image](https://private-user-images.githubusercontent.com/1833309/286145123-b5a7b00c-5dd5-4fdc-a83f-0ec100a73dfc.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDExODc0MTIsIm5iZiI6MTcwMTE4NzExMiwicGF0aCI6Ii8xODMzMzA5LzI4NjE0NTEyMy1iNWE3YjAwYy01ZGQ1LTRmZGMtYTgzZi0wZWMxMDBhNzNkZmMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQUlXTkpZQVg0Q1NWRUg1M0ElMkYyMDIzMTEyOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyMzExMjhUMTU1ODMyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZTdhZWVhZmY2OTMzNWQzNDEwOWVkYzIzZWMxMjExMzA3MTc4NWJjODk1YzlhYzFhNDFlZWUxYzdiYjNmMTY3MiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.S7fpPK5td9yn3xgHTzJSrM8taZYjB0pVxXk1IPMcT08) > > This is what I have in my docker-compose file: > > ``` > version: "3.9" > services: > ollama: > image: bergutman/ollama-rocm > container_name: ollama > restart: unless-stopped > ports: > - "11434:11434" > devices: # this is the same that I have for jellyfin > - "/dev/dri/renderD128:/dev/dri/renderD128" > - "/dev/kfd:/dev/kfd" > - "/dev/dri/card0:/dev/dri/card0" > group_add: > - video > stdin_open: true > tty: true > volumes: > - nfs-ollama:/usr/share/ollama > ``` > > Am I missing something? Make sure when you run Radeontop you select the correct PCI bus. When I tested it initially I didn't select a bus and it attached to the GPU integrated into my processor (7950X). When I run `lspci -nn | grep -E 'VGA|Display'` ```bash 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8) 59:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c1) ``` From here I run radeontop with `radeontop -b 03:00.0` and I see ollama using the card.
Author
Owner

@ignacio82 commented on GitHub (Nov 29, 2023):

Thanks @james-luther
$ lspci -nn | grep -E 'VGA|Display'
35:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [1002:1681] (rev c7)
Is my problem that my card is not compatible?

<!-- gh-comment-id:1831152498 --> @ignacio82 commented on GitHub (Nov 29, 2023): Thanks @james-luther $ lspci -nn | grep -E 'VGA|Display' 35:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [1002:1681] (rev c7) Is my problem that my card is not compatible?
Author
Owner

@james-luther commented on GitHub (Nov 29, 2023):

@ignacio82 no, that card is compatible with ROCm. Maybe it is a groups/permissions thing with Docker. Have you tried /dev/dri:/dev/dri in your docker-compose instead of attempting to isolate the specific card?

<!-- gh-comment-id:1831898761 --> @james-luther commented on GitHub (Nov 29, 2023): @ignacio82 no, that card is compatible with ROCm. Maybe it is a groups/permissions thing with Docker. Have you tried /dev/dri:/dev/dri in your docker-compose instead of attempting to isolate the specific card?
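A sketch of that suggested change against the compose file quoted above - passing the whole /dev/dri directory instead of individual nodes, and adding the render group mentioned earlier in the thread (whether both groups are needed inside the container is an assumption):

```yaml
        devices:
          - "/dev/kfd:/dev/kfd"
          - "/dev/dri:/dev/dri"
        group_add:
          - video
          - render
```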
Author
Owner

@prawilny commented on GitHub (Dec 2, 2023):

@bergutman, thanks for the help in getting it working, but your image seems to be missing rocblas-dev and hipblas-dev - after I added those, queries started being GPU-accelerated, whereas before the change the build step of ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./... silently failed after not finding BLAS libraries.

You can view my configuration at https://github.com/prawilny/ollama-rocm-docker (I tested it with podman and the whole redhatware stack).

<!-- gh-comment-id:1837118335 --> @prawilny commented on GitHub (Dec 2, 2023): @bergutman, thanks for the help in getting it working, but your image seems to be missing `rocblas-dev` and `hipblas-dev` - after I added those, queries started being GPU-accelerated, whereas before the change the build step of `ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...` silently failed after not finding BLAS libraries. You can view my configuration at https://github.com/prawilny/ollama-rocm-docker (I tested it with podman and the whole redhatware stack).
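If you hit the same silent CPU fallback, the fix described above amounts to installing the BLAS dev packages before `go generate` runs (package names from the comment; this assumes AMD's apt repository is already configured):

```bash
sudo apt-get update
sudo apt-get install -y rocblas-dev hipblas-dev
```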
Author
Owner

@ignacio82 commented on GitHub (Dec 3, 2023):

Have you tried /dev/dri:/dev/dri in your docker-compose instead of attempting to isolate the specific card?

Yes, that made no difference.

@prawilny is your container on dockerhub?

<!-- gh-comment-id:1837317632 --> @ignacio82 commented on GitHub (Dec 3, 2023): > Have you tried /dev/dri:/dev/dri in your docker-compose instead of attempting to isolate the specific card? Yes, that made no difference. @prawilny is your container on dockerhub?
Author
Owner

@prawilny commented on GitHub (Dec 3, 2023):

@ignacio82, no, I didn't push it to any registry, but you can build it from this Dockerfile (note that you may need to remove the line ENV HSA_OVERRIDE_GFX_VERSION=10.3.0, which is a workaround specific to my GPU).

<!-- gh-comment-id:1837422794 --> @prawilny commented on GitHub (Dec 3, 2023): @ignacio82, no, I didn't push it to any registry, but you can build it from this [Dockerfile](https://github.com/prawilny/ollama-rocm-docker/blob/master/Dockerfile) (note that you may need to remove the line `ENV HSA_OVERRIDE_GFX_VERSION=10.3.0`, which is a workaround specific to my GPU).
Author
Owner

@ml2s commented on GitHub (Dec 5, 2023):

A quick question:
People say ROCm functions as a translation layer; does that mean I need to install the CUDA toolkit on my Linux system in order to use ROCm? If yes, before or after I install ROCm?

<!-- gh-comment-id:1839804313 --> @ml2s commented on GitHub (Dec 5, 2023): A quick question: People say ROCm functions as a translation layer; does that mean I need to install the CUDA toolkit on my Linux system in order to use ROCm? If yes, before or after I install ROCm?
Author
Owner

@luantak commented on GitHub (Dec 5, 2023):

No.

<!-- gh-comment-id:1839922769 --> @luantak commented on GitHub (Dec 5, 2023): No.
Author
Owner

@markg85 commented on GitHub (Dec 5, 2023):

Hi,

Sorry for "partially hijacking" this thread. It's still very much on-subject but perhaps in the wrong place.
I'm actually trying to reach @65a but he's so smart to hide his mail everywhere. This ping probably reaches him though :)

So i'm trying to build ollama with rocm for archlinux by modifying the Arch PKGBUILD file.

The file i have now looks like:

pkgname=ollama
pkgdesc='Create, run and share large language models (LLMs)'
pkgver=0.1.12
pkgrel=1
arch=(x86_64)
url='https://github.com/65a/ollama.git'
license=(MIT)
makedepends=(cmake git go setconf clblast)
# The git submodule commit hashes are here:
# https://github.com/jmorganca/ollama/tree/v0.1.12/llm/llama.cpp
_ggmlcommit=9e232f0234073358e7031c1b8d7aa45020469a3b
_ggufcommit=9656026b53236ed7328458269c4c798dd50ac8d1
source=(git+$url
        ggml::git+https://github.com/ggerganov/llama.cpp#commit=$_ggmlcommit
        gguf::git+https://github.com/ggerganov/llama.cpp#commit=$_ggufcommit
        sysusers.conf
        tmpfiles.d
        ollama.service)
b2sums=('SKIP'
        'SKIP'
        'SKIP'
        '3aabf135c4f18e1ad745ae8800db782b25b15305dfeaaa031b4501408ab7e7d01f66e8ebb5be59fc813cfbff6788d08d2e48dcf24ecc480a40ec9db8dbce9fec'
        'c890a741958d31375ebbd60eeeb29eff965a6e1e69f15eb17ea7d15b575a4abee176b7d407b3e1764aa7436862a764a05ad04bb9901a739ffd81968c09046bb6'
        'a773bbf16cf5ccc2ee505ad77c3f9275346ddf412be283cfeaee7c2e4c41b8637a31aaff8766ed769524ebddc0c03cf924724452639b62208e578d98b9176124')

prepare() {
  cd $pkgname

  rm -frv llm/llama.cpp/gg{ml,uf}

  # Copy git submodule files instead of symlinking because the build process is sensitive to symlinks.
  cp -r "$srcdir/ggml" llm/llama.cpp/ggml
  cp -r "$srcdir/gguf" llm/llama.cpp/gguf

  # Do not git clone when "go generate" is being run.
  sed -i 's,git submodule,true,g' llm/llama.cpp/generate_linux.go

  # Set the version number
  setconf version/version.go 'var Version string' "\"$pkgver\""
}

build() {
  cd $pkgname
  export CGO_CFLAGS="$CFLAGS" CGO_CPPFLAGS="$CPPFLAGS" CGO_CXXFLAGS="$CXXFLAGS" CGO_LDFLAGS="$LDFLAGS"

  ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
  go build -tags rocm -buildmode=pie -trimpath -mod=readonly -modcacherw -ldflags=-linkmode=external -ldflags=-buildid=''
}

check() {
  cd $pkgname
  go test ./...
}

package() {
  install -Dm755 $pkgname/$pkgname "$pkgdir/usr/bin/$pkgname"
  install -dm700 "$pkgdir/var/lib/ollama"
  install -Dm644 ollama.service "$pkgdir/usr/lib/systemd/system/ollama.service"
  install -Dm644 sysusers.conf "$pkgdir/usr/lib/sysusers.d/ollama.conf"
  install -Dm644 tmpfiles.d "$pkgdir/usr/lib/tmpfiles.d/ollama.conf"
  install -Dm644 $pkgname/LICENSE "$pkgdir/usr/share/licenses/$pkgname/LICENSE"
}

While compilation seems to get far, it eventually does fail with an error like this:

-- Build files have been written to: /home/mark/pkgbuilds/ollama/src/ollama/llm/llama.cpp/gguf/build/rocm
[  6%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o
[ 13%] Built target BUILD_INFO
error: option 'cf-protection=return' cannot be specified on this target
error: option 'cf-protection=branch' cannot be specified on this target
2 errors generated when compiling for gfx1010.
make[3]: *** [CMakeFiles/ggml-rocm.dir/build.make:76: CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:650: CMakeFiles/ggml-rocm.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2488: examples/server/CMakeFiles/server.dir/rule] Error 2
make: *** [Makefile:1050: server] Error 2
llm/llama.cpp/generate_linux_rocm.go:24: running "cmake": exit status 2
==> ERROR: A failure occurred in build().
    Aborting...

Besides the obvious compile error, the GPU architecture is detected wrongly too. I have a gfx1100, not a gfx1010.
Do you perhaps have a clue how to solve these issues?

<!-- gh-comment-id:1841255129 --> @markg85 commented on GitHub (Dec 5, 2023): Hi, Sorry for "partially hijacking" this thread. It's still very much on-subject but perhaps in the wrong place. I'm actually trying to reach @65a, but he's smart enough to hide his email everywhere. This ping probably reaches him though :) So I'm trying to build ollama with ROCm for Arch Linux by modifying the Arch PKGBUILD file. The file I have now looks like: ```bash pkgname=ollama pkgdesc='Create, run and share large language models (LLMs)' pkgver=0.1.12 pkgrel=1 arch=(x86_64) url='https://github.com/65a/ollama.git' license=(MIT) makedepends=(cmake git go setconf clblast) # The git submodule commit hashes are here: # https://github.com/jmorganca/ollama/tree/v0.1.12/llm/llama.cpp _ggmlcommit=9e232f0234073358e7031c1b8d7aa45020469a3b _ggufcommit=9656026b53236ed7328458269c4c798dd50ac8d1 source=(git+$url ggml::git+https://github.com/ggerganov/llama.cpp#commit=$_ggmlcommit gguf::git+https://github.com/ggerganov/llama.cpp#commit=$_ggufcommit sysusers.conf tmpfiles.d ollama.service) b2sums=('SKIP' 'SKIP' 'SKIP' '3aabf135c4f18e1ad745ae8800db782b25b15305dfeaaa031b4501408ab7e7d01f66e8ebb5be59fc813cfbff6788d08d2e48dcf24ecc480a40ec9db8dbce9fec' 'c890a741958d31375ebbd60eeeb29eff965a6e1e69f15eb17ea7d15b575a4abee176b7d407b3e1764aa7436862a764a05ad04bb9901a739ffd81968c09046bb6' 'a773bbf16cf5ccc2ee505ad77c3f9275346ddf412be283cfeaee7c2e4c41b8637a31aaff8766ed769524ebddc0c03cf924724452639b62208e578d98b9176124') prepare() { cd $pkgname rm -frv llm/llama.cpp/gg{ml,uf} # Copy git submodule files instead of symlinking because the build process is sensitive to symlinks. cp -r "$srcdir/ggml" llm/llama.cpp/ggml cp -r "$srcdir/gguf" llm/llama.cpp/gguf # Do not git clone when "go generate" is being run. sed -i 's,git submodule,true,g' llm/llama.cpp/generate_linux.go # Set the version number setconf version/version.go 'var Version string' "\"$pkgver\"" } build() { cd $pkgname export CGO_CFLAGS="$CFLAGS" CGO_CPPFLAGS="$CPPFLAGS" CGO_CXXFLAGS="$CXXFLAGS" CGO_LDFLAGS="$LDFLAGS" ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./... go build -tags rocm -buildmode=pie -trimpath -mod=readonly -modcacherw -ldflags=-linkmode=external -ldflags=-buildid='' } check() { cd $pkgname go test ./... } package() { install -Dm755 $pkgname/$pkgname "$pkgdir/usr/bin/$pkgname" install -dm700 "$pkgdir/var/lib/ollama" install -Dm644 ollama.service "$pkgdir/usr/lib/systemd/system/ollama.service" install -Dm644 sysusers.conf "$pkgdir/usr/lib/sysusers.d/ollama.conf" install -Dm644 tmpfiles.d "$pkgdir/usr/lib/tmpfiles.d/ollama.conf" install -Dm644 $pkgname/LICENSE "$pkgdir/usr/share/licenses/$pkgname/LICENSE" } ``` While compilation seems to get far, it eventually does fail with an error like this: ```bash -- Build files have been written to: /home/mark/pkgbuilds/ollama/src/ollama/llm/llama.cpp/gguf/build/rocm [ 6%] Building CXX object CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o [ 13%] Built target BUILD_INFO error: option 'cf-protection=return' cannot be specified on this target error: option 'cf-protection=branch' cannot be specified on this target 2 errors generated when compiling for gfx1010. make[3]: *** [CMakeFiles/ggml-rocm.dir/build.make:76: CMakeFiles/ggml-rocm.dir/ggml-cuda.cu.o] Error 1 make[2]: *** [CMakeFiles/Makefile2:650: CMakeFiles/ggml-rocm.dir/all] Error 2 make[1]: *** [CMakeFiles/Makefile2:2488: examples/server/CMakeFiles/server.dir/rule] Error 2 make: *** [Makefile:1050: server] Error 2 llm/llama.cpp/generate_linux_rocm.go:24: running "cmake": exit status 2 ==> ERROR: A failure occurred in build(). Aborting... ``` Besides the obvious compile error, the GPU architecture is detected wrongly too. I have a `gfx1100`, not a `gfx1010`. Do you perhaps have a clue how to solve these issues?
Author
Owner

@paulie-g commented on GitHub (Dec 5, 2023):

@markg85 You're using a compile flag that's incompatible with one of {llama,rocm,hip,cuda}. It's erroring earlier, but it'll error later too. There are a number of stack, call flow graph etc protections and similar gimmicks that break. It's likely coming from your makepkg.conf.

The gpu is not detected at all for the purposes of building, otherwise it would be a terrible package that can't be transferred to a different system with a different GPU. Since you didn't take specific steps to make sure it only builds for yours, it's building for all of them and that one just happens to be the first.

<!-- gh-comment-id:1841319465 --> @paulie-g commented on GitHub (Dec 5, 2023): @markg85 You're using a compile flag that's incompatible with one of {llama,rocm,hip,cuda}. It's erroring earlier, but it'll error later too. There are a number of stack, call flow graph etc protections and similar gimmicks that break. It's likely coming from your makepkg.conf. The gpu is not detected at all for the purposes of building, otherwise it would be a terrible package that can't be transferred to a different system with a different GPU. Since you didn't take specific steps to make sure it only builds for yours, it's building for all of them and that one just happens to be the first.
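To see which architecture your own card actually reports before masking or trimming build targets, `rocminfo` (used later in this thread) prints the gfx name; a quick one-liner, assuming ROCm is installed:

```bash
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
```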
Author
Owner

@markg85 commented on GitHub (Dec 6, 2023):

Ah, you're totally right @paulie-g!
makepkg.conf indeed had fcf-protection, which caused this error. Removing it made the compile proceed much further!

It still errors though. This time on an actual undefined variable, or so it seems:

[ 40%] Built target ggml
[ 46%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
[ 53%] Linking CXX static library libllama.a
[ 53%] Built target llama
[ 60%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 86%] Built target common
[ 93%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[100%] Linking CXX executable ../../bin/server
[100%] Built target server
==> Starting check()...
# github.com/jmorganca/ollama/llm
llm/accelerator_none.go:20:12: undefined: errNoGPU
?       github.com/jmorganca/ollama/examples/golang-simplegenerate      [no test files]
?       github.com/jmorganca/ollama/llm/llama.cpp       [no test files]
?       github.com/jmorganca/ollama/parser      [no test files]
?       github.com/jmorganca/ollama/progress    [no test files]
?       github.com/jmorganca/ollama/readline    [no test files]
ok      github.com/jmorganca/ollama/api (cached)
?       github.com/jmorganca/ollama/version     [no test files]
ok      github.com/jmorganca/ollama/format      (cached)
FAIL    github.com/jmorganca/ollama/server [build failed]
FAIL
==> ERROR: A failure occurred in check().
    Aborting...

For context, I changed the build part of the script to:

build() {
  cd $pkgname
#  export CGO_CFLAGS="$CFLAGS" CGO_CPPFLAGS="$CPPFLAGS" CGO_CXXFLAGS="$CXXFLAGS" CGO_LDFLAGS="$LDFLAGS"

  ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...
  go build -tags rocm
}

But yeah, llm/accelerator_none.go:20:12: undefined: errNoGPU kinda kills the compile thoroughly. Is there something I need to do to get that variable to exist?

<!-- gh-comment-id:1843251605 --> @markg85 commented on GitHub (Dec 6, 2023): Ah, you're totally right @paulie-g! `makepkg.conf` indeed had `fcf-protection` which causes this error. Removing it made the compile proceed much further! It still errors though. This time on an actual undefined variable, or so it seems: ``` [ 40%] Built target ggml [ 46%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o [ 53%] Linking CXX static library libllama.a [ 53%] Built target llama [ 60%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o [ 80%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o [ 80%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o [ 80%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o [ 86%] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o [ 86%] Built target common [ 93%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o [100%] Linking CXX executable ../../bin/server [100%] Built target server ==> Starting check()... # github.com/jmorganca/ollama/llm llm/accelerator_none.go:20:12: undefined: errNoGPU ? github.com/jmorganca/ollama/examples/golang-simplegenerate [no test files] ? github.com/jmorganca/ollama/llm/llama.cpp [no test files] ? github.com/jmorganca/ollama/parser [no test files] ? github.com/jmorganca/ollama/progress [no test files] ? github.com/jmorganca/ollama/readline [no test files] ok github.com/jmorganca/ollama/api (cached) ? github.com/jmorganca/ollama/version [no test files] ok github.com/jmorganca/ollama/format (cached) FAIL github.com/jmorganca/ollama/server [build failed] FAIL ==> ERROR: A failure occurred in check(). Aborting... ``` For context, i changed the build part of the script to: ```bash build() { cd $pkgname # export CGO_CFLAGS="$CFLAGS" CGO_CPPFLAGS="$CPPFLAGS" CGO_CXXFLAGS="$CXXFLAGS" CGO_LDFLAGS="$LDFLAGS" ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./... go build -tags rocm } ``` But yeah, `llm/accelerator_none.go:20:12: undefined: errNoGPU` kinda kills the compile thoroughly. Is there something i need to do to get that variable to exist?
Author
Owner

@paulie-g commented on GitHub (Dec 7, 2023):

@markg85 You might try editing the PKGBUILD to pull from the repo that actually has ROCm support ;) You need to pull from @65a 's repo, his ROCm support pull isn't merged into mainline yet. Not sure if that's the cause of that specific problem (and not sure why you've edited out the CGO flags), but it would be a good start.

<!-- gh-comment-id:1845402007 --> @paulie-g commented on GitHub (Dec 7, 2023): @markg85 You might try editing the PKGBUILD to pull from the repo that actually has ROCm support ;) You need to pull from @65a 's repo, his ROCm support pull isn't merged into mainline yet. Not sure if that's the cause of that specific problem (and not sure why you've edited out the CGO flags), but it would be a good start.
Author
Owner

@markg85 commented on GitHub (Dec 7, 2023):

@paulie-g lol 😂
Have another look at the PKGBUILD I posted a couple posts back. I am pulling the repo you mention. In fact, the error I'm getting is in a file that the @65a repo has and this one doesn't.

I changed the flags to be the same as in the first post in this thread. There's no other special reason besides just mimicking what others in this thread did.

<!-- gh-comment-id:1846097471 --> @markg85 commented on GitHub (Dec 7, 2023): @paulie-g lol 😂 Have another look at the PKGBUILD I posted a couple posts back. I am pulling the repo you mention. In fact, the error I'm getting is in a file that the @65a repo has and this one doesn't. I changed the flags to be the same as in the first post in this thread. There's no other special reason besides just mimicking what others in this thread did.
Author
Owner

@paulie-g commented on GitHub (Dec 8, 2023):

@markg85 Missed that, it's not me thread ;)

<!-- gh-comment-id:1846560665 --> @paulie-g commented on GitHub (Dec 8, 2023): @markg85 Missed that, it's not me thread ;)
Author
Owner

@deftdawg commented on GitHub (Dec 9, 2023):

@prawilny running your image from straight podman gives an error about /run/podman-init being missing (it's not present inside my container image; maybe it's something podman compose adds, idk)

In any case, I can get it to run by overriding the entrypoint and then running the CMD command... It works on my GPU (6900XT).

Here are the steps, I'm running on a NixOS host:

# Where you want it to live
export OLLAMA_HOME=${HOME}/source
[ ! -d "${OLLAMA_HOME}" ] && mkdir -p "${OLLAMA_HOME}" 
cd "${OLLAMA_HOME}"

# Clone and build -- this takes a long time :(
git clone https://github.com/prawilny/ollama-rocm-docker
cd ollama-rocm-docker/

# Pass devices to build - FIXME: use rocm-info to detect build target while building
podman build --device=/dev/kfd --device=/dev/dri . -t ollama-rocm 

# Run it passing in GPU devices and mapping port
export AI_MODEL_DIR="${OLLAMA_HOME}/ollama-models-cache/"
[ ! -d "${AI_MODEL_DIR}" ] && echo "AI Models directory (${AI_MODEL_DIR}) doesn't exist, creating.." && mkdir -p "${AI_MODEL_DIR}" && chmod 777 "${AI_MODEL_DIR}"

# FIXME: override entrypoint because /run/podman-init doesn't exist
podman run --name ollama --rm -it \
  --entrypoint /bin/bash \
  --device=/dev/kfd --device=/dev/dri \
  -p 11434:11434 \
  --mount type=bind,source="${AI_MODEL_DIR}",target=/home/rocm-user/.ollama \
  localhost/ollama-rocm:latest
# Workaround: run `OLLAMA_HOST=0.0.0.0:11434 ollama serve` at the container bash prompt

# on the host machine run `ollama run mistral`

EDIT: fixed ~/.ollama not being writable with chmod 777

<!-- gh-comment-id:1848252114 --> @deftdawg commented on GitHub (Dec 9, 2023): @prawilny running your image from straight podman gives an error about `/run/podman-init` being missing (it's not present inside my container image; maybe it's something `podman compose` adds, idk) In any case, I can get it to run by overriding the entrypoint and then running the CMD command... It works on my GPU (6900XT). Here are the steps, I'm running on a NixOS host: ```sh # Where you want it to live export OLLAMA_HOME=${HOME}/source [ ! -d "${OLLAMA_HOME}" ] && mkdir -p "${OLLAMA_HOME}" cd "${OLLAMA_HOME}" # Clone and build -- this takes a long time :( git clone https://github.com/prawilny/ollama-rocm-docker cd ollama-rocm-docker/ # Pass devices to build - FIXME: use rocm-info to detect build target while building podman build --device=/dev/kfd --device=/dev/dri . -t ollama-rocm # Run it passing in GPU devices and mapping port export AI_MODEL_DIR="${OLLAMA_HOME}/ollama-models-cache/" [ ! -d "${AI_MODEL_DIR}" ] && echo "AI Models directory (${AI_MODEL_DIR}) doesn't exist, creating.." && mkdir -p "${AI_MODEL_DIR}" && chmod 777 "${AI_MODEL_DIR}" # FIXME: override entrypoint because /run/podman-init doesn't exist podman run --name ollama --rm -it \ --entrypoint /bin/bash \ --device=/dev/kfd --device=/dev/dri \ -p 11434:11434 \ --mount type=bind,source="${AI_MODEL_DIR}",target=/home/rocm-user/.ollama \ localhost/ollama-rocm:latest # Workaround: run `OLLAMA_HOST=0.0.0.0:11434 ollama serve` at the container bash prompt # on the host machine run `ollama run mistral` ``` EDIT: fixed ~/.ollama not being writable with chmod 777
Author
Owner

@prawilny commented on GitHub (Dec 9, 2023):

@deftdawg, /run/podman-init is provided by podman run's --init flag.

<!-- gh-comment-id:1848617539 --> @prawilny commented on GitHub (Dec 9, 2023): @deftdawg, `/run/podman-init` is provided by `podman run`'s `--init` flag.
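So the non-workaround version of the run command from the steps above would simply add that flag and keep the image's own entrypoint (everything else unchanged):

```bash
podman run --init --name ollama --rm -it \
  --device=/dev/kfd --device=/dev/dri \
  -p 11434:11434 \
  --mount type=bind,source="${AI_MODEL_DIR}",target=/home/rocm-user/.ollama \
  localhost/ollama-rocm:latest
```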
Author
Owner

@zevv commented on GitHub (Dec 11, 2023):

fwiw, I was able to get AMD acceleration working on the first try on my 6600:

  • Latest ROCm installed in the usual locations
  • Checked out the master branch
  • Applied @65a's patch
  • Generate and build with -tags rocm
  • Start ollama with HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
<!-- gh-comment-id:1849384060 --> @zevv commented on GitHub (Dec 11, 2023): fwiw, I was able to get AMD acceleration working on the first try on my 6600: - Latest ROCm installed in the usual locations - Checked out the master branch - Applied @65a's patch - Generate and build with `-tags rocm` - Start ollama with `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`
Author
Owner

@markg85 commented on GitHub (Dec 11, 2023):

fwiw, I was able to get AMD acceleration working on the first try on my 6600:
...

  • Start ollama with HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve

That trick - forcing the HSA version - works for RDNA2 GPUs, so your 6600 GPU is perfectly happy with it.
It's not an option for RDNA3 GPUs (anything in the 7xxx series). I know because I tried.

<!-- gh-comment-id:1850144284 --> @markg85 commented on GitHub (Dec 11, 2023): > fwiw, I was able to get AMD acceleration working on the first try on my 6600: ... > * Start ollama with `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve` That trick - forcing the HSA version - works for RDNA2 GPUs, so your 6600 GPU is perfectly happy with it. It's not an option for RDNA3 GPUs (anything in the 7xxx series). I know because I tried.
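For RDNA2 owners following along, the override in question is a single environment variable; 10.3.0 masks the card as gfx1030, per the comments above (RDNA3 cards reportedly cannot use this):

```bash
HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
```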
Author
Owner

@65a commented on GitHub (Dec 14, 2023):

Note, I will close and delete my branch (the patch in #814) when #1146 merges, as it effectively contains my patch. I am not intending to maintain a fork, and I actually am using a different approach than an ollama server now locally. Anyone building directly from my patch should fork or move to #1146 when it is merged.

<!-- gh-comment-id:1855056291 --> @65a commented on GitHub (Dec 14, 2023): Note, I will close and delete my branch (the patch in #814) when #1146 merges, as it effectively contains my patch. I am not intending to maintain a fork, and I actually am using a different approach than an ollama server now locally. Anyone building directly from my patch should fork or move to #1146 when it is merged.
Author
Owner

@paulie-g commented on GitHub (Dec 15, 2023):

I am not intending to maintain a fork

Thank you for all the work on this.

and I actually am using a different approach than an ollama server now locally. Anyone building directly from my patch should fork or move to 1146 when it is merged.

Would you mind elaborating? Curious to see what you are using instead (can be as short as you like).

<!-- gh-comment-id:1857430037 --> @paulie-g commented on GitHub (Dec 15, 2023): > I am not intending to maintain a fork Thank you for all the work on this. > and I actually am using a different approach than an ollama server now locally. Anyone building directly from my patch should fork or move to 1146 when it is merged. Would you mind elaborating? Curious to see what you are using instead (can be as short as you like).
Author
Owner

@65a commented on GitHub (Dec 18, 2023):

@paulie-g Nothing special, just a local shim linking to libllama instead, similar to where Ollama is going and what go-skynet did (both of which are probably better); it just lets me get more opinionated about (aka hardcode) things like the tokenizer stack and settings. Mainly this is all just for learning for me, so getting to poke around libllama and understand it is interesting. The CGO-linked Ollama is looking good, so I'd definitely run that for a real use case (especially when CGO is involved, more eyes is better), and having an API abstraction can be a good thing for resiliency.

<!-- gh-comment-id:1859343231 --> @65a commented on GitHub (Dec 18, 2023): @paulie-g Nothing special, just a local shim linking to libllama instead, similar to where Ollama is going and what go-skynet did (both of which are probably better); it just lets me get more opinionated about (aka hardcode) things like the tokenizer stack and settings. Mainly this is all just for learning for me, so getting to poke around libllama and understand it is interesting. The CGO-linked Ollama is looking good, so I'd definitely run that for a real use case (especially when CGO is involved, more eyes is better), and having an API abstraction can be a good thing for resiliency.
Author
Owner

@Wintoplay commented on GitHub (Dec 19, 2023):

Thanks to @deadmeu, Ollama runs on my AMD GPU. However, the shader clock is stuck at 125% as long as the terminal is running the serving model. Is that OK, or how can I fix it?

Is it because I use ROCm 5.7.0?

<!-- gh-comment-id:1861997469 --> @Wintoplay commented on GitHub (Dec 19, 2023): Thanks to @deadmeu, Ollama runs on my AMD GPU. However, the shader clock is stuck at 125% as long as the terminal is running the serving model. Is that OK, or how can I fix it? Is it because I use ROCm 5.7.0?
Author
Owner

@ReOT20 commented on GitHub (Dec 19, 2023):

Tried building on Ubuntu 22.04 LTS with ROCm 5.4.2 and RX 5700 XT masked as gfx1030. I am getting this after lots of warnings while trying to build dependencies: https://pastebin.com/VWEaBMqG

Exactly the same thing happens with #1146

<!-- gh-comment-id:1863351358 --> @ReOT20 commented on GitHub (Dec 19, 2023): Tried building on Ubuntu 22.04 LTS with ROCm 5.4.2 and RX 5700 XT masked as gfx1030. I am getting this after lots of warnings while trying to build dependencies: https://pastebin.com/VWEaBMqG Exactly the same thing happens with #1146
Author
Owner

@ignacio82 commented on GitHub (Dec 24, 2023):

I finally got around to creating the image using this Dockerfile, but I don't think it is using my GPU; at least I don't see any activity in radeontop. Any idea why, or how to debug this?

Another thing I noticed is that the webui does not work anymore.

image

How can I fix that?

<!-- gh-comment-id:1868392714 --> @ignacio82 commented on GitHub (Dec 24, 2023): I finally got around to creating the image using [this Dockerfile](https://github.com/ignacio82/ollama-rocm-docker/blob/master/Dockerfile), but I don't think it is using my GPU; at least I don't see any activity in radeontop. Any idea why, or how to debug this? Another thing I noticed is that the webui does not work anymore. ![image](https://github.com/jmorganca/ollama/assets/1833309/bbe6afcd-7d86-4b80-88bf-dc6030ff553b) How can I fix that?
Author
Owner

@deftdawg commented on GitHub (Dec 24, 2023):

What card are you using? What command line do you use to start it?

The webui doesn't work because ollama picks random port numbers inside your container that you didn't publish at container start time... not yet sure how to fix that (I assume there's a variable or option somewhere).

<!-- gh-comment-id:1868401734 --> @deftdawg commented on GitHub (Dec 24, 2023): What card are you using? What command line do you use to start it? The webui doesn't work because ollama picks random port numbers inside your container that you didn't publish at container start time... not yet sure how to fix that (I assume there's a variable or option somewhere).
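The variable being guessed at here is presumably `OLLAMA_HOST`, which already appears in the podman steps earlier in this thread - it binds the server to a fixed address and the published port inside the container:

```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```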
Author
Owner

@ignacio82 commented on GitHub (Dec 24, 2023):

AMD Ryzen 9 6900HX (up to 4.9GHz), Radeon 680M graphics, 8C/16T mini PC, 32GB DDR5, 512GB PCIe 4.0 SSD

Thanks for the help!

<!-- gh-comment-id:1868403945 --> @ignacio82 commented on GitHub (Dec 24, 2023): AMD Ryzen 9 6900HX (up to 4.9GHz), Radeon 680M graphics, 8C/16T mini PC, 32GB DDR5, 512GB PCIe 4.0 SSD Thanks for the help!
Author
Owner

@andy-shi88 commented on GitHub (Dec 26, 2023):

I'm on

gpu: rx6700 xt
os: ubuntu 22.04
rocm

I built the binary from the main branch.
I'm getting Error: could not connect to ollama server, run 'ollama serve' to start it when I run ollama run mistral.
This is the log from journalctl -ru ollama; it looks like it crashed and the service restarted at some point during Lazy loading ...

Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:835: total unused blobs removed: 0
Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:828: total blobs: 8
Dec 26 11:10:06 andy-ubuntu systemd[1]: Started Ollama Service.
Dec 26 11:10:06 andy-ubuntu systemd[1]: Stopped Ollama Service.
Dec 26 11:10:06 andy-ubuntu systemd[1]: ollama.service: Scheduled restart job, restart counter is at 1.
Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Failed with result 'core-dump'.
Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT
Dec 26 11:10:03 andy-ubuntu sh[28950]: Lazy loading /tmp/ollama3679635750/librocm_server.so library
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]:  List of available TensileLibrary Files :
Dec 26 11:10:03 andy-ubuntu sh[28950]: rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1031
Dec 26 11:10:03 andy-ubuntu sh[28950]: [1703560203] disabling verbose llm logging
Dec 26 11:10:03 andy-ubuntu sh[28950]: Failed to open logfile 'llama.log' with error 'Permission denied'
Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 ext_server.go:189: Initializing internal llama server
Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 gpu.go:131: 11531 MB VRAM available, loading up to 70 ROCM GPU layers out of 32
Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 shim_ext_server.go:94: Loading Dynamic Shim llm server: /tmp/ollama3679635750/librocm_server.so
Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 |       429.4µs |       127.0.0.1 | POST     "/api/show"
Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 |     574.229µs |       127.0.0.1 | POST     "/api/show"
Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 |      48.448µs |       127.0.0.1 | HEAD     "/"
Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:47: Radeon GPU detected
Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:38: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or >
Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:33: Detecting GPU type

Is there some flag I need to set for this to work?

edit:
I'm able to run it if I run ollama serve and then ollama run mistral,
but it fails if ollama serve is run from the systemd service.

<!-- gh-comment-id:1869229885 --> @andy-shi88 commented on GitHub (Dec 26, 2023): I'm on ``` gpu: rx6700 xt os: ubuntu 22.04 rocm ``` I build the binary from `main` branch I'm getting `Error: could not connect to ollama server, run 'ollama serve' to start it` when I run `ollama run mistral` This is the log in `journalctl -ru ollama` looks like it crashed and restarted the service at some point when `Lazy loading ...` ``` Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:835: total unused blobs removed: 0 Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:828: total blobs: 8 Dec 26 11:10:06 andy-ubuntu systemd[1]: Started Ollama Service. Dec 26 11:10:06 andy-ubuntu systemd[1]: Stopped Ollama Service. Dec 26 11:10:06 andy-ubuntu systemd[1]: ollama.service: Scheduled restart job, restart counter is at 1. Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Failed with result 'core-dump'. Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT Dec 26 11:10:03 andy-ubuntu sh[28950]: Lazy loading /tmp/ollama3679635750/librocm_server.so library Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat" Dec 26 11:10:03 andy-ubuntu sh[28950]: List of available TensileLibrary Files : Dec 26 11:10:03 andy-ubuntu sh[28950]: rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1031 Dec 26 11:10:03 andy-ubuntu sh[28950]: [1703560203] disabling verbose llm logging Dec 26 11:10:03 andy-ubuntu sh[28950]: Failed to open logfile 'llama.log' with error 'Permission denied' Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 ext_server.go:189: Initializing internal llama server Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 gpu.go:131: 11531 MB VRAM available, loading up to 70 ROCM GPU layers out of 32 Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 shim_ext_server.go:94: Loading Dynamic Shim llm server: /tmp/ollama3679635750/librocm_server.so Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 | 429.4µs | 127.0.0.1 | POST "/api/show" Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 | 574.229µs | 127.0.0.1 | POST "/api/show" Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 | 48.448µs | 127.0.0.1 | HEAD "/" Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:47: Radeon GPU detected Dec 26 11:09:54 andy-ubuntu 
sh[28950]: 2023/12/26 11:09:54 gpu.go:38: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or > Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:33: Detecting GPU type ``` is there some flag I need to set for this to work? edit: I'm able to run it if I run `ollama serve` then `ollama run mistral` but fail if `ollama serve` is ran from systemd service
Author
Owner

@andy-shi88 commented on GitHub (Dec 26, 2023):

I'm on

gpu: rx6700 xt
os: ubuntu 22.04
rocm

I build the binary from main branch I'm getting Error: could not connect to ollama server, run 'ollama serve' to start it when I run ollama run mistral This is the log in journalctl -ru ollama looks like it crashed and restarted the service at some point when Lazy loading ...

Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:835: total unused blobs removed: 0
Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:828: total blobs: 8
Dec 26 11:10:06 andy-ubuntu systemd[1]: Started Ollama Service.
Dec 26 11:10:06 andy-ubuntu systemd[1]: Stopped Ollama Service.
Dec 26 11:10:06 andy-ubuntu systemd[1]: ollama.service: Scheduled restart job, restart counter is at 1.
Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Failed with result 'core-dump'.
Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT
Dec 26 11:10:03 andy-ubuntu sh[28950]: Lazy loading /tmp/ollama3679635750/librocm_server.so library
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
Dec 26 11:10:03 andy-ubuntu sh[28950]:  List of available TensileLibrary Files :
Dec 26 11:10:03 andy-ubuntu sh[28950]: rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1031
Dec 26 11:10:03 andy-ubuntu sh[28950]: [1703560203] disabling verbose llm logging
Dec 26 11:10:03 andy-ubuntu sh[28950]: Failed to open logfile 'llama.log' with error 'Permission denied'
Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 ext_server.go:189: Initializing internal llama server
Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 gpu.go:131: 11531 MB VRAM available, loading up to 70 ROCM GPU layers out of 32
Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 shim_ext_server.go:94: Loading Dynamic Shim llm server: /tmp/ollama3679635750/librocm_server.so
Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 |       429.4µs |       127.0.0.1 | POST     "/api/show"
Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 |     574.229µs |       127.0.0.1 | POST     "/api/show"
Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 |      48.448µs |       127.0.0.1 | HEAD     "/"
Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:47: Radeon GPU detected
Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:38: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or >
Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:33: Detecting GPU type

is there some flag I need to set for this to work?

edit: I'm able to run it if I run ollama serve then ollama run mistral but fail if ollama serve is ran from systemd service

Oh, just realized I had set HSA_OVERRIDE_GFX_VERSION=10.3.0 in my ~/.zshrc and just needed to set Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0" in the service file.
It works perfectly now on my RX 6700 XT. Thank you!

<!-- gh-comment-id:1869265294 --> @andy-shi88 commented on GitHub (Dec 26, 2023): > I'm on > > ``` > gpu: rx6700 xt > os: ubuntu 22.04 > rocm > ``` > > I build the binary from `main` branch I'm getting `Error: could not connect to ollama server, run 'ollama serve' to start it` when I run `ollama run mistral` This is the log in `journalctl -ru ollama` looks like it crashed and restarted the service at some point when `Lazy loading ...` > > ``` > Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:835: total unused blobs removed: 0 > Dec 26 11:10:06 andy-ubuntu sh[29121]: 2023/12/26 11:10:06 images.go:828: total blobs: 8 > Dec 26 11:10:06 andy-ubuntu systemd[1]: Started Ollama Service. > Dec 26 11:10:06 andy-ubuntu systemd[1]: Stopped Ollama Service. > Dec 26 11:10:06 andy-ubuntu systemd[1]: ollama.service: Scheduled restart job, restart counter is at 1. > Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Failed with result 'core-dump'. > Dec 26 11:10:03 andy-ubuntu systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT > Dec 26 11:10:03 andy-ubuntu sh[28950]: Lazy loading /tmp/ollama3679635750/librocm_server.so library > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: "/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat" > Dec 26 11:10:03 andy-ubuntu sh[28950]: List of available TensileLibrary Files : > Dec 26 11:10:03 andy-ubuntu sh[28950]: rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1031 > Dec 26 11:10:03 andy-ubuntu sh[28950]: [1703560203] disabling verbose llm logging > Dec 26 11:10:03 andy-ubuntu sh[28950]: Failed to open logfile 'llama.log' with error 'Permission denied' > Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 ext_server.go:189: Initializing internal llama server > Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 gpu.go:131: 11531 MB VRAM available, loading up to 70 ROCM GPU layers out of 32 > Dec 26 11:10:03 andy-ubuntu sh[28950]: 2023/12/26 11:10:03 shim_ext_server.go:94: Loading Dynamic Shim llm server: /tmp/ollama3679635750/librocm_server.so > Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 | 429.4µs | 127.0.0.1 | POST "/api/show" > Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 | 574.229µs | 127.0.0.1 | POST "/api/show" > Dec 26 11:10:03 andy-ubuntu sh[28950]: [GIN] 2023/12/26 - 11:10:03 | 200 | 48.448µs | 127.0.0.1 | HEAD "/" > Dec 26 11:09:54 andy-ubuntu sh[28950]: 
2023/12/26 11:09:54 gpu.go:47: Radeon GPU detected > Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:38: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or > > Dec 26 11:09:54 andy-ubuntu sh[28950]: 2023/12/26 11:09:54 gpu.go:33: Detecting GPU type > ``` > > is there some flag I need to set for this to work? > > edit: I'm able to run it if I run `ollama serve` then `ollama run mistral` but fail if `ollama serve` is ran from systemd service oh just realized I set ` HSA_OVERRIDE_GFX_VERSION=10.3.0` in my `~/.zshrc` and I just need to set `Environment=" HSA_OVERRIDE_GFX_VERSION=10.3.0"` in the service file. It works perfectly now on my rx 6700 xt, Thank you!
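For anyone wiring up the same fix, a sketch of that service-file change as a systemd drop-in (using `systemctl edit` is one conventional way to do it, not something from this thread):

```bash
sudo systemctl edit ollama
# in the editor that opens, add:
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```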
Author
Owner

@oatmealm commented on GitHub (Dec 30, 2023):

I'm having similar issues on Fedora 38:

[GIN] 2023/12/30 - 08:44:14 | 200 |      55.523µs |       127.0.0.1 | HEAD     "/"
[GIN] 2023/12/30 - 08:44:14 | 200 |     561.942µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2023/12/30 - 08:44:14 | 200 |     298.571µs |       127.0.0.1 | POST     "/api/show"
Lazy loading /tmp/ollama756781052/librocm_server.so library
2023/12/30 08:44:14 llm.go:98: Failed to load dynamic library - falling back to CPU mode Unable to load dynamic library: Unable to load dynamic server library: /opt/rocm/lib/libamdhip64.so.6: version `hip_6.0' not found (required by /tmp/ollama7567
2023/12/30 08:44:14 gpu.go:131: 83 MB VRAM available, loading up to 0 ROCM GPU layers out of 32
2023/12/30 08:44:14 ext_server.go:189: Initializing internal llama server
disabling verbose llm logging

Note the /opt/rocm/lib/libamdhip64.so.6: version `hip_6.0' not found error. I've tried symlinking this dynamic lib and others it was complaining about, but no luck.

ls -al libamdhip64.*
lrwxrwxrwx. 1 root root       16 May 15  2023 libamdhip64.so -> libamdhip64.so.5
lrwxrwxrwx. 1 root root       24 May 15  2023 libamdhip64.so.5 -> libamdhip64.so.5.5.50501
-rwxr-xr-x. 1 root root 27276184 May 15  2023 libamdhip64.so.5.5.50501
lrwxrwxrwx. 1 root root       24 Dec 30 08:44 libamdhip64.so.6 -> libamdhip64.so.5.5.50501

rocminfo, rocm-smi etc. are all available on the path and working:

rocminfo 
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 5900HX with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 5900HX with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4680                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32260988(0x1ec437c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32260988(0x1ec437c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32260988(0x1ec437c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx90c                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5688(0x1638)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   1024                               
  Internal Node ID:        1                                  
  Compute Unit:            8                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    524288(0x80000) KB                 
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
```

```
rocm-smi


======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU[0]		: Not supported on the given system
ERROR: GPU[0]	: sclk clock is unsupported
================================================================================
GPU[0]		: Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
0    43.0c           N/A     None  1600Mhz  0%   auto  Unsupported   84%   2%    
================================================================================
============================= End of ROCm SMI Log ==============================
```

![image](https://github.com/jmorganca/ollama/assets/68159077/71fba926-cb0d-4d9e-b75e-b5260ea1e727)

Author
Owner

@light-on-shadow commented on GitHub (Dec 30, 2023):

@oatmealm I'm on Fedora 39, the only way I could make it all work was by purging anything ROCm from prior trials, adding version 5.6 from the RHEL 9.3 official repo, then installing the ROCm SDK from there.

> repo.radeon.com_rocm_rhel9_5.6_main_.repo

Fedora 39
6950XT
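
For anyone reproducing this, a rough sketch of what that repo setup might look like, assuming the standard repo.radeon.com layout implied by the quoted filename (URLs unverified; adjust versions as needed):

```
# Declare the ROCm 5.6 RHEL 9 repo, then install the HIP SDK from it.
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[ROCm-5.6]
name=ROCm 5.6
baseurl=https://repo.radeon.com/rocm/rhel9/5.6/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo dnf install rocm-hip-sdk
```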

Author
Owner

@oatmealm commented on GitHub (Dec 30, 2023):

@light-on-shadow oh good. I was avoiding upgrading since I thought 39 wasn't supported... didn't know RHEL 9.3 packages are compatible with F39.

Author
Owner

@light-on-shadow commented on GitHub (Dec 30, 2023):

@oatmealm RHEL 9 is based on CentOS Stream 9, which is based on Fedora 34.
ROCm is officially supported on Ubuntu, RHEL and openSUSE, so anything you do outside of the requirements comes with a caveat anyway.

Author
Owner

@jayk commented on GitHub (Jan 1, 2024):

Question! I have ollama running on my Manjaro linux machine with a 7900xtx and it works great. (git version as of 2023-12-25) However, I notice that once I run any model, the GPU stays pegged, consuming ~135 watts, even if I am no longer running any model.

I would have expected that once it's not running a model anymore, it would drop, but it doesn't. It doesn't drop until I kill `ollama serve` altogether.

Is this expected behavior with ollama or is it something specific to rocm support?

Author
Owner

@65a commented on GitHub (Jan 2, 2024):

Run Ollama with `GPU_MAX_HW_QUEUES=1 /path/to/ollama`, or otherwise set it in Ollama's environment. This bug is upstream of Ollama, and has to do with how HIP works vs CUDA. It would be fine for Ollama to add code like `os.Setenv("GPU_MAX_HW_QUEUES","1")` before calling into C code, as this solves the issue as well without the user having to do anything. The best solution is probably at the HIP layer, or less ideally, some ifdefs in llama.cpp or something.

This took my W7900 from a 99W idle to 18W. Graphics clocks will drop when not inferencing as expected (it's not clocked as hard by default as a 7900XTX). I couldn't determine any performance impact, still seeing 60tok/s on short prompts with 7b mistral.
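
A sketch of applying the workaround and watching idle power settle afterwards (the binary path is whatever your build produced; `--showpower` per ROCm 5.x `rocm-smi`):

```
# Run the server with a single HIP hardware queue.
GPU_MAX_HW_QUEUES=1 ./ollama serve &

# In another shell, watch idle power draw drop once generation finishes.
watch -n 2 'rocm-smi --showpower'
```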

Author
Owner

@oatmealm commented on GitHub (Jan 2, 2024):

> @oatmealm I'm on Fedora 39, the only way I could make it all work was by purging anything ROCm from prior trials, adding version 5.6 from the RHEL 9.3 official repo, then installing the ROCm SDK from there.
>
> > repo.radeon.com_rocm_rhel9_5.6_main_.repo
>
> Fedora 39 6950XT

Trying to install from the RHEL 9.3 repo but I'm not sure how to set it up. Are you installing from the script or the package manager? From the script I'm getting [in F39]:

```
sudo amdgpu-install rocm
AMDGPU 6.0 repository                                              887  B/s | 548  B     00:00    
Errors during downloading metadata for repository 'amdgpu':
  - Status code: 404 for https://repo.radeon.com/amdgpu/6.0/rhel//main/x86_64/repodata/repomd.xml (IP: 13.82.220.49)
Error: Failed to download metadata for repo 'amdgpu': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Ignoring repositories: amdgpu
Last metadata expiration check: 0:03:37 ago on Tue 02 Jan 2024 08:19:51 AM CET.
No match for argument: amdgpu-lib
No match for argument: amdgpu-dkms
Error: Unable to find a match: amdgpu-lib amdgpu-dkms
```
Author
Owner

@light-on-shadow commented on GitHub (Jan 2, 2024):

> Trying to install from the RHEL 9.3 repo but I'm not sure how to set it up. Are you installing from the script or the package manager? From the script I'm getting [in F39]

Yes, from the package manager: `sudo dnf install rocm-hip-sdk`[...]

Author
Owner

@oatmealm commented on GitHub (Jan 2, 2024):

And which version of amdgpu-dkms should I install? Only 6.0 seems to be available for RHEL 9.3, and the installation fails for me with some error in the install script.

Author
Owner

@jayk commented on GitHub (Jan 2, 2024):

> Run Ollama with `GPU_MAX_HW_QUEUES=1 /path/to/ollama`, or otherwise set it in Ollama's environment. This bug is upstream of Ollama, and has to do with how HIP works vs CUDA. It would be fine for Ollama to add code like `os.Setenv("GPU_MAX_HW_QUEUES","1")` before calling into C code, as this solves the issue as well without the user having to do anything. The best solution is probably at the HIP layer, or less ideally, some ifdefs in llama.cpp or something.

This worked great. Thank you. I set the variable and now I'm down to 15W which is much happier than the 125W I was pegged at before. And my room is cooler. 😉 Thanks!!!

Author
Owner

@oatmealm commented on GitHub (Jan 4, 2024):

Made some progress with my setup on Fedora 39 and ROCm 5.6: it started compiling, but ends like this:

```
[ 62%] Built target llama
[ 87%] Built target llava
[100%] Built target llava_static
+ gcc -fPIC -g -shared -o gguf/build/lib/librocm_server.so -Wl,--whole-archive gguf/build/rocm/examples/server/libext_server.a gguf/build/rocm/common/libcommon.a gguf/build/rocm/libllama.a -Wl,--no-whole-archive -lrt -lpthread -ldl -lstdc++ -lm -L/opt/rocm/lib -L/opt/amdgpu/lib/x86_64-linux-gnu/ -Wl,-rpath,/opt/rocm/lib,-rpath,/opt/amdgpu/lib/x86_64-linux-gnu/ -lhipblas -lrocblas -lamdhip64 -lrocsolver -lamd_comgr -lhsa-runtime64 -lrocsparse -ldrm -ldrm_amdgpu
/usr/bin/ld: cannot find -lamdhip64: No such file or directory
/usr/bin/ld: cannot find -lrocsolver: No such file or directory
/usr/bin/ld: cannot find -lrocsparse: No such file or directory
```

Checking the libraries, I can see them (below). Any idea what could be the problem?

```
ldconfig  -p | grep rocsparse
	librocsparse.so.0 (libc6,x86-64) => /opt/rocm-5.6.0/lib/librocsparse.so.0
	librocsparse.so (libc6,x86-64) => /opt/rocm-5.6.0/lib/librocsparse.so
```
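
One detail that may be getting lost here: `ldconfig` only governs the runtime loader cache, while the failure above comes from `ld` at link time, which searches only its `-L` directories — and the command passes `-L/opt/rocm/lib` while the libraries live under `/opt/rocm-5.6.0/lib`. A sketch of a workaround under that assumption:

```
# Make the versioned ROCm prefix visible to gcc/ld at link time for this build.
export LIBRARY_PATH=/opt/rocm-5.6.0/lib
```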

Author
Owner

@light-on-shadow commented on GitHub (Jan 4, 2024):

I installed this, I believe: `rocm-hip-sdk5.6.0-5.6.0.50600-67.el9.x86_64`.

When I run `locate` on that lib on my setup I get this:

> /opt/rocm-5.6.0/lib/librocsparse.so
> /opt/rocm-5.6.0/lib/librocsparse.so.0
> /opt/rocm-5.6.0/lib/librocsparse.so.0.1.50600
> /opt/rocm-5.6.0/rocsparse/lib/librocsparse.so

Seems like it's in the rocsparse/rocsparse-devel package which you should get with the SDK.

Author
Owner

@oatmealm commented on GitHub (Jan 4, 2024):

Yes. It's installed and ldconfig can see it, but the last step, where gcc is run, still fails...

Author
Owner

@oatmealm commented on GitHub (Jan 4, 2024):

Ok, problem solved by symlinking `rocm-5.6.0` to `rocm` instead of using `ROCM_PATH`...
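
Concretely, that fix amounts to something like the line below, assuming the 5.6.0 prefix from earlier in the thread:

```
# -n avoids following an existing /opt/rocm symlink; move a real directory aside first.
sudo ln -sfn /opt/rocm-5.6.0 /opt/rocm
```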

Now that it's running, all models seem to fail... with the following:

```
[GIN] 2024/01/04 - 23:08:57 | 200 |      66.757µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/04 - 23:08:57 | 200 |     745.843µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/04 - 23:08:57 | 200 |     351.729µs |       127.0.0.1 | POST     "/api/show"
Lazy loading /tmp/ollama3338686230/librocm_server.so library
2024/01/04 23:08:57 shim_ext_server.go:94: Loading Dynamic Shim llm server: /tmp/ollama3338686230/librocm_server.so
2024/01/04 23:08:57 gpu.go:131: 135 MB VRAM available, loading up to 0 ROCM GPU layers out of 32
2024/01/04 23:08:57 ext_server.go:189: Initializing internal llama server
disabling verbose llm logging

rocBLAS error: Cannot read /opt/rocm-5.6.0/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)
```

```
rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
 Name:                    AMD Ryzen 9 5900HX with Radeon Graphics
 Uuid:                    CPU-XX                             
 Marketing Name:          AMD Ryzen 9 5900HX with Radeon Graphics
 Vendor Name:             CPU                                
 Feature:                 None specified                     
 Profile:                 FULL_PROFILE                       
 Float Round Mode:        NEAR                               
 Max Queue Number:        0(0x0)                             
 Queue Min Size:          0(0x0)                             
 Queue Max Size:          0(0x0)                             
 Queue Type:              MULTI                              
 Node:                    0                                  
 Device Type:             CPU                                
 Cache Info:              
   L1:                      32768(0x8000) KB                   
 Chip ID:                 0(0x0)                             
 ASIC Revision:           0(0x0)                             
 Cacheline Size:          64(0x40)                           
 Max Clock Freq. (MHz):   4680                               
 BDFID:                   0                                  
 Internal Node ID:        0                                  
 Compute Unit:            16                                 
 SIMDs per CU:            0                                  
 Shader Engines:          0                                  
 Shader Arrs. per Eng.:   0                                  
 WatchPts on Addr. Ranges:1                                  
 Features:                None
 Pool Info:               
   Pool 1                   
     Segment:                 GLOBAL; FLAGS: FINE GRAINED        
     Size:                    32260988(0x1ec437c) KB             
     Allocatable:             TRUE                               
     Alloc Granule:           4KB                                
     Alloc Alignment:         4KB                                
     Accessible by all:       TRUE                               
   Pool 2                   
     Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
     Size:                    32260988(0x1ec437c) KB             
     Allocatable:             TRUE                               
     Alloc Granule:           4KB                                
     Alloc Alignment:         4KB                                
     Accessible by all:       TRUE                               
   Pool 3                   
     Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
     Size:                    32260988(0x1ec437c) KB             
     Allocatable:             TRUE                               
     Alloc Granule:           4KB                                
     Alloc Alignment:         4KB                                
     Accessible by all:       TRUE                               
 ISA Info:                
*******                  
Agent 2                  
*******                  
 Name:                    gfx90c                             
 Uuid:                    GPU-XX                             
 Marketing Name:          AMD Radeon Graphics                
 Vendor Name:             AMD                                
 Feature:                 KERNEL_DISPATCH                    
 Profile:                 BASE_PROFILE                       
 Float Round Mode:        NEAR                               
 Max Queue Number:        128(0x80)                          
 Queue Min Size:          64(0x40)                           
 Queue Max Size:          131072(0x20000)                    
 Queue Type:              MULTI                              
 Node:                    1                                  
 Device Type:             GPU                                
 Cache Info:              
   L1:                      16(0x10) KB                        
   L2:                      1024(0x400) KB                     
 Chip ID:                 5688(0x1638)                       
 ASIC Revision:           0(0x0)                             
 Cacheline Size:          64(0x40)                           
 Max Clock Freq. (MHz):   2100                               
 BDFID:                   1024                               
 Internal Node ID:        1                                  
 Compute Unit:            8                                  
 SIMDs per CU:            4                                  
 Shader Engines:          1                                  
 Shader Arrs. per Eng.:   1                                  
 WatchPts on Addr. Ranges:4                                  
 Features:                KERNEL_DISPATCH 
 Fast F16 Operation:      TRUE                               
 Wavefront Size:          64(0x40)                           
 Workgroup Max Size:      1024(0x400)                        
 Workgroup Max Size per Dimension:
   x                        1024(0x400)                        
   y                        1024(0x400)                        
   z                        1024(0x400)                        
 Max Waves Per CU:        40(0x28)                           
 Max Work-item Per CU:    2560(0xa00)                        
 Grid Max Size:           4294967295(0xffffffff)             
 Grid Max Size per Dimension:
   x                        4294967295(0xffffffff)             
   y                        4294967295(0xffffffff)             
   z                        4294967295(0xffffffff)             
 Max fbarriers/Workgrp:   32                                 
 Pool Info:               
   Pool 1                   
     Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
     Size:                    524288(0x80000) KB                 
     Allocatable:             TRUE                               
     Alloc Granule:           4KB                                
     Alloc Alignment:         4KB                                
     Accessible by all:       FALSE                              
   Pool 2                   
     Segment:                 GROUP                              
     Size:                    64(0x40) KB                        
     Allocatable:             FALSE                              
     Alloc Granule:           0KB                                
     Alloc Alignment:         0KB                                
     Accessible by all:       FALSE                              
 ISA Info:                
   ISA 1                    
     Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   
     Machine Models:          HSA_MACHINE_MODEL_LARGE            
     Profiles:                HSA_PROFILE_BASE                   
     Default Rounding Mode:   NEAR                               
     Default Rounding Mode:   NEAR                               
     Fast f16:                TRUE                               
     Workgroup Max Size:      1024(0x400)                        
     Workgroup Max Size per Dimension:
       x                        1024(0x400)                        
       y                        1024(0x400)                        
       z                        1024(0x400)                        
     Grid Max Size:           4294967295(0xffffffff)             
     Grid Max Size per Dimension:
       x                        4294967295(0xffffffff)             
       y                        4294967295(0xffffffff)             
       z                        4294967295(0xffffffff)             
     FBarrier Max Size:       32                                 
*** Done ***
```
Author
Owner

@light-on-shadow commented on GitHub (Jan 4, 2024):

> AMD Ryzen 9 5900HX with Radeon Graphics

I've had a lot of issues in the past because of my CPU's integrated GPU; try disabling it.
But the errors seem to point to something else: again, it cannot load a lib.

Author
Owner

@oatmealm commented on GitHub (Jan 5, 2024):

Hardware-wise it should be OK, though? I was trying to check whether it's currently supported or not...

Author
Owner

@ms178 commented on GitHub (Jan 6, 2024):

I tried to build the ollama 0.1.18 release on Arch with the provided [PKGBUILD](https://gitlab.archlinux.org/archlinux/packaging/packages/ollama/-/blob/main/PKGBUILD?ref_type=heads), but even though I've tried various GCC and Clang compiler flags and ROCm versions (5.7.1 and 6.0), I still end up with:

`/usr/bin/ld: gguf/build/linux/rocm/lib/libext_server.a: member gguf/build/linux/rocm/lib/libext_server.a(ext_server.cpp.o) in archive is not an object` in the installation phase.

Any ideas how to fix this?

Author
Owner

@deadmeu commented on GitHub (Jan 7, 2024):

Now that the changes in ROCm/rocBLAS#1146 have flowed through to a release and are being shipped in the [Arch repo](https://archlinux.org/packages/extra/x86_64/ollama/), I tried it out to see how it would go, but unfortunately I'm having some issues:

  • Initially, with only the `ollama` package installed, my GPU is not detected: `gpu.go:45: ROCm not detected: Unable to load librocm_smi64.so library to query for Radeon GPUs: /opt/rocm/lib/librocm_smi64.so: cannot open shared object file: No such file or directory`.
  • After installing the `rocm-smi-lib` package my GPU was being picked up (`Radeon GPU detected`), but running a model would not use my GPU.
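
For the first bullet, a quick sanity check that the library ollama tries to dlopen is actually resolvable (path and name taken from the error message above):

```
# Is librocm_smi64 known to the runtime loader at all?
ldconfig -p | grep librocm_smi64

# And does the exact path ollama probes exist?
ls -l /opt/rocm/lib/librocm_smi64.so*
```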
Author
Owner

@jameshulse commented on GitHub (Jan 9, 2024):

@oatmealm

> Ok, problem solved by symlinking `rocm-5.6.0` to `rocm` instead of using `ROCM_PATH`...
>
> Now that it's running, all models seem to fail... with the following:
>
> ```
> [GIN] 2024/01/04 - 23:08:57 | 200 |      66.757µs |       127.0.0.1 | HEAD     "/"
> [GIN] 2024/01/04 - 23:08:57 | 200 |     745.843µs |       127.0.0.1 | POST     "/api/show"
> [GIN] 2024/01/04 - 23:08:57 | 200 |     351.729µs |       127.0.0.1 | POST     "/api/show"
> Lazy loading /tmp/ollama3338686230/librocm_server.so library
> 2024/01/04 23:08:57 shim_ext_server.go:94: Loading Dynamic Shim llm server: /tmp/ollama3338686230/librocm_server.so
> 2024/01/04 23:08:57 gpu.go:131: 135 MB VRAM available, loading up to 0 ROCM GPU layers out of 32
> 2024/01/04 23:08:57 ext_server.go:189: Initializing internal llama server
> disabling verbose llm logging
>
> rocBLAS error: Cannot read /opt/rocm-5.6.0/lib/rocblas/library/TensileLibrary.dat: No such file or directory
> Aborted (core dumped)
> ```

I think this is fixed by setting the environment variable to use the older GFX version: `HSA_OVERRIDE_GFX_VERSION=10.3.0`. You can set it in your systemd config, or if you are running `ollama serve` directly then prefix it like `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`. There is also another env variable `HCC_AMDGPU_TARGET=gfx1030` which could help.
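
For the systemd route, a sketch of the drop-in (the unit name `ollama.service`, as created by the Linux install script, is an assumption):

```
# Opens an editor for an override file such as
# /etc/systemd/system/ollama.service.d/override.conf; add:
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
sudo systemctl edit ollama.service
sudo systemctl restart ollama
```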

Author
Owner

@oatmealm commented on GitHub (Jan 9, 2024):

> I think this is fixed by setting the environment variable to use the older GFX version: `HSA_OVERRIDE_GFX_VERSION=10.3.0`. You can set it in your systemd config, or if you are running `ollama serve` directly then prefix it like `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`. There is also another env variable `HCC_AMDGPU_TARGET=gfx1030` which could help.

I've tried the first one before, but wasn't sure which version I'm supposed to use. Now that it's running, I'm not seeing the output indicating it found ROCm, and when running I see this: "Not compiled with GPU offload support" ...?!!

```
2024/01/09 15:32:13 llama.go:403: skipping accelerated runner because num_gpu=0
2024/01/09 15:32:13 llama.go:436: starting llama runner
2024/01/09 15:32:13 llama.go:494: waiting for llama runner to start responding
{"timestamp":1704810733,"level":"WARNING","function":"server_params_parse","line":2160,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
{"timestamp":1704810733,"level":"INFO","function":"main","line":2667,"message":"build info","build":468,"commit":"a7aee47"}
{"timestamp":1704810733,"level":"INFO","function":"main","line":2670,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":16,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
```
Author
Owner

@oatmealm commented on GitHub (Jan 9, 2024):

Ok, sorry. It's fine with `HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx90c ./ollama serve` ... gfx90c being the GPU I actually have. It seems to work and there's some activity shown in radeontop, but the CPU goes to 50%+ all the time...
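
If only part of a model fits in VRAM, the remaining layers run on the CPU, which would explain sustained CPU load even with visible GPU activity. The serve log prints the split at load time, so a quick check (grep pattern based on the log lines shown earlier in this thread):

```
# How many layers did ollama actually offload to the GPU?
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve 2>&1 | grep -Ei 'vram|gpu layers'
```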

Author
Owner

@tecosaur commented on GitHub (Jan 10, 2024):

I'm running openSUSE Tumbleweed, and once I got `libffi7` I was able to install the SLE libraries. Following the docs, I've ended up with the following installed:

```
❯ zypper se -i amd rocm hip
Loading repository data...
Reading installed packages...

S  | Name                   | Summary                                                  | Type
---+------------------------+----------------------------------------------------------+--------
i  | amdgpu-core            | Core meta package                                        | package
i+ | amdgpu-dkms            | The amdgpu Linux kernel driver                           | package
i  | amdgpu-dkms-firmware   | Firmware for amdgpu-dkms                                 | package
i  | hip-devel              | HIP:Heterogenous-computing Interface for Portability     | package
i  | hip-runtime-amd        | HIP:Heterogenous-computing Interface for Portability     | package
i  | hipblas                | Radeon Open Compute BLAS marshalling library             | package
i  | hipblaslt              | Radeon Open Compute BLAS marshalling library             | package
i  | hipcc                  | hipcc built using CMake                                  | package
i  | hipfft                 | ROCm FFT marshalling library                             | package
i  | hiprand                | Radeon Open Compute RAND library                         | package
i  | hipsolver              | Radeon Open Compute LAPACK marshalling library           | package
i  | hipsparse              | ROCm SPARSE library                                      | package
i  | hiptensor              | Adaptation library of tensor contraction with composab-> | package
i  | kernel-firmware-amdgpu | Kernel firmware files for AMDGPU graphics driver         | package
i  | libamd2                | Symmetric Approximate Minimum Degree                     | package
i  | libcamd2               | Symmetric Approximate Minimum Degree                     | package
i  | libccolamd2            | Constrained Column Approximate Minimum Degree            | package
i  | libcolamd2             | Column Approximate Minimum Degree                        | package
i  | libdrm-amdgpu          | Direct Rendering Manager runtime library                 | package
i  | libdrm-amdgpu-common   | List of AMD/ATI cards' ID info                           | package
i  | libdrm_amdgpu1         | Userspace interface for Kernel DRM services for AMD Ra-> | package
i  | libdrm_amdgpu1-32bit   | Userspace interface for Kernel DRM services for AMD Ra-> | package
i  | procmail               | A program for local e-mail delivery                      | package
i  | rocm-core              | Radeon Open Compute (ROCm) Runtime software stack        | package
i+ | rocm-hip-libraries     | Radeon Open Compute (ROCm) Runtime software stack        | package
i  | rocm-hip-runtime       | Radeon Open Compute (ROCm) Runtime software stack        | package
i  | rocm-language-runtime  | Radeon Open Compute (ROCm) Runtime software stack        | package
i  | rocm-llvm              | ROCm compiler                                            | package
i  | rocm-smi-lib           | AMD System Management libraries                          | package
i  | rocminfo               | Radeon Open Compute (ROCm) Runtime rocminfo tool         | package
```

I've then run `ollama serve` (ollama 0.1.19) like so:

```
HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1101 ollama serve
```

Attempting to run a model though, I see that while the GPU is detected, inference falls back to the CPU.

```
2024/01/10 17:00:09 images.go:808: total blobs: 6
2024/01/10 17:00:09 images.go:815: total unused blobs removed: 0
2024/01/10 17:00:09 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.19)
2024/01/10 17:00:09 shim_ext_server.go:142: Dynamic LLM variants [rocm cuda]
2024/01/10 17:00:09 gpu.go:35: Detecting GPU type
2024/01/10 17:00:09 gpu.go:40: CUDA not detected: Unable to load libnvidia-ml.so library to query for Nvidia GPUs: /usr/lib/wsl/lib/libnvidia-ml.so.1: cannot open shared object file: No such file or directory
2024/01/10 17:00:09 gpu.go:49: Radeon GPU detected
[GIN] 2024/01/10 - 17:00:17 | 200 |      14.593µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/10 - 17:00:17 | 200 |     385.094µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/10 - 17:00:17 | 200 |     991.271µs |       127.0.0.1 | POST     "/api/show"
filetype Q4_0
architecture llama
type 11B
name gguf
embd 4096
head 32
head_kv 8
gqa 4
2024/01/10 17:00:17 llm.go:70: system memory bytes: 10366363239
2024/01/10 17:00:17 llm.go:71: required model bytes: 6072394944
2024/01/10 17:00:17 llm.go:72: required kv bytes: 402653184
2024/01/10 17:00:17 llm.go:73: required alloc bytes: 268435456
2024/01/10 17:00:17 llm.go:74: required total bytes: 6743483584
2024/01/10 17:00:17 shim_ext_server_linux.go:24: Updating PATH to /home/tec/.local/share/cargo/bin:/home/tec/.julia/juliaup/bin:/home/tec/.local/share/go/bin:/home/tec/.local/share/npm/bin:/home/tec/.local/bin:/usr/local/sbin:/home/tec/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/bin:/home/tec/.cache/zsh4humans/v5/fzf/bin:/tmp/ollama2630925167/rocm
Lazy loading /tmp/ollama2630925167/rocm/libext_server.so library
2024/01/10 17:00:17 llm.go:142: Failed to load dynamic library rocm - falling back to CPU mode Unable to load dynamic library: Unable to load dynamic server library: libhipblas.so.1: cannot open shared object file: No such file or directory
```

Looking in `/opt/rocm/lib`, I see `libhipblas.so.2`, not `libhipblas.so.1`. This is with ROCm 6.0 installed.

I don't suppose anybody might have any ideas about this?

Author
Owner

@jameshulse commented on GitHub (Jan 10, 2024):

@tecosaur

> I don't suppose anybody might have any ideas about this?

I'm not an expert by any means, but my guess is you have a more recent version of libhipblas than is expected. I wonder if you can downgrade?

EDIT: I've just seen that you installed ROCm 6.0. I got it working with an older version: 5.6.0, I believe.


@ms178 commented on GitHub (Jan 10, 2024):

@tecosaur See: https://github.com/jmorganca/ollama/pull/1819

ROCm 5 and 6 are incompatible with one another. Ollama expects ROCm 5 at the moment. The linked pull request will make it compatible with ROCm 6.


@tecosaur commented on GitHub (Jan 10, 2024):

Ah, interesting, I wasn't aware that the initial support was for ROCm 5.6 only. For what it's worth, the symlink "hacky fix" seems to be working on my end.
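
For anyone else hitting the missing libhipblas.so.1 with ROCm 6, a minimal sketch of that symlink workaround, assuming ROCm lives under /opt/rocm (library names and soname versions may differ on your system, and other libraries may need the same treatment):

```
# Point the ROCm 5 soname that ollama dlopens at the ROCm 6 library (assumed paths).
sudo ln -s /opt/rocm/lib/libhipblas.so.2 /opt/rocm/lib/libhipblas.so.1
sudo ldconfig
```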


@tecosaur commented on GitHub (Jan 11, 2024):

I've noticed something funny when I'm not actively running inference with any model.

  1. Without ollama running, or with no model loaded in memory, my GPU (7800xt) idles at ~4% and ~30W.
  2. When running inference that jumps to 100% and ~200W.
  3. After running inference, the GPU still shows 100% utilisation, and power usage only drops to 70W.

Restarting the ollama service immediately takes me from (3) to (1), but I find it strange that (3) happens in the first place.


@tecosaur commented on GitHub (Jan 12, 2024):

Oh, and my computer also refuses to go to sleep while the GPU is in state (2).


@65a commented on GitHub (Jan 12, 2024):

@tecosaur try setting GPU_MAX_HW_QUEUES to 1 in your environment.

If Ollama wanted to, they could do something like:

if err := os.Setenv("GPU_MAX_HW_QUEUES", "1"); err != nil {
	// hypothetical continuation: the original snippet ended at the brace above
	return fmt.Errorf("setting GPU_MAX_HW_QUEUES: %w", err)
}

on the ROCm path, but I should probably just send a PR to llama.cpp at this point. It's a problem with the way HIP slams the ROCm scheduler with queues, which seems to be broken.
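
Until a fix lands upstream, the variable can simply be set when launching the server; a minimal sketch of applying @65a's suggestion:

```
# Limit HIP to a single hardware queue to avoid the idle 100%-utilisation state.
GPU_MAX_HW_QUEUES=1 ollama serve
```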


@dhiltgen commented on GitHub (Jan 20, 2024):

The pre-release for 0.1.21 (https://github.com/jmorganca/ollama/releases/tag/v0.1.21) is up now, and we've made various improvements to support ROCm cards, covering both v5 and v6 of the ROCm libraries. You'll have to install ROCm (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html#rocm-install-quick), but then the Ollama binary should work.

Please let us know if you run into any problems.
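
For reference, a rough sketch of trying the pre-release binary on Linux (the asset name matches what others in this thread used; the exact download URL is assumed from GitHub's release layout):

```
curl -L -o ollama-linux-amd64 \
  https://github.com/jmorganca/ollama/releases/download/v0.1.21/ollama-linux-amd64
chmod +x ollama-linux-amd64
./ollama-linux-amd64 serve
```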


@dhiltgen commented on GitHub (Jan 20, 2024):

> I've noticed something funny when I'm not actively running inference with any model.

@tecosaur this is a known issue tracked via #1848


@babariviere commented on GitHub (Jan 20, 2024):

@dhiltgen will the docker image support rocm? Or do we need to make our own dockerfile for this?


@dhiltgen commented on GitHub (Jan 20, 2024):

> will the docker image support rocm? Or do we need to make our own dockerfile for this?

We haven't added it to our image Dockerfile (https://github.com/jmorganca/ollama/blob/main/Dockerfile) yet.
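
In the meantime, a home-grown image would at minimum need the ROCm devices passed through at runtime; a hypothetical invocation (the image name is a placeholder, and the /dev/kfd + /dev/dri passthrough is the usual requirement for ROCm in containers):

```
docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 \
  my-rocm-ollama-image
```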


@babariviere commented on GitHub (Jan 21, 2024):

> will the docker image support rocm? Or do we need to make our own dockerfile for this?
>
> We haven't added it to our image Dockerfile yet.

Good to know, thanks. 😄 Hope it will come one day!


@dhiltgen commented on GitHub (Jan 21, 2024):

@babariviere #2127, once merged, will add ROCm support to our official container image.


@zaskokus commented on GitHub (Jan 22, 2024):

@dhiltgen 0.1.21 keeps using the CPU only for me. I have the latest packages from Arch Linux, running on RDNA3. Am I missing something? (The model I used is tinyllama, if that makes any difference.)

$ /opt/rocm/bin/rocm-smi 


========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
Exception caught: map::at
ERROR: GPU[1]	: sclk clock is unsupported
====================================================================================
GPU[1]		: get_power_cap, Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK    Fan    Perf  PwrCap       VRAM%  GPU%  
0    45.0c           13.0W   315Mhz  772Mhz  29.8%  auto  100.0W        13%   9%    
1    33.0c           6.18W   None    400Mhz  0%     low   Unsupported    2%   0%    
====================================================================================
=============================== End of ROCm SMI Log ================================
$ pacman -Q | grep 'hip\|rocm'
hip-runtime-amd 5.7.1-1
hipblas 5.7.1-1
hipfft 5.7.1-1
hipsolver 5.7.1-1
hipsparse 5.7.1-1
rocm-cmake 5.7.1-1
rocm-core 5.7.1-1
rocm-device-libs 5.7.1-1
rocm-hip-libraries 5.7.1-2
rocm-hip-runtime 5.7.1-2
rocm-language-runtime 5.7.1-2
rocm-llvm 5.7.1-1
rocm-smi-lib 5.7.1-1
rocminfo 5.7.1-1
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1102 ollama serve
2024/01/22 19:36:02 images.go:810: INFO total blobs: 4
2024/01/22 19:36:02 images.go:817: INFO total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
2024/01/22 19:36:02 routes.go:943: INFO Listening on 127.0.0.1:11434 (version 0.1.21)
2024/01/22 19:36:02 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/22 19:36:02 payload_common.go:145: INFO Dynamic LLM libraries [cpu_avx cpu cpu_avx2]
2024/01/22 19:36:02 gpu.go:91: INFO Detecting GPU type
2024/01/22 19:36:02 gpu.go:210: INFO Searching for GPU management library libnvidia-ml.so
2024/01/22 19:36:02 gpu.go:256: INFO Discovered GPU libraries: []
2024/01/22 19:36:02 gpu.go:210: INFO Searching for GPU management library librocm_smi64.so
2024/01/22 19:36:02 gpu.go:256: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0]
2024/01/22 19:36:02 gpu.go:106: INFO Radeon GPU detected
[GIN] 2024/01/22 - 19:37:34 | 200 |      45.348µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/22 - 19:37:34 | 404 |     230.329µs |       127.0.0.1 | POST     "/api/show"
2024/01/22 19:37:41 download.go:123: INFO downloading 2af3b81862c6 in 7 100 MB part(s)
2024/01/22 19:38:58 download.go:123: INFO downloading af0ddbdaaa26 in 1 70 B part(s)
2024/01/22 19:39:01 download.go:123: INFO downloading c8472cd9daed in 1 31 B part(s)
2024/01/22 19:39:04 download.go:123: INFO downloading fa956ab37b8c in 1 98 B part(s)
2024/01/22 19:39:09 download.go:123: INFO downloading 6331358be52a in 1 483 B part(s)
[GIN] 2024/01/22 - 19:39:11 | 200 |         1m37s |       127.0.0.1 | POST     "/api/pull"
[GIN] 2024/01/22 - 19:39:11 | 200 |     348.907µs |       127.0.0.1 | POST     "/api/show"
2024/01/22 19:39:12 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama1525774821/cpu_avx2/libext_server.so
2024/01/22 19:39:12 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama1525774821/cpu_avx2/libext_server.so
2024/01/22 19:39:12 dyn_ext_server.go:139: INFO Initializing llama server
system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /home/rhqq/.ollama/models/blobs/sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = TinyLlama
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}\n{% if m...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_0:  155 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW) 
llm_load_print_meta: general.name     = TinyLlama
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 2 '</s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.08 MiB
llm_load_tensors: system memory used  =  606.60 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  =   44.00 MiB, K (f16):   22.00 MiB, V (f16):   22.00 MiB
llama_build_graph: non-view tensors processed: 466/466
llama_new_context_with_model: compute buffer total size = 147.19 MiB
[1705948752] warming up the model with an empty run
[1705948752] Available slots:
[1705948752]  -> Slot 0 - max context: 2048
2024/01/22 19:39:12 dyn_ext_server.go:147: INFO Starting llama main loop
[1705948752] llama server main loop starting
[1705948752] all slots are idle and system prompt is empty, clear the KV cache
[GIN] 2024/01/22 - 19:39:12 | 200 |  330.479596ms |       127.0.0.1 | POST     "/api/chat

Edit: I used the binary from the release section:

$ ./ollama-linux-amd64 serve
2024/01/23 19:26:49 images.go:815: INFO total blobs: 9
2024/01/23 19:26:49 images.go:822: INFO total unused blobs removed: 0
2024/01/23 19:26:49 routes.go:943: INFO Listening on 127.0.0.1:11434 (version 0.1.21)
2024/01/23 19:26:49 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/23 19:26:51 payload_common.go:145: INFO Dynamic LLM libraries [cpu cuda_v11 rocm_v5 cpu_avx cpu_avx2 rocm_v6]
2024/01/23 19:26:51 gpu.go:91: INFO Detecting GPU type
2024/01/23 19:26:51 gpu.go:210: INFO Searching for GPU management library libnvidia-ml.so
2024/01/23 19:26:51 gpu.go:256: INFO Discovered GPU libraries: []
2024/01/23 19:26:51 gpu.go:210: INFO Searching for GPU management library librocm_smi64.so
2024/01/23 19:26:51 gpu.go:256: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0]
2024/01/23 19:26:51 gpu.go:106: INFO Radeon GPU detected
[GIN] 2024/01/23 - 19:26:51 | 200 |        42.4µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/23 - 19:26:51 | 200 |     635.748µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/23 - 19:26:51 | 200 |     246.865µs |       127.0.0.1 | POST     "/api/show"
2024/01/23 19:26:51 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama800487147/rocm_v5/libext_server.so
2024/01/23 19:26:51 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama800487147/rocm_v5/libext_server.so
2024/01/23 19:26:51 dyn_ext_server.go:145: INFO Initializing llama server
free(): invalid pointer
Aborted (core dumped)

@0xdeafbeef commented on GitHub (Jan 22, 2024):

Exporting LD_LIBRARY_PATH=/usr/lib64/ will fix the library search:

2024/01/22 20:26:39 gpu.go:210: INFO Searching for GPU management library librocm_smi64.so
2024/01/22 20:26:39 gpu.go:256: INFO Discovered GPU libraries: [/usr/lib64/librocm_smi64.so.5.0]
2024/01/22 20:26:39 gpu.go:106: INFO Radeon GPU detected	

Append Environment='LD_LIBRARY_PATH=/usr/lib64/' to the [Service] section of /etc/systemd/system/ollama.service. Then:

sudo systemctl daemon-reload && sudo systemctl restart ollama

UPD: it still doesn't use the GPU anyway :)
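
An equivalent way to apply the same change without editing the unit file in place is a systemd drop-in; a sketch:

```
sudo systemctl edit ollama
# in the editor that opens, add:
#   [Service]
#   Environment='LD_LIBRARY_PATH=/usr/lib64/'
sudo systemctl daemon-reload && sudo systemctl restart ollama
```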


@mnn commented on GitHub (Jan 22, 2024):

I am getting "not enough vram available, falling back to CPU only" (with both a build of main at 6225fde046 and the binary from the release section here). I tried with dolphin-mistral-q4_0 and tinyllama. I am quite sure a 1B model should fit in the VRAM of a 7900XTX. Maybe I am doing something wrong; I tried a bunch of env vars and I don't think they affect anything (it looks like it doesn't even try to use the GPU; I remember that with Stable Diffusion, similar-sounding wrong env vars led to segfaults or PC freezes).

logs
❯ PATH=$PATH:/opt/rocm/bin HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1100 ./ollama-linux-amd64_0.1.21_pre serve
2024/01/22 20:58:53 images.go:810: INFO total blobs: 25
2024/01/22 20:58:53 images.go:817: INFO total unused blobs removed: 0
2024/01/22 20:58:53 routes.go:943: INFO Listening on 127.0.0.1:11434 (version 0.1.21)
2024/01/22 20:58:53 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/22 20:58:55 payload_common.go:145: INFO Dynamic LLM libraries [rocm_v6 cpu cuda_v11 rocm_v5 cpu_avx cpu_avx2]
2024/01/22 20:58:55 gpu.go:91: INFO Detecting GPU type
2024/01/22 20:58:55 gpu.go:210: INFO Searching for GPU management library libnvidia-ml.so
2024/01/22 20:58:55 gpu.go:256: INFO Discovered GPU libraries: [/opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Linked to libnvidia-ml library at wrong path : /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so

2024/01/22 20:58:55 gpu.go:267: INFO Unable to load CUDA management library /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so: nvml vram init failure: 9
2024/01/22 20:58:55 gpu.go:267: INFO Unable to load CUDA management library /usr/lib/libnvidia-ml.so.545.29.06: nvml vram init failure: 9
2024/01/22 20:58:55 gpu.go:267: INFO Unable to load CUDA management library /usr/lib64/libnvidia-ml.so.545.29.06: nvml vram init failure: 9
2024/01/22 20:58:55 gpu.go:210: INFO Searching for GPU management library librocm_smi64.so
2024/01/22 20:58:55 gpu.go:256: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.1.0]
2024/01/22 20:58:55 gpu.go:106: INFO Radeon GPU detected
[GIN] 2024/01/22 - 20:59:03 | 200 | 33.434µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/01/22 - 20:59:03 | 200 | 336.059µs | 127.0.0.1 | POST "/api/show"
2024/01/22 20:59:03 llm.go:110: INFO not enough vram available, falling back to CPU only
2024/01/22 20:59:03 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama2431702200/cpu_avx2/libext_server.so
2024/01/22 20:59:03 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama2431702200/cpu_avx2/libext_server.so
2024/01/22 20:59:03 dyn_ext_server.go:139: INFO Initializing llama server
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /home/xxx/.ollama/models/blobs/sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = TinyLlama
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q4_0: 155 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 606.53 MiB (4.63 BPW)
llm_load_print_meta: general.name = TinyLlama
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.08 MiB
llm_load_tensors: system memory used = 606.60 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 44.00 MiB, K (f16): 22.00 MiB, V (f16): 22.00 MiB
llama_build_graph: non-view tensors processed: 466/466
llama_new_context_with_model: compute buffer total size = 147.19 MiB
2024/01/22 20:59:03 dyn_ext_server.go:148: INFO Starting llama main loop
2024/01/22 20:59:03 dyn_ext_server.go:162: INFO loaded 0 images
[GIN] 2024/01/22 - 20:59:08 | 200 | 5.837962258s | 127.0.0.1 | POST "/api/generate"
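
Before digging into ollama's VRAM discovery code, it may be worth checking what ROCm itself reports for the card; a quick sketch (both tools ship with ROCm, and the output format varies by version):

```
rocminfo | grep -i gfx          # the ISA the runtime actually sees
rocm-smi --showmeminfo vram     # total/used VRAM per GPU
```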


@adham-omran commented on GitHub (Jan 23, 2024):

> The pre-release for 0.1.21 is up now, and we've made various improvements to support ROCm cards, covering both v5 and v6 of the ROCm libraries. You'll have to install ROCm, but then the Ollama binary should work.

> Please let us know if you run into any problems.

Thank you for your work and efforts

I'm running into issues on Arch Linux. I've installed ROCm with https://github.com/rocm-arch/rocm-arch.

Issues

When I run HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1032 ./ollama-linux-amd64 serve, it returns:

Log
2024/01/23 08:09:57 images.go:815: INFO total blobs: 22
2024/01/23 08:09:57 images.go:822: INFO total unused blobs removed: 0
2024/01/23 08:09:57 routes.go:943: INFO Listening on 127.0.0.1:11434 (version 0.1.21)
2024/01/23 08:09:57 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/23 08:09:58 payload_common.go:145: INFO Dynamic LLM libraries [rocm_v6 cpu cpu_avx cuda_v11 rocm_v5 cpu_avx2]
2024/01/23 08:09:58 gpu.go:91: INFO Detecting GPU type
2024/01/23 08:09:58 gpu.go:210: INFO Searching for GPU management library libnvidia-ml.so
2024/01/23 08:09:58 gpu.go:256: INFO Discovered GPU libraries: []
2024/01/23 08:09:58 gpu.go:210: INFO Searching for GPU management library librocm_smi64.so
2024/01/23 08:09:58 gpu.go:256: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0]
2024/01/23 08:09:58 gpu.go:106: INFO Radeon GPU detected

However, when I run ./ollama-linux-amd64 run llama2, I get:

Log
[GIN] 2024/01/23 - 08:10:33 | 200 |      29.889µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/23 - 08:10:33 | 200 |     348.726µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/23 - 08:10:33 | 200 |     154.188µs |       127.0.0.1 | POST     "/api/show"
2024/01/23 08:10:33 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama3456034654/rocm_v5/libext_server.so
2024/01/23 08:10:33 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama3456034654/rocm_v5/libext_server.so
2024/01/23 08:10:33 dyn_ext_server.go:145: INFO Initializing llama server
free(): invalid pointer
fish: Job 1, 'HSA_OVERRIDE_GFX_VERSION=11.0.0…' terminated by signal SIGABRT (Abort)

With a bash shell

Bash Shell Log
[GIN] 2024/01/23 - 08:11:22 | 200 |       23.13µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/23 - 08:11:22 | 200 |     298.266µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/23 - 08:11:22 | 200 |     202.127µs |       127.0.0.1 | POST     "/api/show"
2024/01/23 08:11:22 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama2056054560/rocm_v5/libext_server.so
2024/01/23 08:11:22 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama2056054560/rocm_v5/libext_server.so
2024/01/23 08:11:22 dyn_ext_server.go:145: INFO Initializing llama server
free(): invalid pointer
Aborted (core dumped)

System Details

| Part | Name |
|------|------|
| GPU  | XFX Speedstar 6600 |
| CPU  | Ryzen 7950X |
| RAM  | G.Skill Trident Z5 Neo RGB 64 GB (2 x 32 GB) DDR5-6000 CL30 Memory |
/opt/rocm/bin/rocm-smi


========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU[1]          : get_power_avg, Not supported on the given system
Exception caught: map::at
ERROR: GPU[1]   : sclk clock is unsupported
====================================================================================
GPU[1]          : get_power_cap, Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
0    57.0c           19.0W   800Mhz  875Mhz   0%   auto  100.0W         1%   5%    
1    43.0c           N/A     None    2400Mhz  0%   auto  Unsupported   85%   44%   
====================================================================================
=============================== End of ROCm SMI Log ================================

pacman -Q | grep 'hip\|rocm'
hip-runtime-amd 5.7.1-1
hipblas 5.7.1-1
hipcub 5.7.1-1
hipfft 5.7.1-1
hipsolver 5.7.1-1
hipsparse 5.7.1-1
miopen-hip 5.7.1-1
rocm-clang-ocl 5.7.1-1
rocm-cmake 5.7.1-1
rocm-core 5.7.1-1
rocm-device-libs 5.7.1-1
rocm-hip-libraries 5.7.1-2
rocm-hip-runtime 5.7.1-2
rocm-hip-sdk 5.7.1-2
rocm-language-runtime 5.7.1-2
rocm-llvm 5.7.1-1
rocm-opencl-runtime 5.7.1-1
rocm-opencl-sdk 5.7.1-2
rocm-smi-lib 5.7.1-1
rocminfo 5.7.1-1

@hiepxanh commented on GitHub (Jan 23, 2024):

ROCm only supports the RX 6800 and 7600, as the ROCm home page mentions; trying llamafile might help you, since I also have an RX 6600.

<!-- gh-comment-id:1905438291 --> @hiepxanh commented on GitHub (Jan 23, 2024): ROCm only supports the RX 6800 and 7600, as the ROCm home page mentions; trying llamafile might help you, since I also have an RX 6600.
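For anyone unsure where their card falls, the ISA target ROCm actually detects is visible in rocminfo; an RX 6600 typically reports gfx1032:

```
# List detected HSA agents and their ISA targets
/opt/rocm/bin/rocminfo | grep gfx
```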
Author
Owner

@dhiltgen commented on GitHub (Jan 23, 2024):

It looks like multiple people are hitting this free(): invalid pointer problem - I've opened a new ticket to track the resolution: #2165

<!-- gh-comment-id:1907096266 --> @dhiltgen commented on GitHub (Jan 23, 2024): It looks like multiple people are hitting this `free(): invalid pointer` problem - I've opened up a new ticket to track the resolution of that #2165
Author
Owner

@dhiltgen commented on GitHub (Jan 23, 2024):

@adham-omran can you clarify the scenario when you override with gfx1032? Does the server "work" and are you able to run models on the GPU?

<!-- gh-comment-id:1907103659 --> @dhiltgen commented on GitHub (Jan 23, 2024): @adham-omran can you clarify the scenario when you override with `gfx1032`? Does the server "work" and are you able to run models on the GPU?
Author
Owner

@dhiltgen commented on GitHub (Jan 24, 2024):

@mnn keep an eye on PR #2162 which may fix or at the very least help us troubleshoot why VRAM discovery isn't working on your system

<!-- gh-comment-id:1907115592 --> @dhiltgen commented on GitHub (Jan 24, 2024): @mnn keep an eye on PR #2162 which may fix or at the very least help us troubleshoot why VRAM discovery isn't working on your system
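Until that PR lands, VRAM reporting can be sanity-checked outside of ollama. A quick sketch, assuming the amdgpu driver and that card0 is the discrete GPU:

```
# Total VRAM in bytes, straight from the amdgpu driver (card index may differ)
cat /sys/class/drm/card0/device/mem_info_vram_total
# The same through ROCm SMI
rocm-smi --showmeminfo vram
```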
Author
Owner

@dhiltgen commented on GitHub (Jan 24, 2024):

@0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup?

<!-- gh-comment-id:1907118005 --> @dhiltgen commented on GitHub (Jan 24, 2024): @0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup?
Author
Owner

@mnn commented on GitHub (Jan 24, 2024):

@dhiltgen I have tried the new logging (used commit f63dc2db5c), but I think it crashed on NVIDIA-related code before it got to AMD (just my guess, I know close to nothing about Go or GPU stuff). It behaves the same without the env vars.

❯ OLLAMA_DEBUG=1 PATH=$PATH:/opt/rocm/bin HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1100 ./ollama serve
time=2024-01-24T08:16:50.396+01:00 level=DEBUG source=/mnt/dev/ai/ollama/server/routes.go:919 msg="Debug logging enabled"
time=2024-01-24T08:16:50.396+01:00 level=INFO source=/mnt/dev/ai/ollama/server/images.go:815 msg="total blobs: 25"
time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/server/images.go:822 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  - using env: export GIN_MODE=release
  - using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func3 (5 handlers)
time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/server/routes.go:943 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/llm/payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/llm/payload_common.go:145 msg="Dynamic LLM libraries [cpu_avx2 rocm_v1 cpu cpu_avx cuda_v12]"
time=2024-01-24T08:16:53.664+01:00 level=DEBUG source=/mnt/dev/ai/ollama/llm/payload_common.go:146 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:91 msg="Detecting GPU type"
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:210 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-01-24T08:16:53.665+01:00 level=DEBUG source=/mnt/dev/ai/ollama/gpu/gpu.go:228 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /mnt/dev/ai/ollama/libnvidia-ml.so*]"
time=2024-01-24T08:16:53.673+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:256 msg="Discovered GPU libraries: [/opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]"

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn't have
to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Linked to libnvidia-ml library at wrong path : /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so

SIGSEGV: segmentation violation
PC=0x7fd014406300 m=14 sigcode=1
signal arrived during cgo execution

goroutine 1 [syscall]:
runtime.cgocall(0x9b4490, 0xc0004598a8)
/usr/lib/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000459880 sp=0xc000459848 pc=0x409b0b
github.com/jmorganca/ollama/gpu._Cfunc_cuda_init(0x7fcfd8000cb0, 0xc00003c300)
_cgo_gotypes.go:254 +0x3f fp=0xc0004598a8 sp=0xc000459880 pc=0x7b945f
github.com/jmorganca/ollama/gpu.LoadCUDAMgmt.func2(0xc00003a0d0?, 0x38?)
/mnt/dev/ai/ollama/gpu/gpu.go:266 +0x4a fp=0xc0004598e8 sp=0xc0004598a8 pc=0x7bb22a
github.com/jmorganca/ollama/gpu.LoadCUDAMgmt({0xc00052c000, 0x3, 0xc0000f2420?})
/mnt/dev/ai/ollama/gpu/gpu.go:266 +0x1b8 fp=0xc000459988 sp=0xc0004598e8 pc=0x7bb0f8
github.com/jmorganca/ollama/gpu.initGPUHandles()
/mnt/dev/ai/ollama/gpu/gpu.go:94 +0xd1 fp=0xc0004599f0 sp=0xc000459988 pc=0x7b98b1
github.com/jmorganca/ollama/gpu.GetGPUInfo()
/mnt/dev/ai/ollama/gpu/gpu.go:119 +0xb5 fp=0xc000459b00 sp=0xc0004599f0 pc=0x7b9a75
github.com/jmorganca/ollama/gpu.CheckVRAM()
/mnt/dev/ai/ollama/gpu/gpu.go:192 +0x1f fp=0xc000459ba8 sp=0xc000459b00 pc=0x7ba75f
github.com/jmorganca/ollama/server.Serve({0x19e64470, 0xc000024040})
/mnt/dev/ai/ollama/server/routes.go:965 +0x45f fp=0xc000459c98 sp=0xc000459ba8 pc=0x999b3f
github.com/jmorganca/ollama/cmd.RunServer(0xc000438300?, {0x1a2a87a0?, 0x4?, 0xaccea1?})
/mnt/dev/ai/ollama/cmd/cmd.go:690 +0x199 fp=0xc000459d30 sp=0xc000459c98 pc=0x9abf59
github.com/spf13/cobra.(*Command).execute(0xc0003f1500, {0x1a2a87a0, 0x0, 0x0})
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x87c fp=0xc000459e68 sp=0xc000459d30 pc=0x763c9c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003f0900)
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000459f20 sp=0xc000459e68 pc=0x7644c5
github.com/spf13/cobra.(*Command).Execute(...)
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
/mnt/dev/ai/ollama/main.go:11 +0x4d fp=0xc000459f40 sp=0xc000459f20 pc=0x9b2fcd
runtime.main()
/usr/lib/go/src/runtime/proc.go:267 +0x2bb fp=0xc000459fe0 sp=0xc000459f40 pc=0x43e1bb
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000459fe8 sp=0xc000459fe0 pc=0x46e081

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006cfa8 sp=0xc00006cf88 pc=0x43e60e
runtime.goparkunlock(...)
/usr/lib/go/src/runtime/proc.go:404
runtime.forcegchelper()
/usr/lib/go/src/runtime/proc.go:322 +0xb3 fp=0xc00006cfe0 sp=0xc00006cfa8 pc=0x43e493
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x46e081
created by runtime.init.6 in goroutine 1
/usr/lib/go/src/runtime/proc.go:310 +0x1a

goroutine 18 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000068778 sp=0xc000068758 pc=0x43e60e
runtime.goparkunlock(...)
/usr/lib/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
/usr/lib/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc0000687c8 sp=0xc000068778 pc=0x42a57f
runtime.gcenable.func1()
/usr/lib/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000687e0 sp=0xc0000687c8 pc=0x41f6c5
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000687e8 sp=0xc0000687e0 pc=0x46e081
created by runtime.gcenable in goroutine 1
/usr/lib/go/src/runtime/mgc.go:200 +0x66

goroutine 19 [GC scavenge wait]:
runtime.gopark(0x8b2ee4?, 0x7ed898?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000068f70 sp=0xc000068f50 pc=0x43e60e
runtime.goparkunlock(...)
/usr/lib/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x1a278b20)
/usr/lib/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000068fa0 sp=0xc000068f70 pc=0x427de9
runtime.bgscavenge(0x0?)
/usr/lib/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000068fc8 sp=0xc000068fa0 pc=0x428399
runtime.gcenable.func2()
/usr/lib/go/src/runtime/mgc.go:201 +0x25 fp=0xc000068fe0 sp=0xc000068fc8 pc=0x41f665
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000068fe8 sp=0xc000068fe0 pc=0x46e081
created by runtime.gcenable in goroutine 1
/usr/lib/go/src/runtime/mgc.go:201 +0xa5

goroutine 20 [finalizer wait]:
runtime.gopark(0x198?, 0xac5e60?, 0x1?, 0xf7?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006c620 sp=0xc00006c600 pc=0x43e60e
runtime.runfinq()
/usr/lib/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00006c7e0 sp=0xc00006c620 pc=0x41e6e7
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x46e081
created by runtime.createfing in goroutine 1
/usr/lib/go/src/runtime/mfinal.go:163 +0x3d

goroutine 21 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000069750 sp=0xc000069730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0000697e0 sp=0xc000069750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000697e8 sp=0xc0000697e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 22 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000069f50 sp=0xc000069f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000069fe0 sp=0xc000069f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000069fe8 sp=0xc000069fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 3 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc52a?, 0x1?, 0xa0?, 0x31?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006d750 sp=0xc00006d730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006d7e0 sp=0xc00006d750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006d7e8 sp=0xc00006d7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 34 [GC worker (idle)]:
runtime.gopark(0x59ab3bf2720?, 0x3?, 0xc6?, 0x18?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000488750 sp=0xc000488730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004887e0 sp=0xc000488750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004887e8 sp=0xc0004887e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc399?, 0x1?, 0x5b?, 0x9c?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000488f50 sp=0xc000488f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000488fe0 sp=0xc000488f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000488fe8 sp=0xc000488fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 4 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc584?, 0x1?, 0xae?, 0x65?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006df50 sp=0xc00006df30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006dfe0 sp=0xc00006df50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006dfe8 sp=0xc00006dfe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 36 [GC worker (idle)]:
runtime.gopark(0x59ab3bf2e04?, 0x3?, 0x3a?, 0x23?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000489750 sp=0xc000489730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004897e0 sp=0xc000489750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004897e8 sp=0xc0004897e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x59ab3bdc52a?, 0x1?, 0xba?, 0x4d?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000489f50 sp=0xc000489f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000489fe0 sp=0xc000489f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000489fe8 sp=0xc000489fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 23 [GC worker (idle)]:
runtime.gopark(0x59ab3bdd079?, 0x3?, 0xbc?, 0x18?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006a750 sp=0xc00006a730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006a7e0 sp=0xc00006a750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 38 [GC worker (idle)]:
runtime.gopark(0x59ab3bf29ab?, 0x1?, 0x8c?, 0x6d?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00048a750 sp=0xc00048a730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00048a7e0 sp=0xc00048a750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00048a7e8 sp=0xc00048a7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 5 [GC worker (idle)]:
runtime.gopark(0x59ab008fefa?, 0x1?, 0x58?, 0x52?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006e750 sp=0xc00006e730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006e7e0 sp=0xc00006e750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 6 [GC worker (idle)]:
runtime.gopark(0x59ab3bdbc66?, 0x1?, 0x3a?, 0x21?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006ef50 sp=0xc00006ef30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006efe0 sp=0xc00006ef50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x59ab3bf269e?, 0x1?, 0xf5?, 0x46?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006f750 sp=0xc00006f730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006f7e0 sp=0xc00006f750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 8 [GC worker (idle)]:
runtime.gopark(0x59ab3bf273e?, 0x3?, 0xc1?, 0xa1?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006ff50 sp=0xc00006ff30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006ffe0 sp=0xc00006ff50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 9 [GC worker (idle)]:
runtime.gopark(0x1a2aa4e0?, 0x1?, 0xba?, 0x61?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000484750 sp=0xc000484730 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004847e0 sp=0xc000484750 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004847e8 sp=0xc0004847e0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 10 [GC worker (idle)]:
runtime.gopark(0x59ab3bf30d5?, 0x3?, 0xa5?, 0x32?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000484f50 sp=0xc000484f30 pc=0x43e60e
runtime.gcBgMarkWorker()
/usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000484fe0 sp=0xc000484f50 pc=0x421245
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000484fe8 sp=0xc000484fe0 pc=0x46e081
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/lib/go/src/runtime/mgc.go:1219 +0x1c

goroutine 39 [select, locked to thread]:
runtime.gopark(0xc00048b7a8?, 0x2?, 0xa9?, 0xe8?, 0xc00048b7a4?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00048b638 sp=0xc00048b618 pc=0x43e60e
runtime.selectgo(0xc00048b7a8, 0xc00048b7a0, 0x0?, 0x0, 0x0?, 0x1)
/usr/lib/go/src/runtime/select.go:327 +0x725 fp=0xc00048b758 sp=0xc00048b638 pc=0x44e165
runtime.ensureSigM.func1()
/usr/lib/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc00048b7e0 sp=0xc00048b758 pc=0x46519f
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00048b7e8 sp=0xc00048b7e0 pc=0x46e081
created by runtime.ensureSigM in goroutine 1
/usr/lib/go/src/runtime/signal_unix.go:997 +0xc8

goroutine 24 [syscall]:
runtime.notetsleepg(0x0?, 0x0?)
/usr/lib/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc0004527a0 sp=0xc000452768 pc=0x411209
os/signal.signal_recv()
/usr/lib/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0004527c0 sp=0xc0004527a0 pc=0x46aa49
os/signal.loop()
/usr/lib/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0004527e0 sp=0xc0004527c0 pc=0x6f3913
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004527e8 sp=0xc0004527e0 pc=0x46e081
created by os/signal.Notify.func1.1 in goroutine 1
/usr/lib/go/src/os/signal/signal.go:151 +0x1f

goroutine 25 [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000452f18 sp=0xc000452ef8 pc=0x43e60e
runtime.chanrecv(0xc0001538c0, 0x0, 0x1)
/usr/lib/go/src/runtime/chan.go:583 +0x3cd fp=0xc000452f90 sp=0xc000452f18 pc=0x40beed
runtime.chanrecv1(0x0?, 0x0?)
/usr/lib/go/src/runtime/chan.go:442 +0x12 fp=0xc000452fb8 sp=0xc000452f90 pc=0x40baf2
github.com/jmorganca/ollama/server.Serve.func1()
/mnt/dev/ai/ollama/server/routes.go:952 +0x25 fp=0xc000452fe0 sp=0xc000452fb8 pc=0x999c05
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000452fe8 sp=0xc000452fe0 pc=0x46e081
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
/mnt/dev/ai/ollama/server/routes.go:951 +0x407

rax 0x7fcfd8001d40
rbx 0x9
rcx 0x1a
rdx 0x1a
rdi 0x7fcfe7ffebe0
rsi 0x100
rbp 0x7fcfe7ffee00
rsp 0x7fcfe7ffebd8
r8 0x64
r9 0x0
r10 0x7fd065a30e58
r11 0x7fd065ac13c0
r12 0xc00003c300
r13 0x7fcfe7ffedc0
r14 0x7fcfe7ffebe0
r15 0xc00003c370
rip 0x7fd014406300
rflags 0x10206
cs 0x33
fs 0x0
gs 0x0

I'll add more info, because I am not entirely sure whether only a specific ROCm version is supported (I have an older one, since it works well with other software), nor am I sure my env vars are correct (the GFX_VERSION should be, since I use it with SD WebUI, ComfyUI and Text Generation WebUI).

❯ yay -Q | grep -Pi rocm
rocm-clang-ocl 5.6.1-1
rocm-cmake 5.6.1-1
rocm-core 5.6.1-1
rocm-device-libs 5.6.1-1
rocm-hip-libraries 5.6.1-1
rocm-hip-runtime 5.6.1-1
rocm-hip-sdk 5.6.1-1
rocm-language-runtime 5.6.1-1
rocm-llvm 5.6.1-1
rocm-ml-libraries 5.6.1-1
rocm-ml-sdk 5.6.1-1
rocm-opencl-runtime 5.6.1-1
rocm-smi-lib 5.6.1-1
rocminfo 5.6.1-1
❯ /opt/rocm/bin/rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 7 7800X3D 8-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 7 7800X3D 8-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   5050
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            16
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    64937188(0x3dedce4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    64937188(0x3dedce4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    64937188(0x3dedce4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1100
  Uuid:                    GPU-df09d9133148a62b
  Marketing Name:          AMD Radeon RX 7900 XTX
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      6144(0x1800) KB
    L3:                      98304(0x18000) KB
  Chip ID:                 29772(0x744c)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2526
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            96
  SIMDs per CU:            2
  Shader Engines:          6
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    25149440(0x17fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS:
      Size:                    25149440(0x17fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*******
Agent 3
*******
  Name:                    gfx1036
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    2
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      256(0x100) KB
  Chip ID:                 5710(0x164e)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2200
  BDFID:                   6656
  Internal Node ID:        2
  Compute Unit:            2
  SIMDs per CU:            2
  Shader Engines:          1
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    524288(0x80000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS:
      Size:                    524288(0x80000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1036
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
<!-- gh-comment-id:1907563724 --> @mnn commented on GitHub (Jan 24, 2024): @dhiltgen I have tried the new logging (used commit f63dc2db5c00de0f6b0b5ea9b53bb20e83513cea), but I think it crashed on NVidia-related code before it got to AMD (just my guess, I know close to nothing about Go or GPU stuff). It behaves same without env. vars. <details> <pre> ❯ OLLAMA_DEBUG=1 PATH=$PATH:/opt/rocm/bin HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1100 ./ollama serve time=2024-01-24T08:16:50.396+01:00 level=DEBUG source=/mnt/dev/ai/ollama/server/routes.go:919 msg="Debug logging enabled" time=2024-01-24T08:16:50.396+01:00 level=INFO source=/mnt/dev/ai/ollama/server/images.go:815 msg="total blobs: 25" time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/server/images.go:822 msg="total unused blobs removed: 0" [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached. [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production. - using env: export GIN_MODE=release - using code: gin.SetMode(gin.ReleaseMode) [GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers) [GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers) [GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers) [GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers) [GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers) [GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers) [GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers) [GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers) [GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers) [GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers) [GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers) [GIN-debug] GET / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers) [GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) [GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers) [GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/server/routes.go:943 msg="Listening on 127.0.0.1:11434 (version 0.0.0)" time=2024-01-24T08:16:50.397+01:00 level=INFO source=/mnt/dev/ai/ollama/llm/payload_common.go:106 msg="Extracting dynamic libraries..." 
time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/llm/payload_common.go:145 msg="Dynamic LLM libraries [cpu_avx2 rocm_v1 cpu cpu_avx cuda_v12]" time=2024-01-24T08:16:53.664+01:00 level=DEBUG source=/mnt/dev/ai/ollama/llm/payload_common.go:146 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY" time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:91 msg="Detecting GPU type" time=2024-01-24T08:16:53.664+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:210 msg="Searching for GPU management library libnvidia-ml.so" time=2024-01-24T08:16:53.665+01:00 level=DEBUG source=/mnt/dev/ai/ollama/gpu/gpu.go:228 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /mnt/dev/ai/ollama/libnvidia-ml.so*]" time=2024-01-24T08:16:53.673+01:00 level=INFO source=/mnt/dev/ai/ollama/gpu/gpu.go:256 msg="Discovered GPU libraries: [/opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! WARNING: You should always run with libnvidia-ml.so that is installed with your NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64. libnvidia-ml.so in GDK package is a stub library that is attached only for build purposes (e.g. machine that you build your application doesn't have to have Display Driver installed). !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Linked to libnvidia-ml library at wrong path : /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so SIGSEGV: segmentation violation PC=0x7fd014406300 m=14 sigcode=1 signal arrived during cgo execution goroutine 1 [syscall]: runtime.cgocall(0x9b4490, 0xc0004598a8) /usr/lib/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000459880 sp=0xc000459848 pc=0x409b0b github.com/jmorganca/ollama/gpu._Cfunc_cuda_init(0x7fcfd8000cb0, 0xc00003c300) _cgo_gotypes.go:254 +0x3f fp=0xc0004598a8 sp=0xc000459880 pc=0x7b945f github.com/jmorganca/ollama/gpu.LoadCUDAMgmt.func2(0xc00003a0d0?, 0x38?) 
/mnt/dev/ai/ollama/gpu/gpu.go:266 +0x4a fp=0xc0004598e8 sp=0xc0004598a8 pc=0x7bb22a github.com/jmorganca/ollama/gpu.LoadCUDAMgmt({0xc00052c000, 0x3, 0xc0000f2420?}) /mnt/dev/ai/ollama/gpu/gpu.go:266 +0x1b8 fp=0xc000459988 sp=0xc0004598e8 pc=0x7bb0f8 github.com/jmorganca/ollama/gpu.initGPUHandles() /mnt/dev/ai/ollama/gpu/gpu.go:94 +0xd1 fp=0xc0004599f0 sp=0xc000459988 pc=0x7b98b1 github.com/jmorganca/ollama/gpu.GetGPUInfo() /mnt/dev/ai/ollama/gpu/gpu.go:119 +0xb5 fp=0xc000459b00 sp=0xc0004599f0 pc=0x7b9a75 github.com/jmorganca/ollama/gpu.CheckVRAM() /mnt/dev/ai/ollama/gpu/gpu.go:192 +0x1f fp=0xc000459ba8 sp=0xc000459b00 pc=0x7ba75f github.com/jmorganca/ollama/server.Serve({0x19e64470, 0xc000024040}) /mnt/dev/ai/ollama/server/routes.go:965 +0x45f fp=0xc000459c98 sp=0xc000459ba8 pc=0x999b3f github.com/jmorganca/ollama/cmd.RunServer(0xc000438300?, {0x1a2a87a0?, 0x4?, 0xaccea1?}) /mnt/dev/ai/ollama/cmd/cmd.go:690 +0x199 fp=0xc000459d30 sp=0xc000459c98 pc=0x9abf59 github.com/spf13/cobra.(*Command).execute(0xc0003f1500, {0x1a2a87a0, 0x0, 0x0}) /home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x87c fp=0xc000459e68 sp=0xc000459d30 pc=0x763c9c github.com/spf13/cobra.(*Command).ExecuteC(0xc0003f0900) /home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000459f20 sp=0xc000459e68 pc=0x7644c5 github.com/spf13/cobra.(*Command).Execute(...) /home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) /home/xxx/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985 main.main() /mnt/dev/ai/ollama/main.go:11 +0x4d fp=0xc000459f40 sp=0xc000459f20 pc=0x9b2fcd runtime.main() /usr/lib/go/src/runtime/proc.go:267 +0x2bb fp=0xc000459fe0 sp=0xc000459f40 pc=0x43e1bb runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000459fe8 sp=0xc000459fe0 pc=0x46e081 goroutine 2 [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006cfa8 sp=0xc00006cf88 pc=0x43e60e runtime.goparkunlock(...) /usr/lib/go/src/runtime/proc.go:404 runtime.forcegchelper() /usr/lib/go/src/runtime/proc.go:322 +0xb3 fp=0xc00006cfe0 sp=0xc00006cfa8 pc=0x43e493 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x46e081 created by runtime.init.6 in goroutine 1 /usr/lib/go/src/runtime/proc.go:310 +0x1a goroutine 18 [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000068778 sp=0xc000068758 pc=0x43e60e runtime.goparkunlock(...) /usr/lib/go/src/runtime/proc.go:404 runtime.bgsweep(0x0?) /usr/lib/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc0000687c8 sp=0xc000068778 pc=0x42a57f runtime.gcenable.func1() /usr/lib/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000687e0 sp=0xc0000687c8 pc=0x41f6c5 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000687e8 sp=0xc0000687e0 pc=0x46e081 created by runtime.gcenable in goroutine 1 /usr/lib/go/src/runtime/mgc.go:200 +0x66 goroutine 19 [GC scavenge wait]: runtime.gopark(0x8b2ee4?, 0x7ed898?, 0x0?, 0x0?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000068f70 sp=0xc000068f50 pc=0x43e60e runtime.goparkunlock(...) /usr/lib/go/src/runtime/proc.go:404 runtime.(*scavengerState).park(0x1a278b20) /usr/lib/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000068fa0 sp=0xc000068f70 pc=0x427de9 runtime.bgscavenge(0x0?) 
/usr/lib/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000068fc8 sp=0xc000068fa0 pc=0x428399 runtime.gcenable.func2() /usr/lib/go/src/runtime/mgc.go:201 +0x25 fp=0xc000068fe0 sp=0xc000068fc8 pc=0x41f665 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000068fe8 sp=0xc000068fe0 pc=0x46e081 created by runtime.gcenable in goroutine 1 /usr/lib/go/src/runtime/mgc.go:201 +0xa5 goroutine 20 [finalizer wait]: runtime.gopark(0x198?, 0xac5e60?, 0x1?, 0xf7?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006c620 sp=0xc00006c600 pc=0x43e60e runtime.runfinq() /usr/lib/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00006c7e0 sp=0xc00006c620 pc=0x41e6e7 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x46e081 created by runtime.createfing in goroutine 1 /usr/lib/go/src/runtime/mfinal.go:163 +0x3d goroutine 21 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000069750 sp=0xc000069730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0000697e0 sp=0xc000069750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000697e8 sp=0xc0000697e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 22 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000069f50 sp=0xc000069f30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000069fe0 sp=0xc000069f50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000069fe8 sp=0xc000069fe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 3 [GC worker (idle)]: runtime.gopark(0x59ab3bdc52a?, 0x1?, 0xa0?, 0x31?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006d750 sp=0xc00006d730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006d7e0 sp=0xc00006d750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006d7e8 sp=0xc00006d7e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 34 [GC worker (idle)]: runtime.gopark(0x59ab3bf2720?, 0x3?, 0xc6?, 0x18?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000488750 sp=0xc000488730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004887e0 sp=0xc000488750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004887e8 sp=0xc0004887e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 35 [GC worker (idle)]: runtime.gopark(0x59ab3bdc399?, 0x1?, 0x5b?, 0x9c?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000488f50 sp=0xc000488f30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000488fe0 sp=0xc000488f50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000488fe8 sp=0xc000488fe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 4 [GC worker (idle)]: runtime.gopark(0x59ab3bdc584?, 0x1?, 0xae?, 0x65?, 0x0?) 
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006df50 sp=0xc00006df30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006dfe0 sp=0xc00006df50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006dfe8 sp=0xc00006dfe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 36 [GC worker (idle)]: runtime.gopark(0x59ab3bf2e04?, 0x3?, 0x3a?, 0x23?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000489750 sp=0xc000489730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004897e0 sp=0xc000489750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004897e8 sp=0xc0004897e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 37 [GC worker (idle)]: runtime.gopark(0x59ab3bdc52a?, 0x1?, 0xba?, 0x4d?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000489f50 sp=0xc000489f30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000489fe0 sp=0xc000489f50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000489fe8 sp=0xc000489fe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 23 [GC worker (idle)]: runtime.gopark(0x59ab3bdd079?, 0x3?, 0xbc?, 0x18?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006a750 sp=0xc00006a730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006a7e0 sp=0xc00006a750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 38 [GC worker (idle)]: runtime.gopark(0x59ab3bf29ab?, 0x1?, 0x8c?, 0x6d?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00048a750 sp=0xc00048a730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00048a7e0 sp=0xc00048a750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00048a7e8 sp=0xc00048a7e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 5 [GC worker (idle)]: runtime.gopark(0x59ab008fefa?, 0x1?, 0x58?, 0x52?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006e750 sp=0xc00006e730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006e7e0 sp=0xc00006e750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 6 [GC worker (idle)]: runtime.gopark(0x59ab3bdbc66?, 0x1?, 0x3a?, 0x21?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006ef50 sp=0xc00006ef30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006efe0 sp=0xc00006ef50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 7 [GC worker (idle)]: runtime.gopark(0x59ab3bf269e?, 0x1?, 0xf5?, 0x46?, 0x0?) 
/usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006f750 sp=0xc00006f730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006f7e0 sp=0xc00006f750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 8 [GC worker (idle)]: runtime.gopark(0x59ab3bf273e?, 0x3?, 0xc1?, 0xa1?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00006ff50 sp=0xc00006ff30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc00006ffe0 sp=0xc00006ff50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 9 [GC worker (idle)]: runtime.gopark(0x1a2aa4e0?, 0x1?, 0xba?, 0x61?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000484750 sp=0xc000484730 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004847e0 sp=0xc000484750 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004847e8 sp=0xc0004847e0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 10 [GC worker (idle)]: runtime.gopark(0x59ab3bf30d5?, 0x3?, 0xa5?, 0x32?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000484f50 sp=0xc000484f30 pc=0x43e60e runtime.gcBgMarkWorker() /usr/lib/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000484fe0 sp=0xc000484f50 pc=0x421245 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000484fe8 sp=0xc000484fe0 pc=0x46e081 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/lib/go/src/runtime/mgc.go:1219 +0x1c goroutine 39 [select, locked to thread]: runtime.gopark(0xc00048b7a8?, 0x2?, 0xa9?, 0xe8?, 0xc00048b7a4?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc00048b638 sp=0xc00048b618 pc=0x43e60e runtime.selectgo(0xc00048b7a8, 0xc00048b7a0, 0x0?, 0x0, 0x0?, 0x1) /usr/lib/go/src/runtime/select.go:327 +0x725 fp=0xc00048b758 sp=0xc00048b638 pc=0x44e165 runtime.ensureSigM.func1() /usr/lib/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc00048b7e0 sp=0xc00048b758 pc=0x46519f runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00048b7e8 sp=0xc00048b7e0 pc=0x46e081 created by runtime.ensureSigM in goroutine 1 /usr/lib/go/src/runtime/signal_unix.go:997 +0xc8 goroutine 24 [syscall]: runtime.notetsleepg(0x0?, 0x0?) /usr/lib/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc0004527a0 sp=0xc000452768 pc=0x411209 os/signal.signal_recv() /usr/lib/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0004527c0 sp=0xc0004527a0 pc=0x46aa49 os/signal.loop() /usr/lib/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0004527e0 sp=0xc0004527c0 pc=0x6f3913 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004527e8 sp=0xc0004527e0 pc=0x46e081 created by os/signal.Notify.func1.1 in goroutine 1 /usr/lib/go/src/os/signal/signal.go:151 +0x1f goroutine 25 [chan receive]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/lib/go/src/runtime/proc.go:398 +0xce fp=0xc000452f18 sp=0xc000452ef8 pc=0x43e60e runtime.chanrecv(0xc0001538c0, 0x0, 0x1) /usr/lib/go/src/runtime/chan.go:583 +0x3cd fp=0xc000452f90 sp=0xc000452f18 pc=0x40beed runtime.chanrecv1(0x0?, 0x0?) 
/usr/lib/go/src/runtime/chan.go:442 +0x12 fp=0xc000452fb8 sp=0xc000452f90 pc=0x40baf2 github.com/jmorganca/ollama/server.Serve.func1() /mnt/dev/ai/ollama/server/routes.go:952 +0x25 fp=0xc000452fe0 sp=0xc000452fb8 pc=0x999c05 runtime.goexit() /usr/lib/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000452fe8 sp=0xc000452fe0 pc=0x46e081 created by github.com/jmorganca/ollama/server.Serve in goroutine 1 /mnt/dev/ai/ollama/server/routes.go:951 +0x407 rax 0x7fcfd8001d40 rbx 0x9 rcx 0x1a rdx 0x1a rdi 0x7fcfe7ffebe0 rsi 0x100 rbp 0x7fcfe7ffee00 rsp 0x7fcfe7ffebd8 r8 0x64 r9 0x0 r10 0x7fd065a30e58 r11 0x7fd065ac13c0 r12 0xc00003c300 r13 0x7fcfe7ffedc0 r14 0x7fcfe7ffebe0 r15 0xc00003c370 rip 0x7fd014406300 rflags 0x10206 cs 0x33 fs 0x0 gs 0x0 </pre> </details> I'll add more info, because I am not entirely sure if only some specific ROCm version is supported (I have an older one, because it works with other software well), nor I am sure if my env. vars are correct (that GFX_VERSION should be, I am using it with SD WebUI, ComfyUI and Text Generation WebUI). <details> <pre> ❯ yay -Q | grep -Pi rocm ─╯ rocm-clang-ocl 5.6.1-1 rocm-cmake 5.6.1-1 rocm-core 5.6.1-1 rocm-device-libs 5.6.1-1 rocm-hip-libraries 5.6.1-1 rocm-hip-runtime 5.6.1-1 rocm-hip-sdk 5.6.1-1 rocm-language-runtime 5.6.1-1 rocm-llvm 5.6.1-1 rocm-ml-libraries 5.6.1-1 rocm-ml-sdk 5.6.1-1 rocm-opencl-runtime 5.6.1-1 rocm-smi-lib 5.6.1-1 rocminfo 5.6.1-1 ❯ /opt/rocm/bin/rocminfo ─╯ ROCk module is loaded ===================== HSA System Attributes ===================== Runtime Version: 1.1 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE ========== HSA Agents ========== ******* Agent 1 ******* Name: AMD Ryzen 7 7800X3D 8-Core Processor Uuid: CPU-XX Marketing Name: AMD Ryzen 7 7800X3D 8-Core Processor Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 5050 BDFID: 0 Internal Node ID: 0 Compute Unit: 16 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 64937188(0x3dedce4) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 64937188(0x3dedce4) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 64937188(0x3dedce4) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: ******* Agent 2 ******* Name: gfx1100 Uuid: GPU-df09d9133148a62b Marketing Name: AMD Radeon RX 7900 XTX Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 32(0x20) KB L2: 6144(0x1800) KB L3: 98304(0x18000) KB Chip ID: 29772(0x744c) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 2526 BDFID: 768 Internal Node ID: 1 Compute Unit: 96 SIMDs per CU: 2 Shader Engines: 6 Shader Arrs. per Eng.: 2 WatchPts on Addr. 
Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 25149440(0x17fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1100 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 ******* Agent 3 ******* Name: gfx1036 Uuid: GPU-XX Marketing Name: AMD Radeon Graphics Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 2 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 256(0x100) KB Chip ID: 5710(0x164e) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 2200 BDFID: 6656 Internal Node ID: 2 Compute Unit: 2 SIMDs per CU: 2 Shader Engines: 1 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 32(0x20) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 524288(0x80000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 524288(0x80000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx1036 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 *** Done *** </pre> </details>
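The stub warning in that trace is the key part: detection picked up /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so, and the process segfaults inside cuda_init against that stub before any ROCm code runs. The log itself hints at a workaround ("Override detection logic by setting OLLAMA_LLM_LIBRARY"); a sketch, assuming the library name is taken from the "Dynamic LLM libraries" line of your own log:

```
# Bypass the broken NVIDIA probe and force the ROCm runner directly.
# Use whichever rocm entry your "Dynamic LLM libraries" log line lists
# (rocm_v1 in the trace above; rocm_v6 in the release binaries).
OLLAMA_LLM_LIBRARY=rocm_v6 HSA_OVERRIDE_GFX_VERSION=11.0.0 ./ollama serve
```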
Author
Owner

@adham-omran commented on GitHub (Jan 24, 2024):

@adham-omran can you clarify the scenario when you override with gfx1032? Does the server "work" and are you able to run models on the GPU?

No, I am unable to run models with my 6600 even with override.
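
For reference, the override most people report for RDNA2 cards like the 6600 (gfx1032) is the officially supported gfx1030 value; a sketch, and one that did not help in my case:

# Commonly suggested workaround for unsupported RDNA2 cards: report gfx1030 instead.
HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve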

<!-- gh-comment-id:1907795218 --> @adham-omran commented on GitHub (Jan 24, 2024): > @adham-omran can you clarify the scenario when you override with `gfx1032`? Does the server "work" and are you able to run models on the GPU? No, I am unable to run models with my 6600 even with override.
Author
Owner

@0xdeafbeef commented on GitHub (Jan 24, 2024):

@0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup?

It fails because Fedora doesn't ship hipblas yet. I'll check later using Docker.

<!-- gh-comment-id:1907893357 --> @0xdeafbeef commented on GitHub (Jan 24, 2024): > @0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup? It fails because fedora doesn't yet ship hipblas. I'll later check using docker
Author
Owner

@0xdeafbeef commented on GitHub (Jan 24, 2024):

@0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup?

HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1102 LD_LIBRARY_PATH=/usr/lib64/ ollama serve
2024/01/24 17:07:25 dyn_ext_server.go:145: INFO Initializing llama server
Memory access fault by GPU node-1 (Agent handle: 0x72c6cc903ef0) on address 0x72c7c6208000. Reason: Page not present or supervisor privilege.
fish: Job 1, 'HSA_OVERRIDE_GFX_VERSION=11.0.0…' terminated by signal SIGABRT (Abort)

after symlinking ln -s /usr/lib64/libhipblas.so.2 /usr/lib64/libhipblas.so.1 it gets stuck here

time=2024-01-24T17:23:40.434+01:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama3746580572/rocm_v5/libext_server.so
time=2024-01-24T17:23:40.434+01:00 level=WARN source=/go/src/github.com/jmorganca/ollama/llm/llm.go:152 msg="Failed to load dynamic library /tmp/ollama3746580572/rocm_v5/libext_server.so  Unable to load dynamic library: Unable to load dynamic server library: libamdhip64.so.5: cannot open shared object file: No such file or directory"
loading library /tmp/ollama3746580572/rocm_v6/libext_server.so
time=2024-01-24T17:23:40.463+01:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama3746580572/rocm_v6/libext_server.so"
time=2024-01-24T17:23:40.463+01:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:145 msg="Initializing llama server"
[1706113420] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
[1706113420] Performing pre-initialization of GPU
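
For reference, a quick way to see which ROCm runtime libraries the loader can actually resolve (paths assume Fedora's /usr/lib64; adjust for your distro):

# List the ROCm runtime libraries known to the dynamic loader.
ldconfig -p | grep -E 'libamdhip64|libhipblas|librocblas'
# Show what is physically on disk, including any symlinked sonames.
ls -l /usr/lib64/libamdhip64.so* /usr/lib64/libhipblas.so* 2>/dev/null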
<!-- gh-comment-id:1908481901 --> @0xdeafbeef commented on GitHub (Jan 24, 2024): > @0xdeafbeef could you share more of the server log showing startup and loading the llm library so we can see why it's not working correctly on your ROCm setup? ``` HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1102 LD_LIBRARY_PATH=/usr/lib64/ ollama serve ``` ``` 2024/01/24 17:07:25 dyn_ext_server.go:145: INFO Initializing llama server Memory access fault by GPU node-1 (Agent handle: 0x72c6cc903ef0) on address 0x72c7c6208000. Reason: Page not present or supervisor privilege. fish: Job 1, 'HSA_OVERRIDE_GFX_VERSION=11.0.0…' terminated by signal SIGABRT (Abort) ``` after symlinking `ln -s /usr/lib64/libhipblas.so.2 /usr/lib64/libhipblas.so.1` stucks here ``` ime=2024-01-24T17:23:40.434+01:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/cpu_common.go:11 msg="CPU has AVX2" loading library /tmp/ollama3746580572/rocm_v5/libext_server.so time=2024-01-24T17:23:40.434+01:00 level=WARN source=/go/src/github.com/jmorganca/ollama/llm/llm.go:152 msg="Failed to load dynamic library /tmp/ollama3746580572/rocm_v5/libext_server.so Unable to load dynamic library: Unable to load dynamic server library: libamdhip64.so.5: cannot open shared object file: No such file or directory" loading library /tmp/ollama3746580572/rocm_v6/libext_server.so time=2024-01-24T17:23:40.463+01:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama3746580572/rocm_v6/libext_server.so" time=2024-01-24T17:23:40.463+01:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:145 msg="Initializing llama server" [1706113420] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | [1706113420] Performing pre-initialization of GPU ```
Author
Owner

@dhiltgen commented on GitHub (Jan 24, 2024):

@mnn Do you have any NVIDIA cards in your system? If not, as a workaround, you could uninstall the CUDA libraries so we don't try to probe for CUDA cards, but ollama shouldn't crash like that. I'll add some more verbose logging in that cuda_init routine so we can try to understand why it's crashing and fix the bug so it continues on gracefully.
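
A quick way to check whether any CUDA libraries are present for that probe to trip over (a sketch; exact library names vary with driver packaging):

# If this prints nothing, there are no CUDA libraries for ollama to probe.
ldconfig -p | grep -E 'libnvidia-ml|libcudart|libcuda\.so'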

<!-- gh-comment-id:1908543954 --> @dhiltgen commented on GitHub (Jan 24, 2024): @mnn Do you have any NVIDIA cards in your system? If not, as a workaround, you could uninstall the CUDA libraries so we don't try to probe for CUDA cards, but ollama shouldn't crash like that. I'll add some more verbose logging in that cuda_init routine so we can try to understand why it's crashing and fix the bug so it continues on gracefully.
Author
Owner

@dhiltgen commented on GitHub (Jan 24, 2024):

It fails because fedora doesn't yet ship hipblas. I'll later check using docker

@0xdeafbeef we haven't pushed an updated official image yet, but I've pushed an image to dhiltgen/ollama:0.1.21-rc3 which you could try with something like:

docker run --privileged --rm -it --device /dev/kfd -e OLLAMA_DEBUG=1 dhiltgen/ollama:0.1.21-rc3

Also note that recent builds support both ROCm v6 and v5, if that helps get the necessary dependencies on your system, so you shouldn't have to symlink different versions.
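
If you'd rather not run the container privileged, a variant along these lines usually works on ROCm hosts (assuming your user can access the kfd and dri device nodes, e.g. via the video/render groups):

# Pass the compute (kfd) and render (dri) devices explicitly instead of --privileged.
docker run --rm -it --device /dev/kfd --device /dev/dri -e OLLAMA_DEBUG=1 dhiltgen/ollama:0.1.21-rc3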

<!-- gh-comment-id:1908550482 --> @dhiltgen commented on GitHub (Jan 24, 2024): > It fails because fedora doesn't yet ship hipblas. I'll later check using docker @0xdeafbeef we haven't pushed an updated official image yet, but I've pushed an image to `dhiltgen/ollama:0.1.21-rc3` which you could try with something like: ``` docker run --privileged --rm -it --device /dev/kfd -e OLLAMA_DEBUG=1 dhiltgen/ollama:0.1.21-rc3 ``` Also note that recent builds support both ROCm v6 and v5, if that helps get the necessary dependencies on your system, so you shouldn't have to symlink different versions.
Author
Owner

@0xdeafbeef commented on GitHub (Jan 24, 2024):

It fails because fedora doesn't yet ship hipblas. I'll later check using docker

@0xdeafbeef we haven't pushed an updated official image yet, but I've pushed an image to dhiltgen/ollama:0.1.21-rc3 which you could try with something like:

docker run --privileged --rm -it --device /dev/kfd -e OLLAMA_DEBUG=1 dhiltgen/ollama:0.1.21-rc3

Also note that recent builds support both ROCm v6 and v5, if that helps get the necessary dependencies on your system, so you shouldn't have to symlink different versions.

[74658.571655] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571659] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5a08000 from client 0x1b (UTCL2)
[74658.571661] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00801431
[74658.571662] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: SQC (data) (0xa)
[74658.571663] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x1
[74658.571664] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571665] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[74658.571666] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571666] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571678] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571680] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5a08000 from client 0x1b (UTCL2)
[74658.571681] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00801231
[74658.571682] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: SQC (inst) (0x9)
[74658.571682] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x1
[74658.571683] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571684] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[74658.571684] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571685] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571690] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571691] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5608000 from client 0x1b (UTCL2)
[74658.571692] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571693] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571694] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571694] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571695] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571696] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571696] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571701] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571703] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571703] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571704] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571705] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571705] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571706] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571707] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571707] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571712] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571713] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571714] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571715] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571716] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571716] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571717] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571718] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571718] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571723] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571725] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571725] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571726] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571727] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571727] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571728] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571729] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571729] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571734] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571735] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571736] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571737] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571738] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571738] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571739] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571739] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571740] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571745] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571746] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571747] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571748] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571748] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571749] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571750] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571750] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571751] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571756] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571757] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571758] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571758] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571759] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571760] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571760] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571761] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571762] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74658.571767] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32785, for process ollama pid 933444 thread ollama pid 933506)
[74658.571768] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000072c7c5205000 from client 0x1b (UTCL2)
[74658.571769] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[74658.571769] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CB/DB (0x0)
[74658.571770] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[74658.571770] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[74658.571771] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[74658.571772] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[74658.571772] amdgpu 0000:03:00.0: amdgpu:      RW: 0x0
[74662.571664] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74662.571688] amdgpu: sq_intr: error, se 2, data 0x100000, sa 0, priv 1, wave_id 0, simd_id 0, wgp_id 4, err_type 2
[74662.571691] amdgpu: sq_intr: error, se 2, data 0x100000, sa 0, priv 1, wave_id 0, simd_id 3, wgp_id 4, err_type 2
[74662.571693] amdgpu: sq_intr: error, se 2, data 0x100000, sa 0, priv 1, wave_id 0, simd_id 2, wgp_id 4, err_type 2
[74662.571695] amdgpu: sq_intr: error, se 2, data 0x100000, sa 0, priv 1, wave_id 0, simd_id 1, wgp_id 4, err_type 2
[74662.571696] amdgpu: sq_intr: error, se 2, data 0x100000, sa 1, priv 1, wave_id 0, simd_id 2, wgp_id 0, err_type 2
[74662.571697] amdgpu: sq_intr: error, se 3, data 0x100000, sa 1, priv 1, wave_id 0, simd_id 3, wgp_id 4, err_type 2
[74662.571698] amdgpu: sq_intr: error, se 2, data 0x100000, sa 0, priv 1, wave_id 0, simd_id 2, wgp_id 3, err_type 2
[74662.571699] amdgpu: sq_intr: error, se 3, data 0x100000, sa 1, priv 1, wave_id 0, simd_id 2, wgp_id 4, err_type 2
[74662.571701] amdgpu: sq_intr: error, se 0, data 0x100000, sa 0, priv 1, wave_id 0, simd_id 2, wgp_id 1, err_type 2
[74662.571702] amdgpu: sq_intr: error, se 2, data 0x100000, sa 1, priv 1, wave_id 0, simd_id 2, wgp_id 4, err_type 2
[74667.249606] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74667.249610] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[74667.249617] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011
[74675.193621] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74675.193624] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[74675.193631] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8014
[74850.701220] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74850.701225] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74854.763063] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74854.763068] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74858.774592] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74858.774597] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74862.786367] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74862.786372] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74866.798261] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74866.798266] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74870.810197] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74870.810202] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74874.821984] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74874.821988] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74878.833767] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74878.833772] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74882.845450] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74882.845453] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74886.857228] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74886.857231] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74890.869084] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74890.869088] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74894.873513] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74894.873518] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74898.877880] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74898.877885] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74902.882165] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74902.882169] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74906.895182] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74906.895186] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74910.911586] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74910.911589] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74914.924833] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74914.924839] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74918.924815] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74918.924819] amdgpu: Failed to evict process queues
[74918.924821] amdgpu: Failed to quiesce KFD
[74922.930635] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74922.930638] amdgpu: Failed to restore process queues
[74922.930639] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[74926.936559] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74926.936564] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74930.957696] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74930.957699] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74935.005577] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74935.005581] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74939.005593] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74939.005597] amdgpu: Failed to evict process queues
[74939.005598] amdgpu: Failed to quiesce KFD
[74943.028570] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74943.028573] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[74943.028581] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011
[74994.569660] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74994.569665] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[74998.626772] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[74998.626776] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75002.626785] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75002.626789] amdgpu: Failed to evict process queues
[75002.626790] amdgpu: Failed to quiesce KFD
[75006.651368] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75006.651372] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[75006.651379] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011
[75110.746508] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75110.746512] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75114.797729] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75114.797734] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75118.805437] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75118.805441] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75122.808987] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75122.808991] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75126.812552] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75126.812556] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75130.816133] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75130.816137] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75134.819757] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75134.819761] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75138.823404] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75138.823408] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75142.827023] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75142.827027] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75146.830598] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75146.830603] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75150.834235] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75150.834239] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75154.837977] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75154.837981] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75158.841823] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75158.841827] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75162.845634] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75162.845639] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75166.849353] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75166.849357] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75170.853176] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75170.853181] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75174.857324] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75174.857330] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75178.857316] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75178.857320] amdgpu: Failed to evict process queues
[75178.857321] amdgpu: Failed to quiesce KFD
[75182.863552] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75182.863555] amdgpu: Failed to restore process queues
[75182.863556] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75186.863539] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75186.863543] amdgpu: Failed to evict process queues
[75186.863544] amdgpu: Failed to quiesce KFD
[75190.873415] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75190.873419] amdgpu: Failed to restore process queues
[75190.873420] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75194.873399] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75194.873401] amdgpu: Failed to evict process queues
[75194.873402] amdgpu: Failed to quiesce KFD
[75198.879356] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75198.879359] amdgpu: Failed to restore process queues
[75198.879360] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75202.879467] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75202.879471] amdgpu: Failed to evict process queues
[75202.879472] amdgpu: Failed to quiesce KFD
[75206.883460] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75206.883463] amdgpu: Failed to restore process queues
[75206.883464] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75210.883443] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75210.883446] amdgpu: Failed to evict process queues
[75210.883447] amdgpu: Failed to quiesce KFD
[75214.889366] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75214.889369] amdgpu: Failed to restore process queues
[75214.889370] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75218.889351] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75218.889354] amdgpu: Failed to evict process queues
[75218.889355] amdgpu: Failed to quiesce KFD
[75222.895340] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75222.895343] amdgpu: Failed to restore process queues
[75222.895344] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75226.895323] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75226.895327] amdgpu: Failed to evict process queues
[75226.895328] amdgpu: Failed to quiesce KFD
[75230.898977] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75230.898981] amdgpu: Failed to restore process queues
[75230.898982] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75234.898959] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75234.898963] amdgpu: Failed to evict process queues
[75234.898964] amdgpu: Failed to quiesce KFD
[75238.905095] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75238.905099] amdgpu: Failed to restore process queues
[75238.905100] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75242.905079] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75242.905083] amdgpu: Failed to evict process queues
[75242.905084] amdgpu: Failed to quiesce KFD
[75246.910996] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75246.911000] amdgpu: Failed to restore process queues
[75246.911001] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75279.316895] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75279.316899] amdgpu: Failed to evict process queues
[75279.316900] amdgpu: Failed to quiesce KFD
[75283.322382] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75283.322386] amdgpu: Failed to restore process queues
[75283.322387] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75287.322365] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75287.322369] amdgpu: Failed to evict process queues
[75287.322370] amdgpu: Failed to quiesce KFD
[75291.328266] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75291.328270] amdgpu: Failed to restore process queues
[75291.328272] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75295.328251] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75295.328254] amdgpu: Failed to evict process queues
[75295.328255] amdgpu: Failed to quiesce KFD
[75299.332304] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75299.332307] amdgpu: Failed to restore process queues
[75299.332308] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75303.332286] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75303.332290] amdgpu: Failed to evict process queues
[75303.332291] amdgpu: Failed to quiesce KFD
[75307.338189] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75307.338193] amdgpu: Failed to restore process queues
[75307.338193] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75372.514647] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75372.514651] amdgpu: Failed to evict process queues
[75372.514652] amdgpu: Failed to quiesce KFD
[75376.517218] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75376.517221] amdgpu: Failed to restore process queues
[75376.517222] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[75501.892242] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75501.892246] amdgpu: Failed to evict process queues
[75501.892247] amdgpu: Failed to quiesce KFD
[75505.913016] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75505.913020] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[75505.913027] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011
[75637.613841] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75637.613845] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75641.669463] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75641.669467] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75645.677277] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75645.677281] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75649.680780] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75649.680784] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75653.684414] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75653.684419] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75657.687975] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75657.687979] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75661.691726] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75661.691730] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75665.695111] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75665.695115] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75669.698495] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75669.698499] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75673.702208] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75673.702212] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75677.705666] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75677.705670] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75681.709179] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75681.709183] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75685.712878] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75685.712882] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75689.716587] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75689.716591] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75693.720507] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75693.720510] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75697.724554] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75697.724558] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75701.729028] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75701.729033] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75705.768041] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75705.768046] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75709.774374] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75709.774379] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75713.808910] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75713.808915] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75717.817695] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75717.817700] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75721.855447] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75721.855451] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75725.862973] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75725.862978] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75729.893517] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75729.893522] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[75733.893534] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75733.893537] amdgpu: Failed to evict process queues
[75733.893539] amdgpu: Failed to quiesce KFD
[75737.915988] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[75737.915991] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[75737.915999] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011
[81445.494669] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81445.494671] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[81445.494679] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011
[81689.785341] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81689.785346] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81693.785328] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81693.785331] amdgpu: Failed to evict process queues
[81693.785332] amdgpu: Failed to quiesce KFD
[81697.788918] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81697.788921] amdgpu: Failed to restore process queues
[81697.788922] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[81701.792399] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81701.792404] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81705.805053] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81705.805058] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81709.817849] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81709.817854] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81713.830654] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81713.830659] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81717.843467] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81717.843471] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81721.856534] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81721.856539] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81725.869835] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81725.869839] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81729.883381] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81729.883389] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81733.899758] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81733.899761] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81737.899738] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81737.899742] amdgpu: Failed to evict process queues
[81737.899743] amdgpu: Failed to quiesce KFD
[81741.902263] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81741.902267] amdgpu: Failed to restore process queues
[81741.902269] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[81745.904901] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81745.904904] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81749.909894] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81749.909897] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81753.924789] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81753.924793] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81757.938917] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81757.938921] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81761.952736] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81761.952740] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81765.967069] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81765.967074] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81769.981609] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81769.981613] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81773.995553] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81773.995558] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81778.008707] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81778.008711] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81782.022107] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81782.022112] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81786.035002] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81786.035006] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81790.047844] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81790.047849] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81794.060562] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81794.060567] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81798.060543] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81798.060546] amdgpu: Failed to evict process queues
[81798.060547] amdgpu: Failed to quiesce KFD
[81802.063546] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81802.063548] amdgpu: Failed to restore process queues
[81802.063549] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[81806.066400] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81806.066405] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81810.071324] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81810.071328] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81814.084215] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81814.084219] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81818.096942] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81818.096946] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81822.109685] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81822.109689] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81826.122614] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81826.122618] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81830.127164] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81830.127168] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81834.145235] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81834.145239] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81838.165209] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81838.165214] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81842.169472] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81842.169477] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81846.182660] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81846.182664] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81850.214998] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81850.215002] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81854.229372] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81854.229376] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81858.277391] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81858.277395] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81862.293386] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81862.293390] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81866.320512] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81866.320516] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81870.334670] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81870.334677] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81874.334649] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81874.334653] amdgpu: Failed to evict process queues
[81874.334654] amdgpu: Failed to quiesce KFD
[81878.340606] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81878.340610] amdgpu: Failed to restore process queues
[81878.340611] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[81882.345168] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81882.345173] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81886.361487] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81886.361491] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81890.401063] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81890.401068] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81894.409440] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81894.409444] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81898.480003] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81898.480007] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81902.499650] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81902.499653] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81906.523949] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81906.523954] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81910.537804] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81910.537809] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81914.537798] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81914.537800] amdgpu: Failed to evict process queues
[81914.537801] amdgpu: Failed to quiesce KFD
[81918.542125] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81918.542127] amdgpu: Failed to restore process queues
[81918.542128] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[81922.547685] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81922.547690] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81926.566288] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81926.566292] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81930.579644] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81930.579648] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81934.592717] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81934.592720] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81938.606035] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81938.606039] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81942.620547] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81942.620552] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81946.643279] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81946.643284] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81950.658038] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81950.658042] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81954.677560] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81954.677564] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81958.692680] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81958.692684] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81962.713273] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81962.713277] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81966.728372] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81966.728377] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81970.748701] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81970.748705] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81974.760859] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81974.760863] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81978.818306] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81978.818310] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81982.827514] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81982.827519] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81986.827492] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81986.827495] amdgpu: Failed to evict process queues
[81986.827496] amdgpu: Failed to quiesce KFD
[81990.832319] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81990.832322] amdgpu: Failed to restore process queues
[81990.832323] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[81994.838673] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81994.838678] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[81998.848195] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[81998.848200] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82002.875407] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82002.875411] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82006.881846] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82006.881850] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82010.940815] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82010.940819] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82014.959420] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82014.959424] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82018.971921] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82018.971926] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82022.975850] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82022.975855] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82026.979590] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82026.979595] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82030.983306] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82030.983310] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82034.986995] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82034.986998] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82038.990685] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82038.990690] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82042.994371] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82042.994375] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82046.998121] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82046.998125] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82051.001804] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82051.001809] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82055.005522] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82055.005526] amdgpu: Pasid 0x8011 DQM create queue type 0 failed. ret -62
[82059.007540] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82059.007542] amdgpu: Failed to evict process queues
[82059.007543] amdgpu: Failed to quiesce KFD
[82063.014424] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82063.014428] amdgpu: Failed to restore process queues
[82063.014429] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[82067.014406] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82067.014409] amdgpu: Failed to evict process queues
[82067.014411] amdgpu: Failed to quiesce KFD
[82071.042294] amdgpu 0000:03:00.0: amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[82071.042297] amdgpu: Resetting wave fronts (cpsch) on dev 00000000e866c4fd
[82071.042304] amdgpu 0000:03:00.0: amdgpu: Didn't find vmid for pasid 0x8011

Fails with null pointer :(
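
A minimal way to capture the server log and the matching kernel messages side by side for the next attempt (a sketch; run each command in its own terminal):

# Terminal 1: server log with debug output.
OLLAMA_DEBUG=1 ollama serve 2>&1 | tee ollama.log
# Terminal 2: follow amdgpu kernel messages as they arrive.
sudo dmesg --follow | grep --line-buffered -i amdgpu | tee dmesg-amdgpu.log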

@dhiltgen commented on GitHub (Jan 24, 2024):

@0xdeafbeef it looks like it got past the detection logic and was deep in llama.cpp/ROCm when things went bad. I'm wondering if this is due to us not targeting your specific GPU processor. I can build a test container image with different GPU targets to try out. Can you share what type of Radeon card you have?
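
Editor's note: a quick way to check which ISA target a card reports is sketched below. `HSA_OVERRIDE_GFX_VERSION` is a general ROCm runtime variable for mapping a card onto a nearby supported target; it is an assumption here — the thread doesn't confirm ollama's build honours it.

```
# Show the ISA target ROCm detects for each agent (e.g. "Name: gfx1030")
rocminfo | grep -E '^\s*Name:\s+gfx'

# Assumption (general ROCm behaviour, not confirmed for ollama's packaging):
# tell the runtime to treat the card as a nearby supported target
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve
```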

@0xdeafbeef commented on GitHub (Jan 24, 2024):

> @0xdeafbeef it looks like it got past the detection logic and was deep in llama.cpp/ROCm when things went bad. I'm wondering if this is due to us not targeting your specific GPU processor. I can build a test container image with different GPU targets to try out. Can you share what type of Radeon card you have?

```
  Name:                    gfx1030
  Uuid:                    GPU-8f3c72db82948540
  Marketing Name:          AMD Radeon RX 6900 XT
```
@dhiltgen commented on GitHub (Jan 24, 2024):

> gfx1030

It looks like we're already building with that target (https://github.com/ollama/ollama/blob/main/llm/generate/gen_linux.sh#L32), so my theory doesn't hold. There's something else going wrong.

Until we can figure this one out, you can force it to run on the CPU with `OLLAMA_LLM_LIBRARY="cpu_avx2"` (see https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#llm-libraries).
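
Editor's note: a minimal sketch of that workaround; the docker variant reuses the invocation from earlier in the thread with the variable passed through.

```
# Bare metal: skip ROCm and use the CPU (AVX2) runner
OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve

# In a container: pass the same variable to the server
docker run --privileged --rm -it --device /dev/kfd \
  -e OLLAMA_LLM_LIBRARY=cpu_avx2 dhiltgen/ollama:0.1.21-rc3
```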

@dixonl90 commented on GitHub (Jan 24, 2024):

@dhiltgen thanks for making an image. It also crashes for me when I try to send a prompt.

```
  Name:                    gfx803
  Uuid:                    GPU-XX
  Marketing Name:          Radeon RX 570 Series
```
@dhiltgen commented on GitHub (Jan 24, 2024):

@0xdeafbeef can you share the output of `rocm-smi --showdriverversion --showproductname --showhw` and `rocm-smi -V`

@dixonl90 can you confirm the crash looks similar to https://github.com/ollama/ollama/issues/738#issuecomment-1908674324 ?

I've pushed an updated image dhiltgen/ollama:0.1.21-rc4 which has some more debug logging enabled. For these systems where you're seeing a crash when trying to send a prompt, could you share the log output before sending a prompt?

```
docker run --privileged --rm -it --device /dev/kfd -e OLLAMA_DEBUG=1 dhiltgen/ollama:0.1.21-rc4
```

The image I've pushed is compiled with ROCm v6, so if the host is v5, maybe that's a possible cause. I'll try to build a v5 based image and push that as well to see if that might yield a working setup.
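
Editor's note: to keep the full log around for sharing, plain shell redirection works; nothing ollama-specific is assumed here.

```
# Save the server log (including OLLAMA_DEBUG output) while still watching it live
docker run --privileged --rm -it --device /dev/kfd -e OLLAMA_DEBUG=1 \
  dhiltgen/ollama:0.1.21-rc4 2>&1 | tee ollama-debug.log
```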

@dhiltgen commented on GitHub (Jan 24, 2024):

I've pushed up `dhiltgen/ollama:0.1.21-rocmv5` to explore if this is a ROCm/driver mismatch in the container. On my test setup with a v6 host, the v5 container works, but maybe the reverse isn't true.
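
Editor's note: to see which ROCm major version the host actually has (useful for choosing between the v5 and v6 images), something like the sketch below usually works; the `.info/version` file is a convention of AMD's ROCm packages and may be absent on some distros.

```
# Most ROCm installs record their version here
cat /opt/rocm/.info/version

# Fallback: ask the package manager, e.g. on Fedora
dnf list installed | grep -i rocm
```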

@zaskokus commented on GitHub (Jan 24, 2024):

@dhiltgen https://nopaste.net/9UBIqwB41B

@0xdeafbeef commented on GitHub (Jan 24, 2024):

> @dhiltgen https://nopaste.net/9UBIqwB41B

check `dmesg | grep amd`
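
Editor's note: a slightly friendlier variant that streams new kernel messages with wall-clock timestamps while the crash is reproduced (standard util-linux dmesg flags).

```
# Follow amdgpu kernel messages live, with human-readable timestamps
sudo dmesg -wT | grep -i amdgpu
```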

@zaskokus commented on GitHub (Jan 24, 2024):

@0xdeafbeef

```
[180310.675453] amdgpu 0000:08:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[180310.675464] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x0000000000001000 from client 10
[180310.675466] amdgpu 0000:08:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B32
[180310.675468] amdgpu 0000:08:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPC (0x5)
[180310.675469] amdgpu 0000:08:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[180310.675471] amdgpu 0000:08:00.0: amdgpu: 	 WALKER_ERROR: 0x1
[180310.675472] amdgpu 0000:08:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[180310.675472] amdgpu 0000:08:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
[180310.675473] amdgpu 0000:08:00.0: amdgpu: 	 RW: 0x0
[180311.383103] amdgpu 0000:08:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[180311.383115] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x0000000000001000 from client 10
[180311.383118] amdgpu 0000:08:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B32
[180311.383120] amdgpu 0000:08:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPC (0x5)
[180311.383122] amdgpu 0000:08:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[180311.383124] amdgpu 0000:08:00.0: amdgpu: 	 WALKER_ERROR: 0x1
[180311.383125] amdgpu 0000:08:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[180311.383126] amdgpu 0000:08:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
[180311.383127] amdgpu 0000:08:00.0: amdgpu: 	 RW: 0x0
[180312.471317] amdgpu 0000:08:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[180312.471328] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x0000000000001000 from client 10
[180312.471331] amdgpu 0000:08:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B32
[180312.471333] amdgpu 0000:08:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPC (0x5)
[180312.471335] amdgpu 0000:08:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[180312.471335] amdgpu 0000:08:00.0: amdgpu: 	 WALKER_ERROR: 0x1
[180312.471336] amdgpu 0000:08:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[180312.471337] amdgpu 0000:08:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
[180312.471338] amdgpu 0000:08:00.0: amdgpu: 	 RW: 0x0
[180315.942770] gmc_v11_0_process_interrupt: 5 callbacks suppressed
[180315.942776] amdgpu 0000:08:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[180315.942787] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x0000000000001000 from client 10
[180315.942789] amdgpu 0000:08:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B32
[180315.942791] amdgpu 0000:08:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPC (0x5)
[180315.942793] amdgpu 0000:08:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[180315.942794] amdgpu 0000:08:00.0: amdgpu: 	 WALKER_ERROR: 0x1
[180315.942795] amdgpu 0000:08:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[180315.942796] amdgpu 0000:08:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
[180315.942796] amdgpu 0000:08:00.0: amdgpu: 	 RW: 0x0
[180316.134673] amdgpu 0000:08:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[180316.134686] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x0000000000001000 from client 10
[180316.134689] amdgpu 0000:08:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B32
[180316.134692] amdgpu 0000:08:00.0: amdgpu: 	 Faulty UTCL2 client ID: CPC (0x5)
[180316.134693] amdgpu 0000:08:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[180316.134695] amdgpu 0000:08:00.0: amdgpu: 	 WALKER_ERROR: 0x1
[180316.134696] amdgpu 0000:08:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[180316.134697] amdgpu 0000:08:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
[180316.134699] amdgpu 0000:08:00.0: amdgpu: 	 RW: 0x0
```
@dhiltgen commented on GitHub (Jan 24, 2024):

@zaskokus I think you've hit #2054 - We don't have a fix for that yet, but hopefully the workaround noted on that issue will work for you so you can force ollama to run just on the discrete GPU and ignore your iGPU.
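For readers who don't want to click through: a common way to pin ROCm to a single device (an assumption on my part — check #2054 for the exact workaround) is the standard ROCm visibility environment variables, set before starting the server:

```
# Hedged sketch: restrict the ROCm runtime to device 0 (the discrete GPU).
# Device index 0 is an assumption; verify the right index with `rocm-smi`.
# ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES are standard ROCm runtime
# variables; which one takes effect can depend on the ROCm version.
export ROCR_VISIBLE_DEVICES=0
export HIP_VISIBLE_DEVICES=0
ollama serve
```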

Author
Owner

@0xdeafbeef commented on GitHub (Jan 24, 2024):

> rocm-smi --showdriverversion --showproductname --showhw

========================= ROCm System Management Interface =========================
============================== Concise Hardware Info ===============================
GPU  DID   GFX RAS  SDMA RAS  UMC RAS  VBIOS             BUS
0    73af  N/A      N/A       N/A      113-EXT60460-X02  0000:03:00.0
====================================================================================
=========================== Version of System Component ============================
Driver version: 6.7.1-cb1.0.fc39.x86_64
====================================================================================
=================================== Product Info ===================================
GPU[0]          : Card series:          Navi 21 [Radeon RX 6900 XT]
GPU[0]          : Card model:           0x2332
GPU[0]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]          : Card SKU:             EXT60460
====================================================================================
=============================== End of ROCm SMI Log ================================
rocm-smi -v


========================= ROCm System Management Interface =========================
====================================== VBIOS =======================================
GPU[0]          : VBIOS version: 113-EXT60460-X02
====================================================================================
=============================== End of ROCm SMI Log ================================

I have an iGPU, but it should be disabled.
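A quick way to double-check that the iGPU really is invisible to the ROCm stack (a sketch relying on standard ROCm and kfd interfaces):

```
# Count the GPU agents ROCm enumerates; more than one here means
# the iGPU is still visible despite being "disabled" in firmware.
rocminfo | grep -c "Device Type:.*GPU"
# The kfd topology gives the same answer at the kernel level
# (node 0 is typically the CPU agent).
ls /sys/class/kfd/kfd/topology/nodes/
```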

Author
Owner

@dhiltgen commented on GitHub (Jan 24, 2024):

> I have an iGPU, but it should be disabled.

@0xdeafbeef in that case, I wonder if you've also hit #2054. If we're reporting `discovered 2 ROCm GPU Devices` in debug mode, you might want to try the workaround to force it to only use the discrete GPU.
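To confirm what discovery reports, running with debug logging and filtering for the device line is enough (a sketch; the exact log wording may vary between builds):

```
# Watch how many ROCm devices ollama discovers at startup.
# Assumes the ollama binary is on PATH.
OLLAMA_DEBUG=1 ollama serve 2>&1 | grep -i "rocm gpu"
# If two devices show up, the #2054 workaround applies.
```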

Author
Owner

@dixonl90 commented on GitHub (Jan 25, 2024):

> @0xdeafbeef can you share the output of `rocm-smi --showdriverversion --showproductname --showhw` and `rocm-smi -V`
>
> @dixonl90 can you confirm the crash looks similar to [#738 (comment)](https://github.com/ollama/ollama/issues/738#issuecomment-1908674324)?
>
> I've pushed an updated image `dhiltgen/ollama:0.1.21-rc4` which has some more debug logging enabled. For these systems where you're seeing a crash when trying to send a prompt, could you share the log output before sending a prompt?
>
> docker run --privileged --rm -it --device /dev/kfd -e OLLAMA_DEBUG=1 dhiltgen/ollama:0.1.21-rc4
>
> The image I've pushed is compiled with ROCm v6, so if the host is v5, maybe that's a possible cause. I'll try to build a v5 based image and push that as well to see if that might yield a working setup.

@dhiltgen using your latest image, I get the following error. Looks to be different from above?

docker run --privileged --rm -it -v ./data:/root/.ollama --device /dev/kfd --device /dev/dri -p 11434:11434 --name ollama dhiltgen/ollama:0.1.21-rc4
2024/01/25 13:39:49 images.go:815: INFO total blobs: 6
2024/01/25 13:39:49 images.go:822: INFO total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
2024/01/25 13:39:49 routes.go:943: INFO Listening on [::]:11434 (version 0.0.0)
2024/01/25 13:39:49 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/25 13:39:52 payload_common.go:145: INFO Dynamic LLM libraries [cpu_avx2 rocm_v6 rocm_v5 cpu_avx cuda_v11 cpu]
2024/01/25 13:39:52 gpu.go:93: INFO Detecting GPU type
2024/01/25 13:39:52 gpu.go:212: INFO Searching for GPU management library libnvidia-ml.so
2024/01/25 13:39:52 gpu.go:258: INFO Discovered GPU libraries: []
2024/01/25 13:39:52 gpu.go:212: INFO Searching for GPU management library librocm_smi64.so
2024/01/25 13:39:52 gpu.go:258: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.6.0.60000 /opt/rocm-6.0.0/lib/librocm_smi64.so.6.0.60000]
2024/01/25 13:39:52 gpu.go:108: INFO Radeon GPU detected
[GIN] 2024/01/25 - 13:39:56 | 200 |       28.18µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/25 - 13:39:56 | 200 |     408.849µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/25 - 13:39:56 | 200 |     227.609µs |       127.0.0.1 | POST     "/api/show"
2024/01/25 13:39:56 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama3155328738/rocm_v6/libext_server.so
2024/01/25 13:39:57 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama3155328738/rocm_v6/libext_server.so
2024/01/25 13:39:57 dyn_ext_server.go:145: INFO Initializing llama server

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx803
 List of available TensileLibrary Files :
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
SIGSEGV: segmentation violation
PC=0x7f5a660d6bc7 m=9 sigcode=128
signal arrived during cgo execution

goroutine 87 [syscall]:
runtime.cgocall(0x9b4a50, 0xc0004d07f8)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0004d07d0 sp=0xc0004d0798 pc=0x409b0b
github.com/jmorganca/ollama/llm._Cfunc_dyn_llama_server_init({0x7f59f0000ba0, 0x7f59f9218420, 0x7f59f9218b90, 0x7f59f9218c20, 0x7f59f9218dd0, 0x7f59f9218f40, 0x7f59f9219400, 0x7f59f92193e0, 0x7f59f9219490, 0x7f59f9219a40, ...}, ...)
	_cgo_gotypes.go:282 +0x45 fp=0xc0004d07f8 sp=0xc0004d07d0 pc=0x7c2d25
github.com/jmorganca/ollama/llm.newDynExtServer.func7(0xae83f9?, 0xc?)
	/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:148 +0xef fp=0xc0004d08e8 sp=0xc0004d07f8 pc=0x7c424f
github.com/jmorganca/ollama/llm.newDynExtServer({0xc0005a6000, 0x2e}, {0xc000032e00, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
	/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:148 +0xa45 fp=0xc0004d0b88 sp=0xc0004d08e8 pc=0x7c3ee5
github.com/jmorganca/ollama/llm.newLlmServer({{_, _, _}, {_, _}, {_, _}}, {_, _}, {0x0, ...}, ...)
	/go/src/github.com/jmorganca/ollama/llm/llm.go:148 +0x36a fp=0xc0004d0d48 sp=0xc0004d0b88 pc=0x7c06ea
github.com/jmorganca/ollama/llm.New({0x0?, 0x1000100000100?}, {0xc000032e00, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
	/go/src/github.com/jmorganca/ollama/llm/llm.go:123 +0x6f9 fp=0xc0004d0fb8 sp=0xc0004d0d48 pc=0x7c0119
github.com/jmorganca/ollama/server.load(0xc000002f00?, 0xc000002f00, {{0x0, 0x800, 0x200, 0x1, 0xffffffffffffffff, 0x0, 0x0, 0x1, ...}, ...}, ...)
	/go/src/github.com/jmorganca/ollama/server/routes.go:83 +0x3a5 fp=0xc0004d1138 sp=0xc0004d0fb8 pc=0x990da5
github.com/jmorganca/ollama/server.ChatHandler(0xc000134d00)
	/go/src/github.com/jmorganca/ollama/server/routes.go:1071 +0x828 fp=0xc0004d1748 sp=0xc0004d1138 pc=0x99b6e8
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func1(0xc000134d00)
	/go/src/github.com/jmorganca/ollama/server/routes.go:883 +0x68 fp=0xc0004d1780 sp=0xc0004d1748 pc=0x99a228
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0xc000134d00)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x7a fp=0xc0004d17d0 sp=0xc0004d1780 pc=0x97595a
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0xc000134d00)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xde fp=0xc0004d1980 sp=0xc0004d17d0 pc=0x974afe
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc0000cfa00, 0xc000134d00)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x65b fp=0xc0004d1b08 sp=0xc0004d1980 pc=0x973bbb
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc0000cfa00, {0x106b0120?, 0xc0004208c0}, 0xc000135200)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1dd fp=0xc0004d1b48 sp=0xc0004d1b08 pc=0x97337d
net/http.serverHandler.ServeHTTP({0x106ae440?}, {0x106b0120?, 0xc0004208c0?}, 0x6?)
	/usr/local/go/src/net/http/server.go:2938 +0x8e fp=0xc0004d1b78 sp=0xc0004d1b48 pc=0x6ce60e
net/http.(*conn).serve(0xc0000fb440, {0x106b1788, 0xc000482840})
	/usr/local/go/src/net/http/server.go:2009 +0x5f4 fp=0xc0004d1fb8 sp=0xc0004d1b78 pc=0x6ca4f4
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc0004d1fe0 sp=0xc0004d1fb8 pc=0x6cee28
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004d1fe8 sp=0xc0004d1fe0 pc=0x46e0a1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 1 [IO wait]:
runtime.gopark(0x4808b0?, 0xc0000c9848?, 0x98?, 0x98?, 0x4f69dd?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000c9828 sp=0xc0000c9808 pc=0x43e6ae
runtime.netpollblock(0x46c112?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0000c9860 sp=0xc0000c9828 pc=0x437137
internal/poll.runtime_pollWait(0x7f5a1cc33e80, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0000c9880 sp=0xc0000c9860 pc=0x4688c5
internal/poll.(*pollDesc).wait(0xc000468000?, 0x4?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000c98a8 sp=0xc0000c9880 pc=0x4ef627
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000468000)
	/usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0000c9950 sp=0xc0000c98a8 pc=0x4f4b0c
net.(*netFD).accept(0xc000468000)
	/usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc0000c9a08 sp=0xc0000c9950 pc=0x56b609
net.(*TCPListener).accept(0xc00043f580)
	/usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc0000c9a30 sp=0xc0000c9a08 pc=0x58041e
net.(*TCPListener).Accept(0xc00043f580)
	/usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc0000c9a60 sp=0xc0000c9a30 pc=0x57f5d0
net/http.(*onceCloseListener).Accept(0xc0000fb440?)
	<autogenerated>:1 +0x24 fp=0xc0000c9a78 sp=0xc0000c9a60 pc=0x6f13a4
net/http.(*Server).Serve(0xc000376ff0, {0x106aff10, 0xc00043f580})
	/usr/local/go/src/net/http/server.go:3056 +0x364 fp=0xc0000c9ba8 sp=0xc0000c9a78 pc=0x6cea64
github.com/jmorganca/ollama/server.Serve({0x106aff10, 0xc00043f580})
	/go/src/github.com/jmorganca/ollama/server/routes.go:970 +0x488 fp=0xc0000c9c98 sp=0xc0000c9ba8 pc=0x99a708
github.com/jmorganca/ollama/cmd.RunServer(0xc000466300?, {0x10af4780?, 0x4?, 0xad0281?})
	/go/src/github.com/jmorganca/ollama/cmd/cmd.go:690 +0x199 fp=0xc0000c9d30 sp=0xc0000c9c98 pc=0x9acaf9
github.com/spf13/cobra.(*Command).execute(0xc0003ed800, {0x10af4780, 0x0, 0x0})
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x87c fp=0xc0000c9e68 sp=0xc0000c9d30 pc=0x7641dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003ecc00)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0000c9f20 sp=0xc0000c9e68 pc=0x764a05
github.com/spf13/cobra.(*Command).Execute(...)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/go/src/github.com/jmorganca/ollama/main.go:11 +0x4d fp=0xc0000c9f40 sp=0xc0000c9f20 pc=0x9b3b6d
runtime.main()
	/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc0000c9fe0 sp=0xc0000c9f40 pc=0x43e25b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000c9fe8 sp=0xc0000c9fe0 pc=0x46e0a1

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000058fa8 sp=0xc000058f88 pc=0x43e6ae
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc000058fe0 sp=0xc000058fa8 pc=0x43e533
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000058fe8 sp=0xc000058fe0 pc=0x46e0a1
created by runtime.init.6 in goroutine 1
	/usr/local/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000059778 sp=0xc000059758 pc=0x43e6ae
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc0000597c8 sp=0xc000059778 pc=0x42a5ff
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000597e0 sp=0xc0000597c8 pc=0x41f725
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000597e8 sp=0xc0000597e0 pc=0x46e0a1
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xb9335e?, 0xb45f95?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000059f70 sp=0xc000059f50 pc=0x43e6ae
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x10ac4b00)
	/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000059fa0 sp=0xc000059f70 pc=0x427e29
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000059fc8 sp=0xc000059fa0 pc=0x4283d9
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc000059fe0 sp=0xc000059fc8 pc=0x41f6c5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000059fe8 sp=0xc000059fe0 pc=0x46e0a1
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait]:
runtime.gopark(0xac9240?, 0x10043f801?, 0x0?, 0x0?, 0x446865?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000058628 sp=0xc000058608 pc=0x43e6ae
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000587e0 sp=0xc000058628 pc=0x41e7a7
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000587e8 sp=0xc0000587e0 pc=0x46e0a1
created by runtime.createfing in goroutine 1
	/usr/local/go/src/runtime/mfinal.go:163 +0x3d

goroutine 6 [select, locked to thread]:
runtime.gopark(0xc00005a7a8?, 0x2?, 0x49?, 0xe9?, 0xc00005a7a4?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005a638 sp=0xc00005a618 pc=0x43e6ae
runtime.selectgo(0xc00005a7a8, 0xc00005a7a0, 0x0?, 0x0, 0x0?, 0x1)
	/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00005a758 sp=0xc00005a638 pc=0x44e1e5
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc00005a7e0 sp=0xc00005a758 pc=0x46521f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005a7e8 sp=0xc00005a7e0 pc=0x46e0a1
created by runtime.ensureSigM in goroutine 1
	/usr/local/go/src/runtime/signal_unix.go:997 +0xc8

goroutine 18 [syscall]:
runtime.notetsleepg(0x0?, 0x0?)
	/usr/local/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc0000547a0 sp=0xc000054768 pc=0x411209
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0000547c0 sp=0xc0000547a0 pc=0x46aa69
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0000547e0 sp=0xc0000547c0 pc=0x6f3dd3
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000547e8 sp=0xc0000547e0 pc=0x46e0a1
created by os/signal.Notify.func1.1 in goroutine 1
	/usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 19 [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000054f18 sp=0xc000054ef8 pc=0x43e6ae
runtime.chanrecv(0xc00018d6e0, 0x0, 0x1)
	/usr/local/go/src/runtime/chan.go:583 +0x3cd fp=0xc000054f90 sp=0xc000054f18 pc=0x40beed
runtime.chanrecv1(0x0?, 0x0?)
	/usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc000054fb8 sp=0xc000054f90 pc=0x40baf2
github.com/jmorganca/ollama/server.Serve.func1()
	/go/src/github.com/jmorganca/ollama/server/routes.go:952 +0x25 fp=0xc000054fe0 sp=0xc000054fb8 pc=0x99a7a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000054fe8 sp=0xc000054fe0 pc=0x46e0a1
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
	/go/src/github.com/jmorganca/ollama/server/routes.go:951 +0x3f6

goroutine 16 [IO wait]:
runtime.gopark(0x75?, 0xb?, 0x0?, 0x0?, 0xc?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0000cb8f8 sp=0xc0000cb8d8 pc=0x43e6ae
runtime.netpollblock(0x47ea18?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0000cb930 sp=0xc0000cb8f8 pc=0x437137
internal/poll.runtime_pollWait(0x7f5a1cc33d88, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0000cb950 sp=0xc0000cb930 pc=0x4688c5
internal/poll.(*pollDesc).wait(0xc000468a80?, 0xc000490000?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000cb978 sp=0xc0000cb950 pc=0x4ef627
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000468a80, {0xc000490000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0000cba10 sp=0xc0000cb978 pc=0x4f091a
net.(*netFD).Read(0xc000468a80, {0xc000490000?, 0x4efae5?, 0x0?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc0000cba58 sp=0xc0000cba10 pc=0x5695e5
net.(*conn).Read(0xc0005ae0f0, {0xc000490000?, 0x0?, 0xc000482938?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc0000cbaa0 sp=0xc0000cba58 pc=0x577885
net.(*TCPConn).Read(0xc000482930?, {0xc000490000?, 0x0?, 0xc0000cbac0?})
	<autogenerated>:1 +0x25 fp=0xc0000cbad0 sp=0xc0000cbaa0 pc=0x589785
net/http.(*connReader).Read(0xc000482930, {0xc000490000, 0x1000, 0x1000})
	/usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc0000cbb20 sp=0xc0000cbad0 pc=0x6c47ab
bufio.(*Reader).fill(0xc00012c840)
	/usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc0000cbb58 sp=0xc0000cbb20 pc=0x6543e3
bufio.(*Reader).Peek(0xc00012c840, 0x4)
	/usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc0000cbb78 sp=0xc0000cbb58 pc=0x654513
net/http.(*conn).serve(0xc0000fa2d0, {0x106b1788, 0xc000482840})
	/usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc0000cbfb8 sp=0xc0000cbb78 pc=0x6ca65c
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc0000cbfe0 sp=0xc0000cbfb8 pc=0x6cee28
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000cbfe8 sp=0xc0000cbfe0 pc=0x46e0a1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 83 [IO wait]:
runtime.gopark(0x7?, 0xb?, 0x0?, 0x0?, 0xd?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0005998f8 sp=0xc0005998d8 pc=0x43e6ae
runtime.netpollblock(0x47ea18?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000599930 sp=0xc0005998f8 pc=0x437137
internal/poll.runtime_pollWait(0x7f5a1cc33c90, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000599950 sp=0xc000599930 pc=0x4688c5
internal/poll.(*pollDesc).wait(0xc000468b80?, 0xc000321000?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000599978 sp=0xc000599950 pc=0x4ef627
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000468b80, {0xc000321000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000599a10 sp=0xc000599978 pc=0x4f091a
net.(*netFD).Read(0xc000468b80, {0xc000321000?, 0x4efae5?, 0x0?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000599a58 sp=0xc000599a10 pc=0x5695e5
net.(*conn).Read(0xc0005ae0f8, {0xc000321000?, 0x0?, 0xc000482b78?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc000599aa0 sp=0xc000599a58 pc=0x577885
net.(*TCPConn).Read(0xc000482b70?, {0xc000321000?, 0x0?, 0xc00039fac0?})
	<autogenerated>:1 +0x25 fp=0xc000599ad0 sp=0xc000599aa0 pc=0x589785
net/http.(*connReader).Read(0xc000482b70, {0xc000321000, 0x1000, 0x1000})
	/usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc000599b20 sp=0xc000599ad0 pc=0x6c47ab
bufio.(*Reader).fill(0xc00012c900)
	/usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc000599b58 sp=0xc000599b20 pc=0x6543e3
bufio.(*Reader).Peek(0xc00012c900, 0x4)
	/usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc000599b78 sp=0xc000599b58 pc=0x654513
net/http.(*conn).serve(0xc0000fa480, {0x106b1788, 0xc000482840})
	/usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc000599fb8 sp=0xc000599b78 pc=0x6ca65c
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc000599fe0 sp=0xc000599fb8 pc=0x6cee28
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000599fe8 sp=0xc000599fe0 pc=0x46e0a1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 34 [GC worker (idle)]:
runtime.gopark(0x466f55630f60?, 0x3?, 0x1f?, 0xd9?, 0xc000056fd0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000056f50 sp=0xc000056f30 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000056fe0 sp=0xc000056f50 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000056fe8 sp=0xc000056fe0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 50 [GC worker (idle)]:
runtime.gopark(0x466f556312e4?, 0x3?, 0xa4?, 0x1a?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00013a750 sp=0xc00013a730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00013a7e0 sp=0xc00013a750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013a7e8 sp=0xc00013a7e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 66 [GC worker (idle)]:
runtime.gopark(0x466f55631014?, 0x1?, 0xdb?, 0x72?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000055750 sp=0xc000055730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0000557e0 sp=0xc000055750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000557e8 sp=0xc0000557e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x466f55631172?, 0x3?, 0x6a?, 0x2c?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000056750 sp=0xc000056730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0000567e0 sp=0xc000056750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000567e8 sp=0xc0000567e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 8 [GC worker (idle)]:
runtime.gopark(0x466f556312b2?, 0x3?, 0x2?, 0x30?, 0xc0000577d0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000057750 sp=0xc000057730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0000577e0 sp=0xc000057750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000577e8 sp=0xc0000577e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 9 [GC worker (idle)]:
runtime.gopark(0x466f55631370?, 0x1?, 0x94?, 0xba?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000057f50 sp=0xc000057f30 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000057fe0 sp=0xc000057f50 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000057fe8 sp=0xc000057fe0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 10 [GC worker (idle)]:
runtime.gopark(0x466f55630b82?, 0x3?, 0xc8?, 0xe?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005af50 sp=0xc00005af30 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00005afe0 sp=0xc00005af50 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005afe8 sp=0xc00005afe0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 11 [GC worker (idle)]:
runtime.gopark(0x10af64a0?, 0x1?, 0x58?, 0x46?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005b750 sp=0xc00005b730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00005b7e0 sp=0xc00005b750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005b7e8 sp=0xc00005b7e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 12 [GC worker (idle)]:
runtime.gopark(0x466f5563106e?, 0x1?, 0x90?, 0xa9?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005bf50 sp=0xc00005bf30 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00005bfe0 sp=0xc00005bf50 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005bfe8 sp=0xc00005bfe0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 13 [GC worker (idle)]:
runtime.gopark(0x466f55639570?, 0x3?, 0x8?, 0x41?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000498750 sp=0xc000498730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004987e0 sp=0xc000498750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004987e8 sp=0xc0004987e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 14 [GC worker (idle)]:
runtime.gopark(0x10af64a0?, 0x3?, 0xf6?, 0x45?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000498f50 sp=0xc000498f30 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000498fe0 sp=0xc000498f50 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000498fe8 sp=0xc000498fe0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 15 [GC worker (idle)]:
runtime.gopark(0x466f55631064?, 0x3?, 0x28?, 0x5?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000499750 sp=0xc000499730 pc=0x43e6ae
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004997e0 sp=0xc000499750 pc=0x4212a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004997e8 sp=0xc0004997e0 pc=0x46e0a1
created by runtime.gcBgMarkStartWorkers in goroutine 23
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 85 [IO wait]:
runtime.gopark(0x7?, 0xb?, 0x0?, 0x0?, 0xe?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0005958f8 sp=0xc0005958d8 pc=0x43e6ae
runtime.netpollblock(0x47ea18?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000595930 sp=0xc0005958f8 pc=0x437137
internal/poll.runtime_pollWait(0x7f5a1cc33b98, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000595950 sp=0xc000595930 pc=0x4688c5
internal/poll.(*pollDesc).wait(0xc000468d00?, 0xc0003c4000?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000595978 sp=0xc000595950 pc=0x4ef627
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000468d00, {0xc0003c4000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000595a10 sp=0xc000595978 pc=0x4f091a
net.(*netFD).Read(0xc000468d00, {0xc0003c4000?, 0x4efae5?, 0x0?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000595a58 sp=0xc000595a10 pc=0x5695e5
net.(*conn).Read(0xc0005ae258, {0xc0003c4000?, 0x0?, 0xc000414818?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc000595aa0 sp=0xc000595a58 pc=0x577885
net.(*TCPConn).Read(0xc000414810?, {0xc0003c4000?, 0x0?, 0xc00039bac0?})
	<autogenerated>:1 +0x25 fp=0xc000595ad0 sp=0xc000595aa0 pc=0x589785
net/http.(*connReader).Read(0xc000414810, {0xc0003c4000, 0x1000, 0x1000})
	/usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc000595b20 sp=0xc000595ad0 pc=0x6c47ab
bufio.(*Reader).fill(0xc00012d200)
	/usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc000595b58 sp=0xc000595b20 pc=0x6543e3
bufio.(*Reader).Peek(0xc00012d200, 0x4)
	/usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc000595b78 sp=0xc000595b58 pc=0x654513
net/http.(*conn).serve(0xc0000fafc0, {0x106b1788, 0xc000482840})
	/usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc000595fb8 sp=0xc000595b78 pc=0x6ca65c
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc000595fe0 sp=0xc000595fb8 pc=0x6cee28
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000595fe8 sp=0xc000595fe0 pc=0x46e0a1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 88 [IO wait]:
runtime.gopark(0x0?, 0xb?, 0x0?, 0x0?, 0xf?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00013ada0 sp=0xc00013ad80 pc=0x43e6ae
runtime.netpollblock(0x47ea18?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00013add8 sp=0xc00013ada0 pc=0x437137
internal/poll.runtime_pollWait(0x7f5a1cc33aa0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00013adf8 sp=0xc00013add8 pc=0x4688c5
internal/poll.(*pollDesc).wait(0xc000468e80?, 0xc000375241?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00013ae20 sp=0xc00013adf8 pc=0x4ef627
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000468e80, {0xc000375241, 0x1, 0x1})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00013aeb8 sp=0xc00013ae20 pc=0x4f091a
net.(*netFD).Read(0xc000468e80, {0xc000375241?, 0xc00013af40?, 0x46a770?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc00013af00 sp=0xc00013aeb8 pc=0x5695e5
net.(*conn).Read(0xc0005ae310, {0xc000375241?, 0x1?, 0xc00042a9b0?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc00013af48 sp=0xc00013af00 pc=0x577885
net.(*TCPConn).Read(0xc000414810?, {0xc000375241?, 0xc00042a9b0?, 0x0?})
	<autogenerated>:1 +0x25 fp=0xc00013af78 sp=0xc00013af48 pc=0x589785
net/http.(*connReader).backgroundRead(0xc000375230)
	/usr/local/go/src/net/http/server.go:683 +0x37 fp=0xc00013afc8 sp=0xc00013af78 pc=0x6c4377
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:679 +0x25 fp=0xc00013afe0 sp=0xc00013afc8 pc=0x6c42a5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013afe8 sp=0xc00013afe0 pc=0x46e0a1
created by net/http.(*connReader).startBackgroundRead in goroutine 87
	/usr/local/go/src/net/http/server.go:679 +0xba

rax    0x6
rbx    0x7f5a177fd1a0
rcx    0x7f5a660d5387
rdx    0x6
rdi    0x1
rsi    0xe
rbp    0x0
rsp    0x7f5a177fd070
r8     0x0
r9     0x7f5a177fcfc0
r10    0x8
r11    0x202
r12    0x7f59f0927be0
r13    0x7f59f0926c80
r14    0x7f5a177fd328
r15    0x7f5a177fd2c0
rip    0x7f5a660d6bc7
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0
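The rocBLAS error above looks like the root cause here: the ROCm 6 libraries in the image ship no Tensile kernels for the gfx803 architecture the runtime detected, and initialization segfaults after that lookup fails. A quick way to compare what the host reports against what the container ships (a sketch using standard ROCm tooling; the library path matches the log above):

```
# List the GPU ISA(s) the ROCm runtime reports on this host.
rocminfo | grep -o "gfx[0-9a-f]*" | sort -u
# List the architectures rocBLAS actually ships kernels for in the image.
ls /opt/rocm/lib/rocblas/library/ | grep -o "gfx[0-9a-f]*" | sort -u
```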
/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00013add8 sp=0xc00013ada0 pc=0x437137 internal/poll.runtime_pollWait(0x7f5a1cc33aa0, 0x72) /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00013adf8 sp=0xc00013add8 pc=0x4688c5 internal/poll.(*pollDesc).wait(0xc000468e80?, 0xc000375241?, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00013ae20 sp=0xc00013adf8 pc=0x4ef627 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc000468e80, {0xc000375241, 0x1, 0x1}) /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00013aeb8 sp=0xc00013ae20 pc=0x4f091a net.(*netFD).Read(0xc000468e80, {0xc000375241?, 0xc00013af40?, 0x46a770?}) /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc00013af00 sp=0xc00013aeb8 pc=0x5695e5 net.(*conn).Read(0xc0005ae310, {0xc000375241?, 0x1?, 0xc00042a9b0?}) /usr/local/go/src/net/net.go:179 +0x45 fp=0xc00013af48 sp=0xc00013af00 pc=0x577885 net.(*TCPConn).Read(0xc000414810?, {0xc000375241?, 0xc00042a9b0?, 0x0?}) <autogenerated>:1 +0x25 fp=0xc00013af78 sp=0xc00013af48 pc=0x589785 net/http.(*connReader).backgroundRead(0xc000375230) /usr/local/go/src/net/http/server.go:683 +0x37 fp=0xc00013afc8 sp=0xc00013af78 pc=0x6c4377 net/http.(*connReader).startBackgroundRead.func2() /usr/local/go/src/net/http/server.go:679 +0x25 fp=0xc00013afe0 sp=0xc00013afc8 pc=0x6c42a5 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013afe8 sp=0xc00013afe0 pc=0x46e0a1 created by net/http.(*connReader).startBackgroundRead in goroutine 87 /usr/local/go/src/net/http/server.go:679 +0xba rax 0x6 rbx 0x7f5a177fd1a0 rcx 0x7f5a660d5387 rdx 0x6 rdi 0x1 rsi 0xe rbp 0x0 rsp 0x7f5a177fd070 r8 0x0 r9 0x7f5a177fcfc0 r10 0x8 r11 0x202 r12 0x7f59f0927be0 r13 0x7f59f0926c80 r14 0x7f5a177fd328 r15 0x7f5a177fd2c0 rip 0x7f5a660d6bc7 rflags 0x10246 cs 0x33 fs 0x0 gs 0x0 ```
Author
Owner

@dixonl90 commented on GitHub (Jan 25, 2024):

Actually, running the dhiltgen/ollama:0.1.21-rocmv5 image works, although the output from a prompt is just hashes:

docker exec -it ollama ollama run llama2
>>> Tell me a joke
############################################################################################################################################################################################################################################################################^C

@kescherCode commented on GitHub (Jan 25, 2024):

The host ROCm versions do not matter. All that matters is the ROCm version within the container.

The root cause:
rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1032 (in the prior log by @dixonl90, it says gfx803 instead)

My system uses a 6600 XT (gfx1032). The crash is clearly caused by rocBLAS not supporting it directly. However, adding "-e HSA_OVERRIDE_GFX_VERSION=10.3.0", which makes ROCm treat my card as a gfx1030 (i.e. a 6900 XT), makes it work. This is possible because the ISAs from gfx1030 through gfx1035 are identical.

The crash is partly ROCm/rocBLAS's fault for not applying that equivalence on their own, but it ultimately shouldn't happen to begin with. Since it is a cgo crash, this might be a llama.cpp issue.
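
For example, something along these lines should work — a rough sketch only, using the image tag mentioned earlier in this thread (adjust the tag, ports, and volume to your setup):

# HSA_OVERRIDE_GFX_VERSION=10.3.0 makes ROCm treat the card as gfx1030
docker run -d --name ollama \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  --device /dev/kfd --device /dev/dri \
  -v ~/.ollama:/root/.ollama \
  -p 11434:11434 \
  dhiltgen/ollama:0.1.21-rocmv5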


@dhiltgen commented on GitHub (Jan 26, 2024):

Poking around in the ROCm v5 and v6 container images, it appears v6 has dropped support for these older gfx803 cards. @dixonl90 it sounds like you'll have to stay on the older v5 stack to retain compatibility with your GPU, as it is now EOL on v6 and up.

% docker run --rm rocm/dev-centos-7:5.7.1-complete ls /opt/rocm/lib/rocblas/library/ | grep TensileLibrary_lazy_gfx
TensileLibrary_lazy_gfx1030.dat
TensileLibrary_lazy_gfx1100.dat
TensileLibrary_lazy_gfx1101.dat
TensileLibrary_lazy_gfx1102.dat
TensileLibrary_lazy_gfx803.dat
TensileLibrary_lazy_gfx900.dat
TensileLibrary_lazy_gfx906.dat
TensileLibrary_lazy_gfx908.dat
TensileLibrary_lazy_gfx90a.dat
TensileLibrary_lazy_gfx940.dat
TensileLibrary_lazy_gfx941.dat
TensileLibrary_lazy_gfx942.dat
% docker run --rm rocm/dev-centos-7:6.0-complete ls /opt/rocm/lib/rocblas/library/ | grep TensileLibrary_lazy_gfx
TensileLibrary_lazy_gfx1030.dat
TensileLibrary_lazy_gfx1100.dat
TensileLibrary_lazy_gfx1101.dat
TensileLibrary_lazy_gfx1102.dat
TensileLibrary_lazy_gfx900.dat
TensileLibrary_lazy_gfx906.dat
TensileLibrary_lazy_gfx908.dat
TensileLibrary_lazy_gfx90a.dat
TensileLibrary_lazy_gfx940.dat
TensileLibrary_lazy_gfx941.dat
TensileLibrary_lazy_gfx942.dat
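
If you want to check which gfx target your own card reports (assuming the host has the ROCm tools installed), something like this should do it:

# rocminfo prints the agent properties, including the gfx ISA name
rocminfo | grep -i gfx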

@dhiltgen commented on GitHub (Jan 27, 2024):

We've just pushed an updated release, v0.1.22, which has some miscellaneous ROCm fixes, including the iGPU fix. There's also a container image specifically for ROCm support, based on v5: ollama/ollama:0.1.22-rocm


@ignacio82 commented on GitHub (Jan 27, 2024):

@dhiltgen could you share a docker-compose file that uses ollama/ollama:0.1.22-rocm?


@Airradda commented on GitHub (Jan 27, 2024):

@dhiltgen could you share a docker-compose file that uses ollama/ollama:0.1.22-rocm?

This is what has worked for me:

version: '3'
services:
  Ollama:
    image: ollama/ollama:0.1.22-rocm
    restart: unless-stopped
    network_mode: host
    container_name: Ollama
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ~/.ollama:/root/.ollama

Then I use a Modelfile like this to run entirely on the GPU:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
FROM starling-lm:latest
PARAMETER num_gpu 50
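
To apply a Modelfile like this, the usual flow is roughly the following (the tag name here is made up):

ollama create starling-gpu -f Modelfile   # bake the parameter into a new tag
ollama run starling-gpu                   # num_gpu 50 asks for up to 50 layers on the GPU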

@ignacio82 commented on GitHub (Jan 27, 2024):

Thanks @Airradda This is what I have:

version: '3.8'
services:
  ollama-service:
    image: ollama/ollama:0.1.22-rocm
    container_name: ollama-rocm
    restart: unless-stopped
    network_mode: host
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    stdin_open: true
    tty: true
    volumes:
      - nfs-ollama:/root/.ollama
    environment:
      - "PGID=100"
      - "PUID=1026"
      - "TZ=America/Los_Angeles"

  ollama-webui:
    image: ollamawebui/ollama-webui
    container_name: ollama-webui
    restart: unless-stopped
    volumes:
      - nfs-ollama-webui:/app/backend/data
    ports:
      - "3010:8080"
    environment:
      - OLLAMA_API_BASE_URL=http://192.168.86.124:11434/api
      - PGID=100
      - PUID=1026
      - TZ=America/Los_Angeles

volumes:
  nfs-ollama:
    external: true
  nfs-ollama-webui:
    external: true    

But I cannot connect using ollama-webui. I get 404 page not found when I try to go to http://192.168.86.124:11434/api. What am I missing? Thanks again!
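
(For what it's worth, a bare GET on /api returning 404 page not found may be expected, since the server only answers concrete endpoints; a quick liveness check, assuming the default port, is:)

curl http://192.168.86.124:11434/api/tags   # lists local models if the server is up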


@Airradda commented on GitHub (Jan 27, 2024):

@ignacio82 This is what I quickly spun up. It's worth noting that both are running on the same machine:

  Ollama-WebUI:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: Ollama-WebUI
    network_mode: host
    volumes:
      - ./Ollama-WebUI:/app/backend/data
    depends_on:
      - ollama
    environment:
      - 'OLLAMA_API_BASE_URL=http://0.0.0.0:11434/api'
    restart: unless-stopped

@ignacio82 commented on GitHub (Jan 27, 2024):

I believe there is a problem with image: ollama/ollama:0.1.22-rocm. The following docker-compose does not work:

version: '3.8'
services:
  ollama-rocm:
    image: ollama/ollama:0.1.22-rocm
    container_name: ollama-rocm
    restart: unless-stopped
    network_mode: host
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    stdin_open: true
    tty: true
    volumes:
      - nfs-ollama:/root/.ollama
    environment:
      - "PGID=100"
      - "PUID=1026"
      - "TZ=America/Los_Angeles"

  ollama-webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    restart: unless-stopped
    volumes:
      - nfs-ollama-webui:/app/backend/data
    depends_on:
      - ollama-rocm
    network_mode: host
    environment:
      - 'OLLAMA_API_BASE_URL=http://0.0.0.0:11434/api'
      - PGID=100
      - PUID=1026


volumes:
  nfs-ollama:
    external: true
  nfs-ollama-webui:
    external: true    

When I type something in the UI, I get "Uh-oh! There was an issue connecting to Ollama." However, if I just change to ollama/ollama:latest, everything works fine. Any ideas?


@hiepxanh commented on GitHub (Jan 28, 2024):

@Airradda which GPU device are you using? Can you share your working stack?

This is what has worked for me:


@Airradda commented on GitHub (Jan 28, 2024):

System

GPU: 6950 XT (I think gfx1030)
CPU: Ryzen 9 3900X
Container Engine: Podman v4.9.0

Compose File

version: '3'
services:
  Ollama:
    image: ollama/ollama:0.1.22-rocm
    restart: unless-stopped
    network_mode: host
    container_name: Ollama
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ~/.ollama:/root/.ollama

  Ollama-WebUI:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: Ollama-WebUI
    network_mode: host
    volumes:
      - ./Ollama-WebUI:/app/backend/data
    depends_on:
      - ollama
    environment:
      - 'OLLAMA_API_BASE_URL=http://0.0.0.0:11434/api'
    restart: unless-stopped

Modelfile

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
FROM starling-lm:latest
PARAMETER num_gpu 50

@RootPrivileges commented on GitHub (Jan 28, 2024):

@ignacio82 this is working for me. It uses Docker name resolution to dynamically point the webui at the ollama container while staying inside Docker. (I'm wondering if the IP you've hardcoded in the env variable in your compose has changed since you first looked it up?)

I can see the AMD library loading in the container log output, and the GPU getting detected. Unfortunately, I only have a 2GB VRAM iGPU, so it falls back to CPU-only, but everything in the logs up to that point suggests it would use the GPU correctly if I had one that met the minimum requirements. I am, however, still able to make queries to the model and get answers back, even in this degraded state.

---
version: "3.7"

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:0.1.22-rocm
    volumes:
      - ./ollama:/root/.ollama
    devices:
      - /dev/dri
      - /dev/kfd
    restart: unless-stopped

  ollama-webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    volumes:
      - ./ollama-webui:/app/backend/data
    environment:
      - 'OLLAMA_API_BASE_URL=http://ollama:11434/api'
    restart: unless-stopped

@dhiltgen commented on GitHub (Jan 28, 2024):

@ignacio82 if you're still having trouble, I'd suggest isolating this to reduce variables. Focus on just getting the rocm container to run on the GPU and process prompts, without adding the webui complexity into the mix; once that's working, add the webui back. docker exec into the running ollama container, try the CLI to confirm it works, and check the container logs to verify the server is seeing your GPU and running on it.
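
A rough sketch of that isolation flow, using the container name from the compose file above:

docker logs -f ollama-rocm                      # watch for GPU detection in the server log
docker exec -it ollama-rocm ollama run llama2   # exercise the CLI inside the container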


@ignacio82 commented on GitHub (Jan 28, 2024):

@dhiltgen I think my problem is with ollama-rocm. I can get into the container by running sudo docker exec -ti ollama-rocm bash. However, when I try to run a model I get kicked out of the container:

ignacio@mini-server:~$ sudo docker exec -ti ollama-rocm bash
[root@mini-server /]# ollama run --verbose llama2
⠹ ignacio@mini-server:~$ 

I'm not sure how to further debug this. Thanks for the help.


@kescherCode commented on GitHub (Jan 29, 2024):

@ignacio82 Try running docker logs ollama-rocm -f in a separate window


@ignacio82 commented on GitHub (Jan 29, 2024):

ollama_logs.txt

I attached the logs. I'm guessing this is the problem:

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1035
 List of available TensileLibrary Files : 
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
SIGSEGV: segmentation violation
PC=0x7faae4c45bc7 m=13 sigcode=128
signal arrived during cgo execution

@kescherCode commented on GitHub (Jan 29, 2024):

@ignacio82 Ah, you need to set HSA_OVERRIDE_GFX_VERSION to 10.3.0 as an env var for the container when running it. The ISAs for gfx1030 through gfx1035 are identical, but ROCm does not apply that equivalence on its own.


@hiepxanh commented on GitHub (Jan 29, 2024):

They just fixed it a few days ago, after I'd been struggling with it for a week. You'll need to wait a few more weeks until they build a release containing the merge:

https://github.com/ROCm/Tensile/pull/1862


@meminens commented on GitHub (Jan 30, 2024):

Can someone please help me set up ollama/ROCm on Arch Linux with an AMD 7900 XTX GPU? Thank you!


@kokizzu commented on GitHub (Jan 30, 2024):

You guys rock 🥇
Tested with a 6600 XT: the same query took 20-30s on the GPU versus 60s on the CPU.

https://kokizzu.blogspot.com/2024/01/ollama-with-amd-gpu-rocm.html


@ignacio82 commented on GitHub (Feb 1, 2024):

@kescherCode it made no difference:

$ sudo docker logs ollama-rocm -f --since 30s
[sudo] password for ignacio: 
[GIN] 2024/01/31 - 17:52:25 | 200 |     347.632µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/31 - 17:52:25 | 200 |    5.715307ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/01/31 - 17:52:25 | 200 |    1.037968ms |       127.0.0.1 | POST     "/api/show"
2024/01/31 17:52:25 cpu_common.go:11: INFO CPU has AVX2
loading library /tmp/ollama953082474/rocm_v5/libext_server.so
2024/01/31 17:52:25 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama953082474/rocm_v5/libext_server.so
2024/01/31 17:52:25 dyn_ext_server.go:145: INFO Initializing llama server
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 10.3, VMM: no
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2 (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
CUDA error: invalid argument
  current device: 0, in function ggml_backend_cuda_get_device_memory at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:11084
  hipMemGetInfo(free, total)
GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:237: !"CUDA error"
No symbol table is loaded.  Use the "file" command.
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7f42759db387 m=19 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 53 [syscall]:
runtime.cgocall(0x9b73b0, 0xc00028a7f8)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00028a7d0 sp=0xc00028a798 pc=0x409b0b
github.com/jmorganca/ollama/llm._Cfunc_dyn_llama_server_init({0x7f41ec000ba0, 0x7f41f68f7430, 0x7f41f68f7ba0, 0x7f41f68f7c30, 0x7f41f68f7de0, 0x7f41f68f7f50, 0x7f41f68f8410, 0x7f41f68f83f0, 0x7f41f68f84a0, 0x7f41f68f8a50, ...}, ...)
	_cgo_gotypes.go:282 +0x45 fp=0xc00028a7f8 sp=0xc00028a7d0 pc=0x7c3ca5
github.com/jmorganca/ollama/llm.newDynExtServer.func7(0xaea801?, 0xc?)
	/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:148 +0xef fp=0xc00028a8e8 sp=0xc00028a7f8 pc=0x7c51cf
github.com/jmorganca/ollama/llm.newDynExtServer({0xc0004a6180, 0x2d}, {0xc000036310, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
	/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:148 +0xa45 fp=0xc00028ab88 sp=0xc00028a8e8 pc=0x7c4e65
github.com/jmorganca/ollama/llm.newLlmServer({{_, _, _}, {_, _}, {_, _}}, {_, _}, {0x0, ...}, ...)
	/go/src/github.com/jmorganca/ollama/llm/llm.go:148 +0x36a fp=0xc00028ad48 sp=0xc00028ab88 pc=0x7c166a
github.com/jmorganca/ollama/llm.New({0x0?, 0x1000100000100?}, {0xc000036310, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
	/go/src/github.com/jmorganca/ollama/llm/llm.go:123 +0x6f9 fp=0xc00028afb8 sp=0xc00028ad48 pc=0x7c1099
github.com/jmorganca/ollama/server.load(0xc0003f6000?, 0xc0003f6000, {{0x0, 0x1000, 0x200, 0x1, 0xffffffffffffffff, 0x0, 0x0, 0x1, ...}, ...}, ...)
	/go/src/github.com/jmorganca/ollama/server/routes.go:83 +0x3a5 fp=0xc00028b138 sp=0xc00028afb8 pc=0x992405
github.com/jmorganca/ollama/server.ChatHandler(0xc0003ee800)
	/go/src/github.com/jmorganca/ollama/server/routes.go:1078 +0x828 fp=0xc00028b748 sp=0xc00028b138 pc=0x99cfc8
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func1(0xc0003ee800)
	/go/src/github.com/jmorganca/ollama/server/routes.go:890 +0x68 fp=0xc00028b780 sp=0xc00028b748 pc=0x99bb08
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0xc0003ee800)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x7a fp=0xc00028b7d0 sp=0xc00028b780 pc=0x97691a
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0xc0003ee800)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xde fp=0xc00028b980 sp=0xc00028b7d0 pc=0x975abe
github.com/gin-gonic/gin.(*Context).Next(...)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc0000e5ba0, 0xc0003ee800)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x65b fp=0xc00028bb08 sp=0xc00028b980 pc=0x974b7b
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc0000e5ba0, {0x106c1560?, 0xc00038c0e0}, 0xc0003ee900)
	/root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1dd fp=0xc00028bb48 sp=0xc00028bb08 pc=0x97433d
net/http.serverHandler.ServeHTTP({0x106bf880?}, {0x106c1560?, 0xc00038c0e0?}, 0x6?)
	/usr/local/go/src/net/http/server.go:2938 +0x8e fp=0xc00028bb78 sp=0xc00028bb48 pc=0x6ced4e
net/http.(*conn).serve(0xc000312000, {0x106c2bc8, 0xc000304c90})
	/usr/local/go/src/net/http/server.go:2009 +0x5f4 fp=0xc00028bfb8 sp=0xc00028bb78 pc=0x6cac34
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc00028bfe0 sp=0xc00028bfb8 pc=0x6cf568
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00028bfe8 sp=0xc00028bfe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 1 [IO wait, 1 minutes]:
runtime.gopark(0x41c1a8?, 0x7f422c31ad58?, 0x98?, 0x38?, 0x4f703d?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000553828 sp=0xc000553808 pc=0x43e7ee
runtime.netpollblock(0xc0005538b8?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000553860 sp=0xc000553828 pc=0x437277
internal/poll.runtime_pollWait(0x7f4276c11e80, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000553880 sp=0xc000553860 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc0003f4080?, 0x16?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0005538a8 sp=0xc000553880 pc=0x4efc87
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0003f4080)
	/usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc000553950 sp=0xc0005538a8 pc=0x4f516c
net.(*netFD).accept(0xc0003f4080)
	/usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc000553a08 sp=0xc000553950 pc=0x56bd49
net.(*TCPListener).accept(0xc0003cb580)
	/usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc000553a30 sp=0xc000553a08 pc=0x580b5e
net.(*TCPListener).Accept(0xc0003cb580)
	/usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc000553a60 sp=0xc000553a30 pc=0x57fd10
net/http.(*onceCloseListener).Accept(0xc000312000?)
	<autogenerated>:1 +0x24 fp=0xc000553a78 sp=0xc000553a60 pc=0x6f1ae4
net/http.(*Server).Serve(0xc000306ff0, {0x106c1350, 0xc0003cb580})
	/usr/local/go/src/net/http/server.go:3056 +0x364 fp=0xc000553ba8 sp=0xc000553a78 pc=0x6cf1a4
github.com/jmorganca/ollama/server.Serve({0x106c1350, 0xc0003cb580})
	/go/src/github.com/jmorganca/ollama/server/routes.go:977 +0x488 fp=0xc000553c98 sp=0xc000553ba8 pc=0x99bfe8
github.com/jmorganca/ollama/cmd.RunServer(0xc0003ee500?, {0x10b06800?, 0x4?, 0xad25e1?})
	/go/src/github.com/jmorganca/ollama/cmd/cmd.go:692 +0x199 fp=0xc000553d30 sp=0xc000553c98 pc=0x9ae499
github.com/spf13/cobra.(*Command).execute(0xc0003c7800, {0x10b06800, 0x0, 0x0})
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x87c fp=0xc000553e68 sp=0xc000553d30 pc=0x76491c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003c6c00)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000553f20 sp=0xc000553e68 pc=0x765145
github.com/spf13/cobra.(*Command).Execute(...)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	/go/src/github.com/jmorganca/ollama/main.go:11 +0x4d fp=0xc000553f40 sp=0xc000553f20 pc=0x9b64cd
runtime.main()
	/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc000553fe0 sp=0xc000553f40 pc=0x43e39b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000553fe8 sp=0xc000553fe0 pc=0x46e2c1

goroutine 2 [force gc (idle), 3 minutes]:
runtime.gopark(0x2206a24be7f3a?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006afa8 sp=0xc00006af88 pc=0x43e7ee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc00006afe0 sp=0xc00006afa8 pc=0x43e673
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x46e2c1
created by runtime.init.6 in goroutine 1
	/usr/local/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006b778 sp=0xc00006b758 pc=0x43e7ee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc00006b7c8 sp=0xc00006b778 pc=0x42a73f
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc00006b7e0 sp=0xc00006b7c8 pc=0x41f865
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006b7e8 sp=0xc00006b7e0 pc=0x46e2c1
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x1c1755?, 0x738fc7?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006bf70 sp=0xc00006bf50 pc=0x43e7ee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x10ad6b80)
	/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00006bfa0 sp=0xc00006bf70 pc=0x427f69
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc00006bfc8 sp=0xc00006bfa0 pc=0x428519
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc00006bfe0 sp=0xc00006bfc8 pc=0x41f805
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006bfe8 sp=0xc00006bfe0 pc=0x46e2c1
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait, 9 minutes]:
runtime.gopark(0xacb5a0?, 0x10043f901?, 0x0?, 0x0?, 0x4469a5?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006a628 sp=0xc00006a608 pc=0x43e7ee
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00006a7e0 sp=0xc00006a628 pc=0x41e8e7
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x46e2c1
created by runtime.createfing in goroutine 1
	/usr/local/go/src/runtime/mfinal.go:163 +0x3d

goroutine 6 [select, 9 minutes, locked to thread]:
runtime.gopark(0xc00006c7a8?, 0x2?, 0x89?, 0xea?, 0xc00006c7a4?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006c638 sp=0xc00006c618 pc=0x43e7ee
runtime.selectgo(0xc00006c7a8, 0xc00006c7a0, 0x0?, 0x0, 0x0?, 0x1)
	/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00006c758 sp=0xc00006c638 pc=0x44e325
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc00006c7e0 sp=0xc00006c758 pc=0x46535f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x46e2c1
created by runtime.ensureSigM in goroutine 1
	/usr/local/go/src/runtime/signal_unix.go:997 +0xc8

goroutine 18 [syscall, 9 minutes]:
runtime.notetsleepg(0x0?, 0x0?)
	/usr/local/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc0000667a0 sp=0xc000066768 pc=0x411349
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0000667c0 sp=0xc0000667a0 pc=0x46ac89
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0000667e0 sp=0xc0000667c0 pc=0x6f4513
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000667e8 sp=0xc0000667e0 pc=0x46e2c1
created by os/signal.Notify.func1.1 in goroutine 1
	/usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 19 [chan receive, 9 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000066f18 sp=0xc000066ef8 pc=0x43e7ee
runtime.chanrecv(0xc00011fec0, 0x0, 0x1)
	/usr/local/go/src/runtime/chan.go:583 +0x3cd fp=0xc000066f90 sp=0xc000066f18 pc=0x40beed
runtime.chanrecv1(0x0?, 0x0?)
	/usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc000066fb8 sp=0xc000066f90 pc=0x40baf2
github.com/jmorganca/ollama/server.Serve.func1()
	/go/src/github.com/jmorganca/ollama/server/routes.go:959 +0x25 fp=0xc000066fe0 sp=0xc000066fb8 pc=0x99c085
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000066fe8 sp=0xc000066fe0 pc=0x46e2c1
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
	/go/src/github.com/jmorganca/ollama/server/routes.go:958 +0x3f6

goroutine 20 [GC worker (idle)]:
runtime.gopark(0x22084eccebe64?, 0x3?, 0x78?, 0x89?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000067750 sp=0xc000067730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0000677e0 sp=0xc000067750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000677e8 sp=0xc0000677e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 34 [GC worker (idle)]:
runtime.gopark(0x22084eccebb7f?, 0x3?, 0xdd?, 0x39?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000586750 sp=0xc000586730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0005867e0 sp=0xc000586750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005867e8 sp=0xc0005867e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 21 [GC worker (idle)]:
runtime.gopark(0x22084ea63bb8a?, 0x3?, 0xb4?, 0xe0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000067f50 sp=0xc000067f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000067fe0 sp=0xc000067f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000067fe8 sp=0xc000067fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x22084e9a41e0b?, 0x1?, 0xa9?, 0x3e?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000586f50 sp=0xc000586f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000586fe0 sp=0xc000586f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000586fe8 sp=0xc000586fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 36 [GC worker (idle)]:
runtime.gopark(0x22084eccebcde?, 0x3?, 0xcf?, 0x41?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000587750 sp=0xc000587730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0005877e0 sp=0xc000587750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005877e8 sp=0xc0005877e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x22084e8276c51?, 0x3?, 0xe8?, 0x38?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000587f50 sp=0xc000587f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000587fe0 sp=0xc000587f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 22 [GC worker (idle)]:
runtime.gopark(0x10b08520?, 0x1?, 0x5c?, 0xdf?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000068750 sp=0xc000068730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0000687e0 sp=0xc000068750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000687e8 sp=0xc0000687e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 50 [GC worker (idle)]:
runtime.gopark(0x22084eccebc51?, 0x3?, 0x31?, 0x66?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000582750 sp=0xc000582730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0005827e0 sp=0xc000582750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005827e8 sp=0xc0005827e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 23 [GC worker (idle)]:
runtime.gopark(0x22084eccacdc3?, 0x1?, 0x80?, 0xae?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000068f50 sp=0xc000068f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000068fe0 sp=0xc000068f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000068fe8 sp=0xc000068fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 51 [GC worker (idle)]:
runtime.gopark(0x22084ea63be0b?, 0x3?, 0xfd?, 0xee?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000582f50 sp=0xc000582f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000582fe0 sp=0xc000582f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000582fe8 sp=0xc000582fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 24 [GC worker (idle)]:
runtime.gopark(0x22084ea63bb30?, 0x3?, 0xb2?, 0x54?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000069750 sp=0xc000069730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0000697e0 sp=0xc000069750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000697e8 sp=0xc0000697e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 52 [GC worker (idle)]:
runtime.gopark(0x22084ea63bbf8?, 0x1?, 0x3c?, 0x8a?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000583750 sp=0xc000583730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0005837e0 sp=0xc000583750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005837e8 sp=0xc0005837e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 38 [GC worker (idle)]:
runtime.gopark(0x22084eccec14a?, 0x1?, 0x57?, 0x9b?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000588750 sp=0xc000588730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0005887e0 sp=0xc000588750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005887e8 sp=0xc0005887e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 39 [GC worker (idle)]:
runtime.gopark(0x22084eccaccdd?, 0x3?, 0x48?, 0x12?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000588f50 sp=0xc000588f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000588fe0 sp=0xc000588f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000588fe8 sp=0xc000588fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 40 [GC worker (idle)]:
runtime.gopark(0x22084eccacdeb?, 0x1?, 0x93?, 0xaf?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000589750 sp=0xc000589730 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0005897e0 sp=0xc000589750 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005897e8 sp=0xc0005897e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 41 [GC worker (idle)]:
runtime.gopark(0x22084ea63bc48?, 0x3?, 0xdb?, 0x71?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000589f50 sp=0xc000589f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
	/usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000589fe0 sp=0xc000589f50 pc=0x4213e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000589fe8 sp=0xc000589fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
	/usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 25 [IO wait, 1 minutes]:
runtime.gopark(0x75?, 0xb?, 0x0?, 0x0?, 0xc?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0005518f8 sp=0xc0005518d8 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000551930 sp=0xc0005518f8 pc=0x437277
internal/poll.runtime_pollWait(0x7f4276c11d88, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000551950 sp=0xc000551930 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc0003f4100?, 0xc0005f6000?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000551978 sp=0xc000551950 pc=0x4efc87
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0003f4100, {0xc0005f6000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000551a10 sp=0xc000551978 pc=0x4f0f7a
net.(*netFD).Read(0xc0003f4100, {0xc0005f6000?, 0x4f0145?, 0x0?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000551a58 sp=0xc000551a10 pc=0x569d25
net.(*conn).Read(0xc00053a038, {0xc0005f6000?, 0x0?, 0xc000416128?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc000551aa0 sp=0xc000551a58 pc=0x577fc5
net.(*TCPConn).Read(0xc000416120?, {0xc0005f6000?, 0x0?, 0xc000551ac0?})
	<autogenerated>:1 +0x25 fp=0xc000551ad0 sp=0xc000551aa0 pc=0x589ec5
net/http.(*connReader).Read(0xc000416120, {0xc0005f6000, 0x1000, 0x1000})
	/usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc000551b20 sp=0xc000551ad0 pc=0x6c4eeb
bufio.(*Reader).fill(0xc00050e060)
	/usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc000551b58 sp=0xc000551b20 pc=0x654b23
bufio.(*Reader).Peek(0xc00050e060, 0x4)
	/usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc000551b78 sp=0xc000551b58 pc=0x654c53
net/http.(*conn).serve(0xc0000d2360, {0x106c2bc8, 0xc000304c90})
	/usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc000551fb8 sp=0xc000551b78 pc=0x6cad9c
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc000551fe0 sp=0xc000551fb8 pc=0x6cf568
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000551fe8 sp=0xc000551fe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 9 [IO wait, 1 minutes]:
runtime.gopark(0x0?, 0xb?, 0x0?, 0x0?, 0xe?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000316da0 sp=0xc000316d80 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000316dd8 sp=0xc000316da0 pc=0x437277
internal/poll.runtime_pollWait(0x7f4276c11b98, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000316df8 sp=0xc000316dd8 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc0003a4000?, 0xc0000aa5e1?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000316e20 sp=0xc000316df8 pc=0x4efc87
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0003a4000, {0xc0000aa5e1, 0x1, 0x1})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000316eb8 sp=0xc000316e20 pc=0x4f0f7a
net.(*netFD).Read(0xc0003a4000, {0xc0000aa5e1?, 0x0?, 0x0?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000316f00 sp=0xc000316eb8 pc=0x569d25
net.(*conn).Read(0xc000302000, {0xc0000aa5e1?, 0x0?, 0x0?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc000316f48 sp=0xc000316f00 pc=0x577fc5
net.(*TCPConn).Read(0x0?, {0xc0000aa5e1?, 0x0?, 0x0?})
	<autogenerated>:1 +0x25 fp=0xc000316f78 sp=0xc000316f48 pc=0x589ec5
net/http.(*connReader).backgroundRead(0xc0000aa5d0)
	/usr/local/go/src/net/http/server.go:683 +0x37 fp=0xc000316fc8 sp=0xc000316f78 pc=0x6c4ab7
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:679 +0x25 fp=0xc000316fe0 sp=0xc000316fc8 pc=0x6c49e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000316fe8 sp=0xc000316fe0 pc=0x46e2c1
created by net/http.(*connReader).startBackgroundRead in goroutine 53
	/usr/local/go/src/net/http/server.go:679 +0xba

goroutine 8 [IO wait, 1 minutes]:
runtime.gopark(0x7?, 0xb?, 0x0?, 0x0?, 0xd?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0002838f8 sp=0xc0002838d8 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
	/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000283930 sp=0xc0002838f8 pc=0x437277
internal/poll.runtime_pollWait(0x7f4276c11c90, 0x72)
	/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000283950 sp=0xc000283930 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc00003a080?, 0xc00012a000?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000283978 sp=0xc000283950 pc=0x4efc87
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00003a080, {0xc00012a000, 0x1000, 0x1000})
	/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000283a10 sp=0xc000283978 pc=0x4f0f7a
net.(*netFD).Read(0xc00003a080, {0xc00012a000?, 0x4f0145?, 0x0?})
	/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000283a58 sp=0xc000283a10 pc=0x569d25
net.(*conn).Read(0xc00006e038, {0xc00012a000?, 0x0?, 0xc0003041e8?})
	/usr/local/go/src/net/net.go:179 +0x45 fp=0xc000283aa0 sp=0xc000283a58 pc=0x577fc5
net.(*TCPConn).Read(0xc0003041e0?, {0xc00012a000?, 0x0?, 0xc00028fac0?})
	<autogenerated>:1 +0x25 fp=0xc000283ad0 sp=0xc000283aa0 pc=0x589ec5
net/http.(*connReader).Read(0xc0003041e0, {0xc00012a000, 0x1000, 0x1000})
	/usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc000283b20 sp=0xc000283ad0 pc=0x6c4eeb
bufio.(*Reader).fill(0xc0005c4060)
	/usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc000283b58 sp=0xc000283b20 pc=0x654b23
bufio.(*Reader).Peek(0xc0005c4060, 0x4)
	/usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc000283b78 sp=0xc000283b58 pc=0x654c53
net/http.(*conn).serve(0xc0005f8120, {0x106c2bc8, 0xc000304c90})
	/usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc000283fb8 sp=0xc000283b78 pc=0x6cad9c
net/http.(*Server).Serve.func3()
	/usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc000283fe0 sp=0xc000283fb8 pc=0x6cf568
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000283fe8 sp=0xc000283fe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb

rax    0x0
rbx    0x7f41f6ace70f
rcx    0x7f42759db387
rdx    0x6
rdi    0x1
rsi    0x18
rbp    0x2b4c
rsp    0x7f42057f9348
r8     0x0
r9     0x1
r10    0x8
r11    0x202
r12    0x7f4275d6d868
r13    0x1
r14    0x7f41f6ace4a4
r15    0x7f41f6ace72a
rip    0x7f42759db387
rflags 0x202
cs     0x33
fs     0x0
gs     0x0

$ rocminfo | grep 'Name'    
  Name:                    AMD Ryzen 9 6900HX with Radeon Graphics
  Marketing Name:          AMD Ryzen 9 6900HX with Radeon Graphics
  Vendor Name:             CPU                                
  Name:                    gfx1035                            
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx1035         

Any ideas?

@kescherCode commented on GitHub (Feb 1, 2024):

@ignacio82 if the second command is within the container, you did not run it properly. Try running rocminfo with HSA_OVERRIDE_GFX_VERSION set to 10.3.0: either prepend the rocminfo command with HSA_OVERRIDE_GFX_VERSION=10.3.0, or make sure the container itself runs with that env var set.
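
For anyone following along, a minimal sketch of the two options (image tag illustrative; this thread uses 0.1.23-rocm):

# bare metal: apply the ISA override just for this invocation
HSA_OVERRIDE_GFX_VERSION=10.3.0 rocminfo | grep 'Name'

# Docker: set the override inside the container and pass the ROCm device nodes through
docker run -d -e HSA_OVERRIDE_GFX_VERSION=10.3.0 --device /dev/kfd --device /dev/dri ollama/ollama:0.1.23-rocm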

@ignacio82 commented on GitHub (Feb 2, 2024):

@kescherCode what I have in my previous post is outside the container, just to show the graphics card that I have. This is what I get inside the container:

[root@mini-server /]# rocminfo | grep 'Name'  
  Name:                    AMD Ryzen 9 6900HX with Radeon Graphics
  Marketing Name:          AMD Ryzen 9 6900HX with Radeon Graphics
  Vendor Name:             CPU                                
  Name:                    gfx1030                            
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx1030  

Does it matter that inside the container it says gfx1030 and outside it says gfx1035? My docker-compose has HSA_OVERRIDE_GFX_VERSION=10.3.0 set as an environment variable.

@kescherCode commented on GitHub (Feb 2, 2024):

@ignacio82 ah, now I see. gfx1035 is actually a mobile GPU, which has no VRAM on its own.

@ignacio82 commented on GitHub (Feb 2, 2024):

@kescherCode does that mean I cannot get any GPU acceleration?

@mkesper commented on GitHub (Feb 2, 2024):

@kescherCode In llama.cpp you can use unified memory with integrated GPUs; would that be possible here, too? Otherwise it makes no sense on iGPUs.

@dhiltgen commented on GitHub (Feb 2, 2024):

Integrated GPUs from AMD are not currently supported.

@Th3Rom3 commented on GitHub (Feb 3, 2024):

Just wanted to leave a quick note that ollama runs great with a current native release installation with an AMD RX6800 GPU on Fedora 39.

Thanks for implementing it into the main branch. I used to build my own version with ROCm before but now it runs with less effort needed using the mainline release.

@Knallli commented on GitHub (Feb 4, 2024):

> Integrated GPUs from AMD are not currently supported.

@dhiltgen why is that if I may ask?

@dhiltgen commented on GitHub (Feb 5, 2024):

@Knallli with our current configuration for llama.cpp, the resulting builds crash on iGPUs. At this point we're focused on enabling discrete GPUs first, and once that's in good shape, we can evaluate if supporting iGPUs is possible in the future.

@mkesper commented on GitHub (Feb 5, 2024):

Which is completely reasonable, as AMD themselves do not support ROCm on iGPUs sadly: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/amdgpu-install.html#installation-via-amdgpu-installer

@tolasing commented on GitHub (Feb 5, 2024):

I am unable to build from source; I keep getting this error:

CMake Error at /usr/share/cmake-3.22/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/opt/rocm/llvm/bin/clang++"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/tola-dev/ollama/llm/llama.cpp/build/linux/x86_64/rocm_v6/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_b2a36/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_b2a36.dir/build.make CMakeFiles/cmTC_b2a36.dir/build
    gmake[1]: Entering directory '/home/tola-dev/ollama/llm/llama.cpp/build/linux/x86_64/rocm_v6/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_b2a36.dir/testCXXCompiler.cxx.o
    /opt/rocm/llvm/bin/clang++   -fPIE -MD -MT CMakeFiles/cmTC_b2a36.dir/testCXXCompiler.cxx.o -MF CMakeFiles/cmTC_b2a36.dir/testCXXCompiler.cxx.o.d -o CMakeFiles/cmTC_b2a36.dir/testCXXCompiler.cxx.o -c /home/tola-dev/ollama/llm/llama.cpp/build/linux/x86_64/rocm_v6/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    Linking CXX executable cmTC_b2a36
    /usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_b2a36.dir/link.txt --verbose=1
    /opt/rocm/llvm/bin/clang++ CMakeFiles/cmTC_b2a36.dir/testCXXCompiler.cxx.o -o cmTC_b2a36 
    ld.lld: error: unable to find library -lstdc++
    clang++: error: linker command failed with exit code 1 (use -v to see invocation)
    gmake[1]: *** [CMakeFiles/cmTC_b2a36.dir/build.make:100: cmTC_b2a36] Error 1
    gmake[1]: Leaving directory '/home/tola-dev/ollama/llm/llama.cpp/build/linux/x86_64/rocm_v6/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_b2a36/fast] Error 2
    
    

  

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:2 (project)



@kescherCode commented on GitHub (Feb 5, 2024):

No ROCm Docker image seems to have been published for 0.1.23.

@dhiltgen commented on GitHub (Feb 5, 2024):

Sorry about that. The image has been pushed.

@Th3Rom3 commented on GitHub (Feb 8, 2024):

After successfully getting ollama running on my RX6800 with ROCm bare metal I struggle to get the same running within Docker. It might be just my ignorance so let me know if this derails this issue too much or is beyond its scope.

I can run ollama within the Docker container using CPU inference just fine, but I fail to get my GPU recognized using the 0.1.23-rocm or 0.1.22-rocm docker images.

ROCm works fine on the host system, but I fail to get it to run within a minimum viable container setup:

version: "3.7"

services:
  ollama:
    container_name: ollama-minimal
    image: ollama/ollama:0.1.23-rocm
    volumes:
      - ./ollama:/root/.ollama
    devices:
      - /dev/dri
      - /dev/kfd
    restart: unless-stopped
    group_add:
      - video
    privileged: true

time=2024-02-08T16:23:31.760Z level=INFO source=images.go:860 msg="total blobs: 12"
time=2024-02-08T16:23:31.762Z level=INFO source=images.go:867 msg="total unused blobs removed: 0"
time=2024-02-08T16:23:31.763Z level=INFO source=routes.go:995 msg="Listening on [::]:11434 (version 0.1.23)"
time=2024-02-08T16:23:31.763Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-08T16:23:33.813Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [cpu_avx cuda_v11 cpu_avx2 rocm_v6 rocm_v5 cpu]"
time=2024-02-08T16:23:33.813Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-08T16:23:33.813Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-08T16:23:33.815Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-08T16:23:33.815Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-08T16:23:33.815Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
Failed to open drm root directory /sys/class/drm.: No such file or directory
time=2024-02-08T16:23:33.816Z level=INFO source=gpu.go:317 msg="Unable to load ROCm management library /opt/rocm/lib/librocm_smi64.so.5.0.50701: rocm vram init failure: 8"
Failed to open drm root directory /sys/class/drm.: No such file or directory
time=2024-02-08T16:23:33.816Z level=INFO source=gpu.go:317 msg="Unable to load ROCm management library /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701: rocm vram init failure: 8"
time=2024-02-08T16:23:33.816Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-08T16:23:33.816Z level=INFO source=routes.go:1018 msg="no GPU detected"
time=2024-02-08T16:31:18.264Z level=INFO source=images.go:860 msg="total blobs: 12"
time=2024-02-08T16:31:18.266Z level=INFO source=images.go:867 msg="total unused blobs removed: 0"
time=2024-02-08T16:31:18.267Z level=INFO source=routes.go:995 msg="Listening on [::]:11434 (version 0.1.23)"

docker exec ollama-minimal rocminfo

ROCk module is NOT loaded, possibly no GPU devices

I have spun up a fresh Ubuntu 22.04.3 installation (the logs shown above are from this system) to exclude issues with my niche forked Fedora main system, but got the exact same result.

I have tried the ROCm setup with the following two parameter sets:

--usecase=graphics,rocm
--usecase=graphics,rocm --no-dkms

I have followed most of this issue thread, but maybe I am still missing something obvious; any pointers are appreciated.

Addendum: In the meantime I have tried different kernels under Ubuntu (6.2 and 6.3) as referenced by older versions of the ROCm documentation as requirements for amdgpu-dkms. No changes.
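
For comparison, the compose file above corresponds roughly to the following docker run invocation (a sketch with the same image, devices, group, and volume):

docker run -d --name ollama-minimal \
  -v ./ollama:/root/.ollama \
  --device /dev/dri --device /dev/kfd \
  --group-add video \
  --privileged \
  --restart unless-stopped \
  ollama/ollama:0.1.23-rocm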


@chiragkrishna commented on GitHub (Feb 9, 2024):

I've installed [CLBlast](https://github.com/CNugteren/CLBlast/blob/master/doc/installation.md) and [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html), set the environment to `HSA_OVERRIDE_GFX_VERSION=9.0.6` and `AMDGPU_TARGET=gfx906`, and built ollama. When I run ollama I can see memory usage in radeontop, but there is no reply; it just keeps spinning. Here are the logs: [logs.log](https://github.com/ollama/ollama/files/14223419/logs.log). It was working before, but now it doesn't.


@ghost commented on GitHub (Feb 9, 2024):

> Just wanted to leave a quick note that ollama runs great with a current native release installation with an AMD RX6800 GPU on Fedora 39.
>
> Thanks for implementing it into the main branch. I used to build my own version with ROCm before but now it runs with less effort needed using the mainline release.

How did you achieve that? I'm also on Fedora, but my offboard GPU is never used with Ollama.


@ghost commented on GitHub (Feb 9, 2024):

> > Just wanted to leave a quick note that ollama runs great with a current native release installation with an AMD RX6800 GPU on Fedora 39.
> >
> > Thanks for implementing it into the main branch. I used to build my own version with ROCm before but now it runs with less effort needed using the mainline release.
>
> How did you achieve that? I'm also on Fedora, but my offboard GPU is never used with Ollama.

When I try to select GPUs:

```
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
57c139bbda7e: Pull complete
efa866b73628: Pull complete
a03f4e4cf912: Pull complete
Digest: sha256:3bc28f48a60ee34574dca0b0e310eff21e171b55d83fa06384bd83b97d9482b8
Status: Downloaded newer image for ollama/ollama:latest
fb68c6a21f168d3a0582cfc4b8891d80e42fb896c507e915a9c9a44a63c5e58a
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
(base) bash-5.2$
```
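That `could not select device driver "" with capabilities: [[gpu]]` error comes from Docker's `--gpus` flag, which only works with the NVIDIA container toolkit. For an AMD card the GPU is passed through as plain device nodes instead; a minimal sketch, assuming the ROCm image tag mentioned in the replies below:

```
# Pass the AMD GPU to the container as device nodes (no --gpus flag)
docker run -d -p 11434:11434 --name ollama \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -v ollama:/root/.ollama \
  ollama/ollama:0.1.23-rocm
```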


@Th3Rom3 commented on GitHub (Feb 9, 2024):

@CaioPrioridosSantos What GPU are you running? It looks like you are trying to run a Docker setup? If you want to use an AMD card with Docker you need the separate ROCm-enabled release: `ollama/ollama:0.1.23-rocm`.

As per the documentation, ROCm is not merged into `ollama:latest` due to the very large file size of the ROCm libraries that are included (and needed) in the ROCm release.

But as I've stated above, so far I have not managed to get it to run within Docker myself, only when installed on my bare-metal machine using the [bash script installation](https://github.com/ollama/ollama/blob/main/docs/linux.md).

For reference, this is the startup log of my bare-metal/host system:

```
Feb 07 13:59:50 bravenewworld systemd[1]: Started ollama.service - Ollama Service.
Feb 07 13:59:50 bravenewworld ollama[1903]: 2024/02/07 13:59:50 images.go:857: INFO total blobs: 36
Feb 07 13:59:50 bravenewworld ollama[1903]: 2024/02/07 13:59:50 images.go:864: INFO total unused blobs removed: 0
Feb 07 13:59:50 bravenewworld ollama[1903]: 2024/02/07 13:59:50 routes.go:950: INFO Listening on 127.0.0.1:11434 (version 0.1.22)
Feb 07 13:59:50 bravenewworld ollama[1903]: 2024/02/07 13:59:50 payload_common.go:106: INFO Extracting dynamic libraries...
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 payload_common.go:145: INFO Dynamic LLM libraries [cpu_avx cpu_avx2 rocm_v5 rocm_v6 cpu cuda_v11]
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 gpu.go:94: INFO Detecting GPU type
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 gpu.go:236: INFO Searching for GPU management library libnvidia-ml.so
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 gpu.go:282: INFO Discovered GPU libraries: []
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 gpu.go:236: INFO Searching for GPU management library librocm_smi64.so
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 gpu.go:282: INFO Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.6.0.60002 /opt/rocm-6.0.2/lib/librocm_smi64.so.6.0.60002]
Feb 07 13:59:52 bravenewworld ollama[1903]: 2024/02/07 13:59:52 gpu.go:109: INFO Radeon GPU detected
```

It detects and uses my GPU while using the installed ROCm 6.0.2 libraries without fail.


@havfo commented on GitHub (Feb 9, 2024):

I have a working Docker setup for my RX6700XT on Debian testing/unstable. I am using the latest ROCm libraries for Debian. I run the container like this:

```
docker run -d -p 11434:11434 --name ollama -e HSA_OVERRIDE_GFX_VERSION='10.3.0' --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v ollama:/root/.ollama ollama/ollama:0.1.24-rocm
```

Some logs from the container:

```
llm_load_tensors: ggml ctx size =    0.76 MiB
llm_load_tensors: offloading 12 repeating layers to GPU
llm_load_tensors: offloaded 12/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  9391.12 MiB
llm_load_tensors:        CPU buffer size = 25215.87 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    96.00 MiB
llama_kv_cache_init:  ROCm_Host KV buffer size =   160.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:  ROCm_Host input buffer size   =    12.01 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   211.21 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =   198.03 MiB
llama_new_context_with_model: graph splits (measure): 5
```
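For context: the RX6700XT reports the gfx1031 target, which official ROCm builds do not ship kernels for, so `HSA_OVERRIDE_GFX_VERSION=10.3.0` tells the runtime to use the gfx1030 binaries instead. A quick way to see what your card reports (a sketch, assuming `rocminfo` is installed on the host):

```
# Look for the gfx target in the GPU agent's properties
rocminfo | grep gfx
# e.g. "Name: gfx1031" -> override with HSA_OVERRIDE_GFX_VERSION=10.3.0
```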

@zaskokus commented on GitHub (Feb 9, 2024):

Guys, gals, everyone: with the latest Arch Linux, ROCm 6.0, and the latest ollama from the releases page, everything just works. That includes using BOTH GPUs (iGPU and eGPU in my case) at the same time.

```
time=2024-02-09T23:21:34.455+01:00 level=INFO source=images.go:860 msg="total blobs: 9"
time=2024-02-09T23:21:34.458+01:00 level=INFO source=images.go:867 msg="total unused blobs removed: 0"
time=2024-02-09T23:21:34.458+01:00 level=INFO source=routes.go:995 msg="Listening on 127.0.0.1:11434 (version 0.1.23)"
time=2024-02-09T23:21:34.458+01:00 level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-09T23:21:36.393+01:00 level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [cpu cpu_avx cuda_v11 rocm_v6 rocm_v5 cpu_avx2]"
time=2024-02-09T23:21:36.393+01:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-09T23:21:36.393+01:00 level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-09T23:21:36.400+01:00 level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-09T23:21:36.400+01:00 level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-09T23:21:36.400+01:00 level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.1.0]"
time=2024-02-09T23:21:36.411+01:00 level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-09T23:21:36.411+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[GIN] 2024/02/09 - 23:21:46 | 200 |     369.713µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/02/09 - 23:21:46 | 200 |     919.384µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/02/09 - 23:21:46 | 200 |     135.334µs |       127.0.0.1 | POST     "/api/show"
time=2024-02-09T23:21:46.423+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-09T23:21:46.423+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-09T23:21:46.423+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama2449355153/rocm_v5/libext_server.so
time=2024-02-09T23:21:46.424+01:00 level=WARN source=llm.go:152 msg="Failed to load dynamic library /tmp/ollama2449355153/rocm_v5/libext_server.so  Unable to load dynamic library: Unable to load dynamic server library: libhipblas.so.1: cannot open shared object file: No such file or directory"
loading library /tmp/ollama2449355153/rocm_v6/libext_server.so
time=2024-02-09T23:21:46.525+01:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama2449355153/rocm_v6/libext_server.so"
time=2024-02-09T23:21:46.525+01:00 level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 ROCm devices:
  Device 0: AMD Radeon RX 7600M XT, compute capability 11.0, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 11.0, VMM: no
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /home/rhqq/.ollama/models/blobs/sha256:6739b3a1512affca82dc6a4575ecca0d44cd3c4b39ca3d0bf62f8d08b24c3b9e (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = ehartford
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                          general.file_type u32              = 15
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.80 GiB (4.84 BPW) 
llm_load_print_meta: general.name     = ehartford
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.33 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  2541.18 MiB
llm_load_tensors:      ROCm1 buffer size =  1279.76 MiB
llm_load_tensors:        CPU buffer size =    70.31 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =   704.00 MiB
llama_kv_cache_init:      ROCm1 KV buffer size =   320.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:  ROCm_Host input buffer size   =    12.01 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   171.60 MiB
llama_new_context_with_model:      ROCm1 compute buffer size =   171.60 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =     8.80 MiB
llama_new_context_with_model: graph splits (measure): 5
time=2024-02-09T23:22:06.780+01:00 level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
[GIN] 2024/02/09 - 23:22:06 | 200 | 20.415651619s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/02/09 - 23:22:48 | 200 | 36.751888362s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/02/09 - 23:23:32 | 200 |  3.412510236s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/02/09 - 23:27:30 | 200 |         1m39s |       127.0.0.1 | POST     "/api/chat"
```
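A note on the `libhipblas.so.1` warning in the log above: that is the bundled rocm_v5 runner failing to load on a ROCm 6 system; ollama then falls through to the rocm_v6 runner, so the warning is harmless here. To check which hipBLAS soname your install actually provides (a quick sketch, assuming the libraries are registered in the linker cache):

```
# List the hipBLAS sonames the dynamic linker can see
ldconfig -p | grep libhipblas
# ROCm 5.x setups typically list libhipblas.so.1, ROCm 6.x libhipblas.so.2
```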

@meminens commented on GitHub (Feb 9, 2024):

> Guys, gals, everyone: with the latest Arch Linux, ROCm 6.0, and the latest ollama from the releases page, everything just works. That includes using BOTH GPUs (iGPU and eGPU in my case) at the same time.
>
> […]
That's great news! Have you installed ollama with the bash script on the home page? How do you tell ollama to use the dGPU?


@zaskokus commented on GitHub (Feb 10, 2024):

@misaligar I just downloaded the compiled file from the releases, added the executable flag and... that's it. I have all the ROCm libs/pkgs from the Arch Linux repo, everything is vanilla config; I have no extra flags added, no preloads, no variables before the command itself. I have NOT used the packaged ollama version, but the GitHub binary.
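If you ever do need to pin ollama to one of several ROCm devices, the ROCm runtime's standard visibility variables should work, since the runners are plain HIP programs (a sketch; this is a ROCm runtime variable, not an ollama-specific flag, and device indices follow `rocminfo` enumeration):

```
# Expose only the first ROCm device to the server process
HIP_VISIBLE_DEVICES=0 ollama serve
```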


@meminens commented on GitHub (Feb 10, 2024):

> @misaligar I just downloaded the compiled file from the releases, added the executable flag and... that's it. I have all the ROCm libs/pkgs from the Arch Linux repo, everything is vanilla config; I have no extra flags added, no preloads, no variables before the command itself. I have NOT used the packaged ollama version, but the GitHub binary.

Thank you so much! I just did the same and, like you said, it worked right off the bat. Amazing! I don't have to compile with custom flags anymore.


@ghost commented on GitHub (Feb 10, 2024):

> @CaioPrioridosSantos What GPU are you running? It looks like you are trying to run a Docker setup? If you want to use an AMD card with Docker you need the separate ROCm-enabled release: `ollama/ollama:0.1.23-rocm`.
>
> […]
>
> It detects and uses my GPU while using the installed ROCm 6.0.2 libraries without fail.

@Th3Rom3

Hello. I'm on a notebook: Fedora 39, AMD Ryzen™ 9 6900HS with Radeon™ Graphics × 16, and an AMD Radeon RX 6800S 8 GB.

Yes, I'm on Docker, but if needed I can remove the ollama Docker setup and restart with the bash installation.

Thank you for sharing the image. It looks like it worked for you. What did you do beforehand for it to recognize ROCm and the GPU?

Thank you


@chiragkrishna commented on GitHub (Feb 10, 2024):

```yaml
version: '3.8'
services:
  ollama:
    volumes:
      - PATH:/root/.ollama #add your path
    container_name: ollama
    pull_policy: always
    tty: true
    stdin_open: true
    restart: unless-stopped
    image: ollama/ollama:0.1.24-rocm
    environment:
      # I am using AMD 5500U (gfx90c) which is not supported by ROCM, so using the closest match (gfx900)
      - "HSA_OVERRIDE_GFX_VERSION=9.0.0" #add cards supported by ROCM
      - "HCC_AMDGPU_TARGETS=gfx900" #add cards supported by ROCM
      - "OLLAMA_DEBUG=1"
    devices:
      - "/dev/dri/card0:/dev/dri/card0"
      - "/dev/dri/renderD128:/dev/dri/renderD128"
      - "/dev/kfd:/dev/kfd"
    group_add:
      - "video"
    security_opt:
      - seccomp:unconfined
    network_mode: 'host'

  ollama-webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    volumes:
      - PATH:/app/backend/data #add your path
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_API_BASE_URL=http://localhost:11434/api' # replace localhost with your ip address
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
```

I can see the memory changes in radeontop, but there is no output. Looks like I'll have to wait for AMD integrated graphics support to work.
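Before blaming the model load, it may be worth confirming the container actually sees the iGPU at all (a sketch, using the container name from the compose file above):

```
# Should list the CPU and GPU agents; "ROCk module is NOT loaded" means
# the /dev/kfd and /dev/dri devices were not passed through correctly
docker exec ollama rocminfo
```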


@zaskokus commented on GitHub (Feb 10, 2024):

@chiragkrishna I confirmed a few posts back that iGPUs work well with ROCm 6.0; what's more, they work in tandem with a dGPU/eGPU out of the box.


@chiragkrishna commented on GitHub (Feb 10, 2024):

@zaskokus I'm using ROCm 6.0.2 and downloaded the latest ollama, but it is still stuck here:

```
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors:      ROCm0 buffer size =   703.44 MiB
llm_load_tensors:        CPU buffer size =    35.44 MiB
```

[ollama_logs.txt](https://github.com/ollama/ollama/files/14229590/ollama_logs.txt)

The ROCm test is good and I can run Stable Diffusion: [test-rocm.py](https://gist.github.com/damico/484f7b0a148a0c5f707054cf9c0a0533)

```
python3 test-rocm.py

Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user user is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of:
--->  AMD Ryzen 5 5500U with Radeon Graphics
--->  gfx900
```

@Th3Rom3 commented on GitHub (Feb 11, 2024):

I have managed to figure out my problems with running ROCm-powered ollama in a Docker container. Somehow the Docker Engine setup bundled with the Linux version of Docker Desktop does not work with my GPU setup. After purging Docker and manually installing Docker Engine, it now works flawlessly both in a container and on bare metal.

@CaioPrioridosSantos Disclaimer: I was testing on a Fedora-based distro called Nobara, which comes with some ROCm dependencies preinstalled, although I later updated the libraries manually to 6.0.2 myself.

I find it easier to test with a local installation first, since it removes the added complication of handing the GPU to a Docker container. You can start by running `rocminfo` on your bare-metal system to see if it works at all (although it isn't strictly necessary to have the full ROCm stack set up both inside the container and on the host system).

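To make that concrete, a short bare-metal checklist (a sketch; the group names and device paths match the defaults used elsewhere in this thread):

```sh
rocminfo | grep -E 'Agent|gfx'    # the GPU should appear as an agent with a gfx target
ls -l /dev/kfd /dev/dri           # device nodes that must be passed to the container
groups | grep -E 'render|video'   # the user typically needs these groups
```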

@ghost commented on GitHub (Feb 11, 2024):

> I have managed to figure out my problems with running ROCm powered ollama in a Docker container. [...] You can start by running `rocminfo` on your bare metal system to see if it works at all.

Nothing yet. The install still can't recognize my AMD GPU; it only searches for an NVIDIA GPU, but I have no NVIDIA card:

```
(base) bash-5.2$ ollama serve
time=2024-02-11T15:35:47.939Z level=INFO source=images.go:863 msg="total blobs: 0"
time=2024-02-11T15:35:47.939Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-11T15:35:47.939Z level=INFO source=routes.go:999 msg="Listening on 127.0.0.1:11434 (version 0.1.24)"
time=2024-02-11T15:35:47.940Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-11T15:35:49.691Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [rocm_v6 rocm_v5 cpu cpu_avx2 cpu_avx cuda_v11]"
time=2024-02-11T15:35:49.691Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-11T15:35:49.691Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-11T15:35:49.693Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.470.223.02]"
time=2024-02-11T15:35:49.894Z level=INFO source=gpu.go:300 msg="Unable to load CUDA management library /usr/lib64/libnvidia-ml.so.470.223.02: nvml vram init failure: 9"
time=2024-02-11T15:35:49.894Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T15:35:49.895Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-11T15:35:49.895Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:35:49.895Z level=INFO source=routes.go:1022 msg="no GPU detected"
```

But in the Docker container the GPU is recognized, as these logs show:

```
(base) bash-5.2$ docker logs ollama
time=2024-02-11T14:59:53.833Z level=INFO source=images.go:863 msg="total blobs: 0"
time=2024-02-11T14:59:53.833Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-11T14:59:53.834Z level=INFO source=routes.go:999 msg="Listening on [::]:11434 (version 0.1.24)"
time=2024-02-11T14:59:53.834Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-11T14:59:55.676Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 rocm_v5 cpu cuda_v11 rocm_v6]"
time=2024-02-11T14:59:55.676Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-11T14:59:55.676Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-11T14:59:55.677Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-11T14:59:55.677Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T14:59:55.678Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-11T14:59:55.680Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-11T14:59:55.680Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T14:59:55.680Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
[GIN] 2024/02/11 - 15:08:35 | 200 | 53.445µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/02/11 - 15:08:35 | 404 | 216.646µs | 127.0.0.1 | POST "/api/show"
time=2024-02-11T15:08:38.651Z level=INFO source=download.go:136 msg="downloading 2609048d349e in 65 115 MB part(s)"
time=2024-02-11T15:10:40.100Z level=INFO source=download.go:136 msg="downloading 8c17c2ebb0ea in 1 7.0 KB part(s)"
time=2024-02-11T15:10:43.706Z level=INFO source=download.go:136 msg="downloading 7c23fb36d801 in 1 4.8 KB part(s)"
time=2024-02-11T15:10:48.088Z level=INFO source=download.go:136 msg="downloading 2e0493f67d0c in 1 59 B part(s)"
time=2024-02-11T15:10:51.566Z level=INFO source=download.go:136 msg="downloading fa304d675061 in 1 91 B part(s)"
time=2024-02-11T15:10:54.968Z level=INFO source=download.go:136 msg="downloading be61bcdf308e in 1 558 B part(s)"
[GIN] 2024/02/11 - 15:11:01 | 200 | 2m25s | 127.0.0.1 | POST "/api/pull"
[GIN] 2024/02/11 - 15:11:01 | 200 | 659.444µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/02/11 - 15:11:01 | 200 | 238.289µs | 127.0.0.1 | POST "/api/show"
time=2024-02-11T15:11:01.766Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:11:01.766Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-02-11T15:11:01.766Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:11:01.766Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-02-11T15:11:01.766Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:11:01.861Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama2384028714/rocm_v5/libext_server.so"
time=2024-02-11T15:11:01.861Z level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon RX 6800S, compute capability 10.3, VMM: no
llama_model_loader: loaded meta data with 23 key-value pairs and 363 tensors from /root/.ollama/models/blobs/sha256:2609048d349e7c70196401be59bea7eb89a968d4642e409b0e798b34403b96c8 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 5120
llama_model_loader: - kv 4: llama.block_count u32 = 40
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 13824
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 40
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 40
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: tokenizer.chat_template str = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_0: 281 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 5120
llm_load_print_meta: n_embd_v_gqa = 5120
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.28 MiB
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloaded 22/41 layers to GPU
llm_load_tensors: ROCm0 buffer size = 3744.30 MiB
llm_load_tensors: CPU buffer size = 7023.90 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 880.00 MiB
llama_kv_cache_init: ROCm_Host KV buffer size = 720.00 MiB
llama_new_context_with_model: KV self size = 1600.00 MiB, K (f16): 800.00 MiB, V (f16): 800.00 MiB
llama_new_context_with_model: ROCm_Host input buffer size = 14.01 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 213.40 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 209.00 MiB
llama_new_context_with_model: graph splits (measure): 5
time=2024-02-11T15:11:07.468Z level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
[GIN] 2024/02/11 - 15:11:07 | 200 | 5.922620794s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/02/11 - 15:11:36 | 200 | 15.424189225s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/02/11 - 15:14:27 | 200 | 2m5s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/02/11 - 15:17:02 | 200 | 9.417514508s | 127.0.0.1 | POST "/api/chat"
time=2024-02-11T15:38:07.045Z level=INFO source=images.go:863 msg="total blobs: 6"
time=2024-02-11T15:38:07.045Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-11T15:38:07.046Z level=INFO source=routes.go:999 msg="Listening on [::]:11434 (version 0.1.24)"
time=2024-02-11T15:38:07.046Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-11T15:38:08.870Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [rocm_v6 cuda_v11 cpu_avx2 cpu_avx rocm_v5 cpu]"
time=2024-02-11T15:38:08.870Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-11T15:38:08.870Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-11T15:38:08.872Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-11T15:38:08.872Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T15:38:08.873Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-11T15:38:08.889Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-11T15:38:08.889Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-11T15:38:08.889Z level=INFO source=gpu.go:177 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
```


@mkesper commented on GitHub (Feb 11, 2024):

Docker Desktop on Linux is a non-starter, as it needlessly runs the engine inside a VM "to give you the same experience as on macOS and Windows" (https://docs.docker.com/desktop/install/linux-install/). Please avoid this idiocy.


@Th3Rom3 commented on GitHub (Feb 11, 2024):

That is good to know but it was not obvious to me as it did not even come up once as a potential problem during my troubleshooting steps. Hence my post to make the issue visible for posterity.

It also does not help that the Docker Desktop for Linux context was left behind even after I followed the uninstall procedure. I had to remove it manually via `docker context remove desktop-linux`. But as always, this might have been due to an oversight on my part.

Bottom line: as of release 0.1.22, ollama works well for me with ROCm acceleration using an RX 6800, both bare metal and in (native) Docker.

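For anyone hitting the same leftover-context problem, the relevant `docker context` subcommands are sketched below (the `desktop-linux` name matches the comment above; yours may differ):

```sh
docker context ls                 # list contexts; the active one is starred
docker context use default        # switch back to the native engine socket
docker context rm desktop-linux   # remove the stale Docker Desktop context
```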

@ghost commented on GitHub (Feb 11, 2024):

> That is good to know but it was not obvious to me as it did not even come up once as a potential problem during my troubleshooting steps. [...] Bottom line: as of release 0.1.22, ollama works well for me with ROCm acceleration using an RX 6800, both bare metal and in (native) Docker.

Hey, thanks for the quick response.

Is there a tutorial for updating ROCm to 6.0.2 on Fedora?


@dhiltgen commented on GitHub (Feb 12, 2024):

@CaioPrioridosSantos these two lines when running on your host:

```
time=2024-02-11T15:35:49.894Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-11T15:35:49.895Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
```

indicate that you don't have the ROCm SMI library installed, which we currently use to query for GPU information. The package is likely called `rocm-smi-lib` or something along those lines. We are exploring refactoring the way we discover AMD GPUs to try to remove this dependency, but for now you'll need to install that library for ollama to discover your GPU. The container image has the library bundled in.

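Concretely, the package name varies by distro; the names below are plausible guesses (assumptions, not confirmed), followed by a check that the library ollama probes for actually exists:

```sh
sudo dnf install rocm-smi         # Fedora (assumption: ships librocm_smi64)
sudo pacman -S rocm-smi-lib       # Arch (assumption: package name)
# verify the library is somewhere ollama searches:
ls /opt/rocm*/lib*/librocm_smi64.so* /usr/lib*/librocm_smi64.so* 2>/dev/null
```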

@askareija commented on GitHub (Feb 13, 2024):

I have an AMD laptop. My specifications are:

MSI Laptop - Bravo 15 B7E
OS: EndeavourOS Galileo (Based on Arch linux)
DE: Hyprland 0.35.0
CPU: AMD Ryzen 5 7535HS
iGPU: Radeon Graphics (gfx1035)
dGPU: Radeon RX 6550M (gfx1034)
RAM: 32GB DDR5-4800

I tried to build with these steps:

1. `git clone --recursive https://github.com/ollama/ollama.git`

2. Install some packages: `sudo pacman -S rocm-hip-sdk rocm-opencl-sdk clblast go`

3. Build with these params: `AMDGPU_TARGET=gfx1034 HSA_OVERRIDE_GFX_VERSION=10.3.0 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...`

There are a lot of warnings, but too many to tell what kind; mostly deprecation stuff.

Then I ran `go build -tags rocm` (no feedback on the console) and started `ollama serve`:

![image](https://github.com/ollama/ollama/assets/21377617/192ebe93-622e-409f-a49b-5eed4d650554)

It says "Radeon GPU detected", then below that "no GPU detected" again.

When I try `ollama run codellama:7b`, it says it's falling back to CPU:

![image](https://github.com/ollama/ollama/assets/21377617/0ac194bb-5da9-40ad-93b1-927347810dbd)

This is the result from `rocm-smi`:

![image](https://github.com/ollama/ollama/assets/21377617/ad758bd4-20fd-4dca-bb33-d3229bba16a2)

`clinfo -l`:

```
❯ clinfo -l
Platform #0: AMD Accelerated Parallel Processing
 +-- Device #0: gfx1034
 `-- Device ROCm/rocBLAS#1: gfx1035
```
Am I missing something here?


@chiragkrishna commented on GitHub (Feb 13, 2024):

> I have an AMD laptop. My specifications are: [...] Am I missing something here?

Build with this: `AMDGPU_TARGET=gfx1030`

Run with this: `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`

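Putting both suggestions together with the build flags used earlier in the thread, the full sequence would look roughly like this (a sketch, untested on this particular laptop):

```sh
AMDGPU_TARGET=gfx1030 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast \
  go generate -tags rocm ./...
go build -tags rocm
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve
```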

@askareija commented on GitHub (Feb 13, 2024):

> Build with this: `AMDGPU_TARGET=gfx1030`
>
> Run with this: `HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve`

OK, so I removed ollama, cloned, and rebuilt again with `AMDGPU_TARGET=gfx1030 HSA_OVERRIDE_GFX_VERSION=10.3.0 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...`, then ran `HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve`.

But it still says "GPU not available, falling back to CPU".

I'm using ROCm 6.0.0 (according to pacman):

```
Package (4)            Old version  New version  Net change

extra/clblast          1.6.2-1      1.6.2-1        0.00 MiB
extra/go               2:1.22.0-1   2:1.22.0-1     0.00 MiB
extra/rocm-hip-sdk     6.0.0-1      6.0.0-1        0.00 MiB
extra/rocm-opencl-sdk  6.0.0-1      6.0.0-1        0.00 MiB
```

Edit: `git clone --recursive https://github.com/65a/ollama` was not working, so I used `git clone --recursive https://github.com/ollama/ollama`.

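At this point it may help to capture which runner and GPU library the server actually picks; a debugging sketch using the `OLLAMA_DEBUG` switch seen elsewhere in this thread (the grep pattern is just a convenience):

```sh
OLLAMA_DEBUG=1 HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve 2>&1 \
  | grep -iE 'gpu|rocm|librar'
```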

@ghost commented on GitHub (Feb 13, 2024):

> @CaioPrioridosSantos these two lines when running on your host [...] indicate that you don't have the ROCm SMI library installed [...] you'll need to install that library for it to discover your GPU. The container image has the library bundled in.

The question is: I have ROCm installed, but not 6.0.2; at the moment I have version 5.7.


@Th3Rom3 commented on GitHub (Feb 13, 2024):

@CaioPrioridosSantos
You can check what you have installed with
`dnf list installed | grep rocm-smi`
It should be ROCm 5.7.1 based on the [Fedora 39 repo](https://packages.fedoraproject.org/pkgs/rocm-smi/rocm-smi/).

You should not need ROCm 6.0.x for it to run accelerated; that was just something I was messing with personally.


@Venefilyn commented on GitHub (Feb 16, 2024):

I can't figure mine out; it does seem to detect my GPU correctly, but it crashes as soon as any model is used. Using the `ollama/ollama:0.1.24-rocm` build with:

```
podman run -d --privileged --device /dev/kfd -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_DEBUG=1 -e ROCR_VISIBLE_DEVICES="0" --name ollama ollama/ollama:0.1.24-rocm
[root@8bcdeb65f6ca /]# rocm-smi


========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%  
0    58.0c           33.0W   800Mhz  875Mhz  21.57%  auto  165.0W   17%   4%    
====================================================================================
=============================== End of ROCm SMI Log ================================
```
```
❯ ollama run codellama:7b "Write me a function that outputs the fibonacci sequence"
pulling manifest 
pulling 3a43f93b78ec... 100% ▕█████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB                         
pulling 8c17c2ebb0ea... 100% ▕█████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB                         
pulling 590d74a5569b... 100% ▕█████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB                         
pulling 2e0493f67d0c... 100% ▕█████████████████████████████████████████████████████████████████████████████████████▏   59 B                         
pulling 7f6a57943a88... 100% ▕█████████████████████████████████████████████████████████████████████████████████████▏  120 B                         
pulling 316526ac7323... 100% ▕█████████████████████████████████████████████████████████████████████████████████████▏  529 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
Error: Post "http://127.0.0.1:11434/api/generate": EOF
```

Specifically, this seems to be the interesting error from the logs below:

```
rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010
 List of available TensileLibrary Files : 
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
loading library /tmp/ollama1091808863/rocm_v5/libext_server.so
SIGSEGV: segmentation violation
PC=0x7f709c235bc7 m=21 sigcode=128
signal arrived during cgo execution
```
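The message means the bundled rocBLAS ships no Tensile kernels for gfx1010 (RX 5700 class), so it aborts the moment a model touches the GPU. A commonly reported (but unofficial, not guaranteed) workaround is to masquerade as the nearest shipped target, gfx1030, by adding the override to the same podman invocation:

```sh
# same flags as above, plus the gfx override (workaround, not official support)
podman run -d --privileged --device /dev/kfd -v ollama:/root/.ollama \
  -p 11434:11434 -e OLLAMA_DEBUG=1 -e ROCR_VISIBLE_DEVICES="0" \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  --name ollama ollama/ollama:0.1.24-rocm
```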
Log output
time=2024-02-16T15:17:29.216Z level=INFO source=images.go:863 msg="total blobs: 9"
time=2024-02-16T15:17:29.216Z level=INFO source=images.go:870 msg="total unused blobs removed: 0"
time=2024-02-16T15:17:29.217Z level=INFO source=routes.go:999 msg="Listening on [::]:11434 (version 0.1.24)"
time=2024-02-16T15:17:29.217Z level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-16T15:17:31.297Z level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [rocm_v6 cuda_v11 cpu cpu_avx2 cpu_avx rocm_v5]"
time=2024-02-16T15:17:31.297Z level=DEBUG source=payload_common.go:146 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-02-16T15:17:31.297Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-16T15:17:31.297Z level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-16T15:17:31.297Z level=DEBUG source=gpu.go:260 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /opt/rocm/lib/libnvidia-ml.so* /usr/local/lib/libnvidia-ml.so* /opt/rh/devtoolset-7/root/libnvidia-ml.so*]"
time=2024-02-16T15:17:31.298Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
time=2024-02-16T15:17:31.298Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-16T15:17:31.298Z level=DEBUG source=gpu.go:260 msg="gpu management search paths: [/opt/rocm*/lib*/librocm_smi64.so* /opt/rocm/lib/librocm_smi64.so* /usr/local/lib/librocm_smi64.so* /opt/rh/devtoolset-7/root/librocm_smi64.so*]"
time=2024-02-16T15:17:31.299Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
wiring rocm management library functions in /opt/rocm/lib/librocm_smi64.so.5.0.50701
dlsym: rsmi_init
dlsym: rsmi_shut_down
dlsym: rsmi_dev_memory_total_get
dlsym: rsmi_dev_memory_usage_get
dlsym: rsmi_version_get
dlsym: rsmi_num_monitor_devices
dlsym: rsmi_dev_id_get
dlsym: rsmi_dev_name_get
dlsym: rsmi_dev_brand_get
dlsym: rsmi_dev_vendor_name_get
dlsym: rsmi_dev_vram_vendor_get
dlsym: rsmi_dev_serial_number_get
dlsym: rsmi_dev_subsystem_name_get
dlsym: rsmi_dev_vbios_version_get
time=2024-02-16T15:17:31.301Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-16T15:17:31.301Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
discovered 1 ROCm GPU Devices
[0] ROCm device name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
[0] ROCm brand: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: micron
rsmi_dev_serial_number_get failed: 2
[0] ROCm subsystem name: 0xe408
[0] ROCm vbios version: 113-2E40803-O4A
[0] ROCm totalMem 8573157376
[0] ROCm usedMem 1484857344
time=2024-02-16T15:17:31.305Z level=DEBUG source=gpu.go:231 msg="rocm detected 1 devices with 5735M available memory"
[GIN] 2024/02/16 - 15:17:47 | 200 |        58.9µs |      10.0.2.100 | HEAD     "/"
[GIN] 2024/02/16 - 15:17:47 | 404 |       313.3µs |      10.0.2.100 | POST     "/api/show"
[GIN] 2024/02/16 - 15:17:52 | 200 |  4.925207389s |      10.0.2.100 | POST     "/api/pull"
[GIN] 2024/02/16 - 15:17:52 | 200 |     422.721µs |      10.0.2.100 | POST     "/api/show"
time=2024-02-16T15:17:52.660Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
discovered 1 ROCm GPU Devices
[0] ROCm device name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
[0] ROCm brand: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: micron
rsmi_dev_serial_number_get failed: 2
[0] ROCm subsystem name: 0xe408
[0] ROCm vbios version: 113-2E40803-O4A
[0] ROCm totalMem 8573157376
[0] ROCm usedMem 1486024704
time=2024-02-16T15:17:52.666Z level=DEBUG source=gpu.go:231 msg="rocm detected 1 devices with 5734M available memory"
time=2024-02-16T15:17:52.666Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
discovered 1 ROCm GPU Devices
[0] ROCm device name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
[0] ROCm brand: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: micron
rsmi_dev_serial_number_get failed: 2
[0] ROCm subsystem name: 0xe408
[0] ROCm vbios version: 113-2E40803-O4A
[0] ROCm totalMem 8573157376
[0] ROCm usedMem 1487368192
time=2024-02-16T15:17:52.671Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-16T15:17:52.710Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama1091808863/rocm_v5/libext_server.so"
time=2024-02-16T15:17:52.710Z level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
[1708096672] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
[1708096672] Performing pre-initialization of GPU

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010
 List of available TensileLibrary Files : 
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
loading library /tmp/ollama1091808863/rocm_v5/libext_server.so
SIGSEGV: segmentation violation
PC=0x7f709c235bc7 m=21 sigcode=128
signal arrived during cgo execution

goroutine 75 [syscall]:
runtime.cgocall(0x9bc250, 0xc0004d0710)
        /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0004d06e8 sp=0xc0004d06b0 pc=0x409b0b
github.com/jmorganca/ollama/llm._Cfunc_dyn_llama_server_init({0x7f6fec000ba0, 0x7f703a042890, 0x7f703a0430a0, 0x7f703a043130, 0x7f703a043350, 0x7f703a043530, 0x7f703a043da0, 0x7f703a043d80, 0x7f703a043eb0, 0x7f703a044460, ...}, ...)
        _cgo_gotypes.go:282 +0x45 fp=0xc0004d0710 sp=0xc0004d06e8 pc=0x7c4625
github.com/jmorganca/ollama/llm.newDynExtServer.func7(0xaf1392?, 0xc?)
        /go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:148 +0xef fp=0xc0004d0800 sp=0xc0004d0710 pc=0x7c5b4f
github.com/jmorganca/ollama/llm.newDynExtServer({0xc00088e000, 0x2e}, {0xc00056cb60, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
        /go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:148 +0xa45 fp=0xc0004d0aa0 sp=0xc0004d0800 pc=0x7c57e5
github.com/jmorganca/ollama/llm.newLlmServer({{_, _, _}, {_, _}, {_, _}}, {_, _}, {0xc00056cb60, ...}, ...)
        /go/src/github.com/jmorganca/ollama/llm/llm.go:158 +0x425 fp=0xc0004d0c60 sp=0xc0004d0aa0 pc=0x7c1fe5
github.com/jmorganca/ollama/llm.New({0xc0002f89a8, 0x15}, {0xc00056cb60, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
        /go/src/github.com/jmorganca/ollama/llm/llm.go:123 +0x713 fp=0xc0004d0ee0 sp=0xc0004d0c60 pc=0x7c1953
github.com/jmorganca/ollama/server.load(0xc000003080?, 0xc000003080, {{0x0, 0x800, 0x200, 0x1, 0xffffffffffffffff, 0x0, 0x0, 0x1, ...}, ...}, ...)
        /go/src/github.com/jmorganca/ollama/server/routes.go:84 +0x3a5 fp=0xc0004d1060 sp=0xc0004d0ee0 pc=0x995c25
github.com/jmorganca/ollama/server.GenerateHandler(0xc000512100)
        /go/src/github.com/jmorganca/ollama/server/routes.go:197 +0x6f7 fp=0xc0004d1748 sp=0xc0004d1060 pc=0x996797
github.com/gin-gonic/gin.(*Context).Next(...)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func1(0xc000512100)
        /go/src/github.com/jmorganca/ollama/server/routes.go:923 +0x68 fp=0xc0004d1780 sp=0xc0004d1748 pc=0x99f688
github.com/gin-gonic/gin.(*Context).Next(...)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0xc000512100)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x7a fp=0xc0004d17d0 sp=0xc0004d1780 pc=0x97719a
github.com/gin-gonic/gin.(*Context).Next(...)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0xc000512100)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xde fp=0xc0004d1980 sp=0xc0004d17d0 pc=0x97633e
github.com/gin-gonic/gin.(*Context).Next(...)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc000145ba0, 0xc000512100)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x65b fp=0xc0004d1b08 sp=0xc0004d1980 pc=0x9753fb
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc000145ba0, {0x10f1cd20?, 0xc0001b22a0}, 0xc000452700)
        /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1dd fp=0xc0004d1b48 sp=0xc0004d1b08 pc=0x974bbd
net/http.serverHandler.ServeHTTP({0x10f1b040?}, {0x10f1cd20?, 0xc0001b22a0?}, 0x6?)
        /usr/local/go/src/net/http/server.go:2938 +0x8e fp=0xc0004d1b78 sp=0xc0004d1b48 pc=0x6cee2e
net/http.(*conn).serve(0xc000133170, {0x10f1e388, 0xc0003c35c0})
        /usr/local/go/src/net/http/server.go:2009 +0x5f4 fp=0xc0004d1fb8 sp=0xc0004d1b78 pc=0x6cad14
net/http.(*Server).Serve.func3()
        /usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc0004d1fe0 sp=0xc0004d1fb8 pc=0x6cf648
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004d1fe8 sp=0xc0004d1fe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
        /usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 1 [IO wait]:
runtime.gopark(0x480f10?, 0xc00013f850?, 0xa0?, 0xf8?, 0x4f711d?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00013f830 sp=0xc00013f810 pc=0x43e7ee
runtime.netpollblock(0x46c332?, 0x4092a6?, 0x0?)
        /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00013f868 sp=0xc00013f830 pc=0x437277
internal/poll.runtime_pollWait(0x7f709be97e80, 0x72)
        /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00013f888 sp=0xc00013f868 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc000454080?, 0x4?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00013f8b0 sp=0xc00013f888 pc=0x4efd67
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000454080)
        /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc00013f958 sp=0xc00013f8b0 pc=0x4f524c
net.(*netFD).accept(0xc000454080)
        /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc00013fa10 sp=0xc00013f958 pc=0x56be29
net.(*TCPListener).accept(0xc0004255a0)
        /usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc00013fa38 sp=0xc00013fa10 pc=0x580c3e
net.(*TCPListener).Accept(0xc0004255a0)
        /usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc00013fa68 sp=0xc00013fa38 pc=0x57fdf0
net/http.(*onceCloseListener).Accept(0xc000133170?)
        <autogenerated>:1 +0x24 fp=0xc00013fa80 sp=0xc00013fa68 pc=0x6f1bc4
net/http.(*Server).Serve(0xc000366ff0, {0x10f1cb10, 0xc0004255a0})
        /usr/local/go/src/net/http/server.go:3056 +0x364 fp=0xc00013fbb0 sp=0xc00013fa80 pc=0x6cf284
github.com/jmorganca/ollama/server.Serve({0x10f1cb10, 0xc0004255a0})
        /go/src/github.com/jmorganca/ollama/server/routes.go:1026 +0x454 fp=0xc00013fc98 sp=0xc00013fbb0 pc=0x99fb34
github.com/jmorganca/ollama/cmd.RunServer(0xc000452300?, {0x11365820?, 0x4?, 0xad90c5?})
        /go/src/github.com/jmorganca/ollama/cmd/cmd.go:705 +0x199 fp=0xc00013fd30 sp=0xc00013fc98 pc=0x9b3399
github.com/spf13/cobra.(*Command).execute(0xc00040d800, {0x11365820, 0x0, 0x0})
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x87c fp=0xc00013fe68 sp=0xc00013fd30 pc=0x7649fc
github.com/spf13/cobra.(*Command).ExecuteC(0xc00040cc00)
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00013ff20 sp=0xc00013fe68 pc=0x765225
github.com/spf13/cobra.(*Command).Execute(...)
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        /go/src/github.com/jmorganca/ollama/main.go:11 +0x4d fp=0xc00013ff40 sp=0xc00013ff20 pc=0x9bb36d
runtime.main()
        /usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc00013ffe0 sp=0xc00013ff40 pc=0x43e39b
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013ffe8 sp=0xc00013ffe0 pc=0x46e2c1

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006afa8 sp=0xc00006af88 pc=0x43e7ee
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc00006afe0 sp=0xc00006afa8 pc=0x43e673
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006afe8 sp=0xc00006afe0 pc=0x46e2c1
created by runtime.init.6 in goroutine 1
        /usr/local/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006b778 sp=0xc00006b758 pc=0x43e7ee
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
        /usr/local/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc00006b7c8 sp=0xc00006b778 pc=0x42a73f
runtime.gcenable.func1()
        /usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc00006b7e0 sp=0xc00006b7c8 pc=0x41f865
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006b7e8 sp=0xc00006b7e0 pc=0x46e2c1
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x40d091?, 0x89c5a1?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006bf70 sp=0xc00006bf50 pc=0x43e7ee
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x11335ba0)
        /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00006bfa0 sp=0xc00006bf70 pc=0x427f69
runtime.bgscavenge(0x0?)
        /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc00006bfc8 sp=0xc00006bfa0 pc=0x428519
runtime.gcenable.func2()
        /usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc00006bfe0 sp=0xc00006bfc8 pc=0x41f805
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006bfe8 sp=0xc00006bfe0 pc=0x46e2c1
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait]:
runtime.gopark(0xad2080?, 0x10043f901?, 0x0?, 0x0?, 0x4469a5?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006a628 sp=0xc00006a608 pc=0x43e7ee
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00006a7e0 sp=0xc00006a628 pc=0x41e8e7
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006a7e8 sp=0xc00006a7e0 pc=0x46e2c1
created by runtime.createfing in goroutine 1
        /usr/local/go/src/runtime/mfinal.go:163 +0x3d

goroutine 6 [select, locked to thread]:
runtime.gopark(0xc00006c7a8?, 0x2?, 0x89?, 0xea?, 0xc00006c7a4?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006c638 sp=0xc00006c618 pc=0x43e7ee
runtime.selectgo(0xc00006c7a8, 0xc00006c7a0, 0x0?, 0x0, 0x0?, 0x1)
        /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00006c758 sp=0xc00006c638 pc=0x44e325
runtime.ensureSigM.func1()
        /usr/local/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc00006c7e0 sp=0xc00006c758 pc=0x46535f
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006c7e8 sp=0xc00006c7e0 pc=0x46e2c1
created by runtime.ensureSigM in goroutine 1
        /usr/local/go/src/runtime/signal_unix.go:997 +0xc8

goroutine 18 [syscall]:
runtime.notetsleepg(0x0?, 0x0?)
        /usr/local/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc0000667a0 sp=0xc000066768 pc=0x411349
os/signal.signal_recv()
        /usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0000667c0 sp=0xc0000667a0 pc=0x46ac89
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0000667e0 sp=0xc0000667c0 pc=0x6f45f3
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000667e8 sp=0xc0000667e0 pc=0x46e2c1
created by os/signal.Notify.func1.1 in goroutine 1
        /usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 34 [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000494718 sp=0xc0004946f8 pc=0x43e7ee
runtime.chanrecv(0xc00017d800, 0x0, 0x1)
        /usr/local/go/src/runtime/chan.go:583 +0x3cd fp=0xc000494790 sp=0xc000494718 pc=0x40beed
runtime.chanrecv1(0x0?, 0x0?)
        /usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc0004947b8 sp=0xc000494790 pc=0x40baf2
github.com/jmorganca/ollama/server.Serve.func2()
        /go/src/github.com/jmorganca/ollama/server/routes.go:1008 +0x25 fp=0xc0004947e0 sp=0xc0004947b8 pc=0x99fbc5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004947e8 sp=0xc0004947e0 pc=0x46e2c1
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
        /go/src/github.com/jmorganca/ollama/server/routes.go:1007 +0x3c7

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000494f50 sp=0xc000494f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000494fe0 sp=0xc000494f50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000494fe8 sp=0xc000494fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 50 [GC worker (idle)]:
runtime.gopark(0x974edd7ad54?, 0x1?, 0xc3?, 0xd?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000490750 sp=0xc000490730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004907e0 sp=0xc000490750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004907e8 sp=0xc0004907e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 36 [GC worker (idle)]:
runtime.gopark(0x974edd7adb8?, 0x1?, 0xf2?, 0x18?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000495750 sp=0xc000495730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004957e0 sp=0xc000495750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004957e8 sp=0xc0004957e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x974edd7d568?, 0x3?, 0x47?, 0x74?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006cf50 sp=0xc00006cf30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00006cfe0 sp=0xc00006cf50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006cfe8 sp=0xc00006cfe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 51 [GC worker (idle)]:
runtime.gopark(0x974edd7cec4?, 0x3?, 0x66?, 0xdf?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000490f50 sp=0xc000490f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000490fe0 sp=0xc000490f50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000490fe8 sp=0xc000490fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 52 [GC worker (idle)]:
runtime.gopark(0x11367540?, 0x1?, 0x4b?, 0x1b?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000491750 sp=0xc000491730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004917e0 sp=0xc000491750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004917e8 sp=0xc0004917e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 8 [GC worker (idle)]:
runtime.gopark(0x974edd7d0cc?, 0x3?, 0xf6?, 0x27?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00006d750 sp=0xc00006d730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00006d7e0 sp=0xc00006d750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00006d7e8 sp=0xc00006d7e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 53 [GC worker (idle)]:
runtime.gopark(0x974edd7ad9a?, 0x1?, 0xb4?, 0x47?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000491f50 sp=0xc000491f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000491fe0 sp=0xc000491f50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000491fe8 sp=0xc000491fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x974edd7aa70?, 0x3?, 0x82?, 0xd2?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000495f50 sp=0xc000495f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000495fe0 sp=0xc000495f50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000495fe8 sp=0xc000495fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 54 [GC worker (idle)]:
runtime.gopark(0x11367540?, 0x3?, 0x32?, 0xa?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000492750 sp=0xc000492730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004927e0 sp=0xc000492750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004927e8 sp=0xc0004927e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 55 [GC worker (idle)]:
runtime.gopark(0x11367540?, 0x3?, 0xc0?, 0xcb?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000492f50 sp=0xc000492f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000492fe0 sp=0xc000492f50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000492fe8 sp=0xc000492fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 56 [GC worker (idle)]:
runtime.gopark(0x974edd7a23c?, 0x3?, 0x56?, 0xa7?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000493750 sp=0xc000493730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc0004937e0 sp=0xc000493750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004937e8 sp=0xc0004937e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 57 [GC worker (idle)]:
runtime.gopark(0x974edd7b0ba?, 0x3?, 0x3d?, 0xe0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000493f50 sp=0xc000493f30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc000493fe0 sp=0xc000493f50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000493fe8 sp=0xc000493fe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 58 [GC worker (idle)]:
runtime.gopark(0x974edd7a2c8?, 0xc0005020c0?, 0x1a?, 0x14?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00050a750 sp=0xc00050a730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00050a7e0 sp=0xc00050a750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 59 [GC worker (idle)]:
runtime.gopark(0x974e87fb589?, 0x3?, 0xc4?, 0xa4?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00050af50 sp=0xc00050af30 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00050afe0 sp=0xc00050af50 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 60 [GC worker (idle)]:
runtime.gopark(0x974edd7d612?, 0x3?, 0x40?, 0x5d?, 0x0?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00050b750 sp=0xc00050b730 pc=0x43e7ee
runtime.gcBgMarkWorker()
        /usr/local/go/src/runtime/mgc.go:1293 +0xe5 fp=0xc00050b7e0 sp=0xc00050b750 pc=0x4213e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00050b7e8 sp=0xc00050b7e0 pc=0x46e2c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1217 +0x1c

goroutine 19 [IO wait]:
runtime.gopark(0x7?, 0xb?, 0x0?, 0x0?, 0xd?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0004c58f8 sp=0xc0004c58d8 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
        /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0004c5930 sp=0xc0004c58f8 pc=0x437277
internal/poll.runtime_pollWait(0x7f709be97c90, 0x72)
        /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0004c5950 sp=0xc0004c5930 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc000454100?, 0xc00049a000?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0004c5978 sp=0xc0004c5950 pc=0x4efd67
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000454100, {0xc00049a000, 0x1000, 0x1000})
        /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0004c5a10 sp=0xc0004c5978 pc=0x4f105a
net.(*netFD).Read(0xc000454100, {0xc00049a000?, 0x4f0225?, 0x0?})
        /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc0004c5a58 sp=0xc0004c5a10 pc=0x569e05
net.(*conn).Read(0xc00006e000, {0xc00049a000?, 0x0?, 0xc00010a4b8?})
        /usr/local/go/src/net/net.go:179 +0x45 fp=0xc0004c5aa0 sp=0xc0004c5a58 pc=0x5780a5
net.(*TCPConn).Read(0xc00010a4b0?, {0xc00049a000?, 0x0?, 0xc0004d5ac0?})
        <autogenerated>:1 +0x25 fp=0xc0004c5ad0 sp=0xc0004c5aa0 pc=0x589fa5
net/http.(*connReader).Read(0xc00010a4b0, {0xc00049a000, 0x1000, 0x1000})
        /usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc0004c5b20 sp=0xc0004c5ad0 pc=0x6c4fcb
bufio.(*Reader).fill(0xc0000ae180)
        /usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc0004c5b58 sp=0xc0004c5b20 pc=0x654c03
bufio.(*Reader).Peek(0xc0000ae180, 0x4)
        /usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc0004c5b78 sp=0xc0004c5b58 pc=0x654d33
net/http.(*conn).serve(0xc000498000, {0x10f1e388, 0xc0003c35c0})
        /usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc0004c5fb8 sp=0xc0004c5b78 pc=0x6cae7c
net/http.(*Server).Serve.func3()
        /usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc0004c5fe0 sp=0xc0004c5fb8 pc=0x6cf648
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004c5fe8 sp=0xc0004c5fe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
        /usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 67 [IO wait]:
runtime.gopark(0x75?, 0xb?, 0x0?, 0x0?, 0xc?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0001418f8 sp=0xc0001418d8 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
        /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000141930 sp=0xc0001418f8 pc=0x437277
internal/poll.runtime_pollWait(0x7f709be97d88, 0x72)
        /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000141950 sp=0xc000141930 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc000734900?, 0xc0000c5000?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000141978 sp=0xc000141950 pc=0x4efd67
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000734900, {0xc0000c5000, 0x1000, 0x1000})
        /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000141a10 sp=0xc000141978 pc=0x4f105a
net.(*netFD).Read(0xc000734900, {0xc0000c5000?, 0x4f0225?, 0x0?})
        /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000141a58 sp=0xc000141a10 pc=0x569e05
net.(*conn).Read(0xc0000de110, {0xc0000c5000?, 0x0?, 0xc0003c3838?})
        /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000141aa0 sp=0xc000141a58 pc=0x5780a5
net.(*TCPConn).Read(0xc0003c3830?, {0xc0000c5000?, 0x0?, 0xc000141ac0?})
        <autogenerated>:1 +0x25 fp=0xc000141ad0 sp=0xc000141aa0 pc=0x589fa5
net/http.(*connReader).Read(0xc0003c3830, {0xc0000c5000, 0x1000, 0x1000})
        /usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc000141b20 sp=0xc000141ad0 pc=0x6c4fcb
bufio.(*Reader).fill(0xc000528a20)
        /usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc000141b58 sp=0xc000141b20 pc=0x654c03
bufio.(*Reader).Peek(0xc000528a20, 0x4)
        /usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc000141b78 sp=0xc000141b58 pc=0x654d33
net/http.(*conn).serve(0xc0001322d0, {0x10f1e388, 0xc0003c35c0})
        /usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc000141fb8 sp=0xc000141b78 pc=0x6cae7c
net/http.(*Server).Serve.func3()
        /usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc000141fe0 sp=0xc000141fb8 pc=0x6cf648
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000141fe8 sp=0xc000141fe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
        /usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 82 [IO wait]:
runtime.gopark(0x5?, 0xb?, 0x0?, 0x0?, 0xe?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0004c18f8 sp=0xc0004c18d8 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
        /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0004c1930 sp=0xc0004c18f8 pc=0x437277
internal/poll.runtime_pollWait(0x7f709be97b98, 0x72)
        /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0004c1950 sp=0xc0004c1930 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc000038080?, 0xc000510000?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0004c1978 sp=0xc0004c1950 pc=0x4efd67
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000038080, {0xc000510000, 0x1000, 0x1000})
        /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0004c1a10 sp=0xc0004c1978 pc=0x4f105a
net.(*netFD).Read(0xc000038080, {0xc000510000?, 0x4f0225?, 0x0?})
        /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc0004c1a58 sp=0xc0004c1a10 pc=0x569e05
net.(*conn).Read(0xc00063e008, {0xc000510000?, 0x0?, 0xc0001a4338?})
        /usr/local/go/src/net/net.go:179 +0x45 fp=0xc0004c1aa0 sp=0xc0004c1a58 pc=0x5780a5
net.(*TCPConn).Read(0xc0001a4330?, {0xc000510000?, 0x0?, 0xc0004c1ac0?})
        <autogenerated>:1 +0x25 fp=0xc0004c1ad0 sp=0xc0004c1aa0 pc=0x589fa5
net/http.(*connReader).Read(0xc0001a4330, {0xc000510000, 0x1000, 0x1000})
        /usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc0004c1b20 sp=0xc0004c1ad0 pc=0x6c4fcb
bufio.(*Reader).fill(0xc00017c7e0)
        /usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc0004c1b58 sp=0xc0004c1b20 pc=0x654c03
bufio.(*Reader).Peek(0xc00017c7e0, 0x4)
        /usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc0004c1b78 sp=0xc0004c1b58 pc=0x654d33
net/http.(*conn).serve(0xc00050e000, {0x10f1e388, 0xc0003c35c0})
        /usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc0004c1fb8 sp=0xc0004c1b78 pc=0x6cae7c
net/http.(*Server).Serve.func3()
        /usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc0004c1fe0 sp=0xc0004c1fb8 pc=0x6cf648
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004c1fe8 sp=0xc0004c1fe0 pc=0x46e2c1
created by net/http.(*Server).Serve in goroutine 1
        /usr/local/go/src/net/http/server.go:3086 +0x5cb

goroutine 76 [IO wait]:
runtime.gopark(0x10f29a40?, 0xb?, 0x0?, 0x0?, 0x10?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000508da0 sp=0xc000508d80 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
        /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000508dd8 sp=0xc000508da0 pc=0x437277
internal/poll.runtime_pollWait(0x7f709be979a8, 0x72)
        /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000508df8 sp=0xc000508dd8 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc000734280?, 0xc0003f82e1?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000508e20 sp=0xc000508df8 pc=0x4efd67
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000734280, {0xc0003f82e1, 0x1, 0x1})
        /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000508eb8 sp=0xc000508e20 pc=0x4f105a
net.(*netFD).Read(0xc000734280, {0xc0003f82e1?, 0xc000508f40?, 0x46a990?})
        /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000508f00 sp=0xc000508eb8 pc=0x569e05
net.(*conn).Read(0xc0000de328, {0xc0003f82e1?, 0x1?, 0xc0003a0cd0?})
        /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000508f48 sp=0xc000508f00 pc=0x5780a5
net.(*TCPConn).Read(0xc00010a4b0?, {0xc0003f82e1?, 0xc0003a0cd0?, 0xc00044ae80?})
        <autogenerated>:1 +0x25 fp=0xc000508f78 sp=0xc000508f48 pc=0x589fa5
net/http.(*connReader).backgroundRead(0xc0003f82d0)
        /usr/local/go/src/net/http/server.go:683 +0x37 fp=0xc000508fc8 sp=0xc000508f78 pc=0x6c4b97
net/http.(*connReader).startBackgroundRead.func2()
        /usr/local/go/src/net/http/server.go:679 +0x25 fp=0xc000508fe0 sp=0xc000508fc8 pc=0x6c4ac5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000508fe8 sp=0xc000508fe0 pc=0x46e2c1
created by net/http.(*connReader).startBackgroundRead in goroutine 75
        /usr/local/go/src/net/http/server.go:679 +0xba

goroutine 72 [IO wait]:
runtime.gopark(0x45cb3f2c0deee1c4?, 0xb?, 0x0?, 0x0?, 0xf?)
        /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00013b5c8 sp=0xc00013b5a8 pc=0x43e7ee
runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?)
        /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00013b600 sp=0xc00013b5c8 pc=0x437277
internal/poll.runtime_pollWait(0x7f709be97aa0, 0x72)
        /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00013b620 sp=0xc00013b600 pc=0x468a05
internal/poll.(*pollDesc).wait(0xc000038280?, 0xc000443500?, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00013b648 sp=0xc00013b620 pc=0x4efd67
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000038280, {0xc000443500, 0x1500, 0x1500})
        /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00013b6e0 sp=0xc00013b648 pc=0x4f105a
net.(*netFD).Read(0xc000038280, {0xc000443500?, 0xc00013b7b0?, 0x12?})
        /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc00013b728 sp=0xc00013b6e0 pc=0x569e05
net.(*conn).Read(0xc0000de138, {0xc000443500?, 0xc0000a3380?, 0x27?})
        /usr/local/go/src/net/net.go:179 +0x45 fp=0xc00013b770 sp=0xc00013b728 pc=0x5780a5
net.(*TCPConn).Read(0xc00013b808?, {0xc000443500?, 0xc0005b6060?, 0x18?})
        <autogenerated>:1 +0x25 fp=0xc00013b7a0 sp=0xc00013b770 pc=0x589fa5
crypto/tls.(*atLeastReader).Read(0xc0005b6060, {0xc000443500?, 0xc0005b6060?, 0x0?})
        /usr/local/go/src/crypto/tls/conn.go:805 +0x3b fp=0xc00013b7e8 sp=0xc00013b7a0 pc=0x617ddb
bytes.(*Buffer).ReadFrom(0xc000138d28, {0x10f1a0c0, 0xc0005b6060})
        /usr/local/go/src/bytes/buffer.go:211 +0x98 fp=0xc00013b840 sp=0xc00013b7e8 pc=0x4a2f18
crypto/tls.(*Conn).readFromUntil(0xc000138a80, {0x10f192c0?, 0xc0000de138}, 0xc00013b948?)
        /usr/local/go/src/crypto/tls/conn.go:827 +0xde fp=0xc00013b880 sp=0xc00013b840 pc=0x617fbe
crypto/tls.(*Conn).readRecordOrCCS(0xc000138a80, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:625 +0x250 fp=0xc00013bc20 sp=0xc00013b880 pc=0x615590
crypto/tls.(*Conn).readRecord(...)
        /usr/local/go/src/crypto/tls/conn.go:587
crypto/tls.(*Conn).Read(0xc000138a80, {0xc0000c4000, 0x1000, 0x10f21318?})
        /usr/local/go/src/crypto/tls/conn.go:1369 +0x158 fp=0xc00013bc90 sp=0xc00013bc20 pc=0x61b858
bufio.(*Reader).Read(0xc000528ea0, {0xc0001b2120, 0x9, 0x6f064e?})
        /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc00013bcc8 sp=0xc00013bc90 pc=0x655137
io.ReadAtLeast({0x10f19600, 0xc000528ea0}, {0xc0001b2120, 0x9, 0x9}, 0x9)
        /usr/local/go/src/io/io.go:335 +0x90 fp=0xc00013bd10 sp=0xc00013bcc8 pc=0x49ac50
io.ReadFull(...)
        /usr/local/go/src/io/io.go:354
net/http.http2readFrameHeader({0xc0001b2120, 0x9, 0x6b6472?}, {0x10f19600?, 0xc000528ea0?})
        /usr/local/go/src/net/http/h2_bundle.go:1635 +0x65 fp=0xc00013bd60 sp=0xc00013bd10 pc=0x68e905
net/http.(*http2Framer).ReadFrame(0xc0001b20e0)
        /usr/local/go/src/net/http/h2_bundle.go:1899 +0x85 fp=0xc00013be08 sp=0xc00013bd60 pc=0x68f045
net/http.(*http2clientConnReadLoop).run(0xc00013bf98)
        /usr/local/go/src/net/http/h2_bundle.go:9338 +0x11f fp=0xc00013bf60 sp=0xc00013be08 pc=0x6b1eff
net/http.(*http2ClientConn).readLoop(0xc000002000)
        /usr/local/go/src/net/http/h2_bundle.go:9233 +0x65 fp=0xc00013bfc8 sp=0xc00013bf60 pc=0x6b1485
net/http.(*http2Transport).newClientConn.func3()
        /usr/local/go/src/net/http/h2_bundle.go:7905 +0x25 fp=0xc00013bfe0 sp=0xc00013bfc8 pc=0x6aa365
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013bfe8 sp=0xc00013bfe0 pc=0x46e2c1
created by net/http.(*http2Transport).newClientConn in goroutine 71
        /usr/local/go/src/net/http/h2_bundle.go:7905 +0xcbe

rax    0x6
rbx    0x7f703bdfee70
rcx    0x7f709c234387
rdx    0x6
rdi    0x1
rsi    0x15
rbp    0x0
rsp    0x7f703bdfed40
r8     0x0
r9     0x7f703bdfec90
r10    0x8
r11    0x202
r12    0x7f6fec8f23b0
r13    0x7f6fec8f0eb0
r14    0x7f703bdfefa8
r15    0x7f703bdff2e8
rip    0x7f709c235bc7
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc000038080, {0xc000510000, 0x1000, 0x1000}) /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0004c1a10 sp=0xc0004c1978 pc=0x4f105a net.(*netFD).Read(0xc000038080, {0xc000510000?, 0x4f0225?, 0x0?}) /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc0004c1a58 sp=0xc0004c1a10 pc=0x569e05 net.(*conn).Read(0xc00063e008, {0xc000510000?, 0x0?, 0xc0001a4338?}) /usr/local/go/src/net/net.go:179 +0x45 fp=0xc0004c1aa0 sp=0xc0004c1a58 pc=0x5780a5 net.(*TCPConn).Read(0xc0001a4330?, {0xc000510000?, 0x0?, 0xc0004c1ac0?}) <autogenerated>:1 +0x25 fp=0xc0004c1ad0 sp=0xc0004c1aa0 pc=0x589fa5 net/http.(*connReader).Read(0xc0001a4330, {0xc000510000, 0x1000, 0x1000}) /usr/local/go/src/net/http/server.go:791 +0x14b fp=0xc0004c1b20 sp=0xc0004c1ad0 pc=0x6c4fcb bufio.(*Reader).fill(0xc00017c7e0) /usr/local/go/src/bufio/bufio.go:113 +0x103 fp=0xc0004c1b58 sp=0xc0004c1b20 pc=0x654c03 bufio.(*Reader).Peek(0xc00017c7e0, 0x4) /usr/local/go/src/bufio/bufio.go:151 +0x53 fp=0xc0004c1b78 sp=0xc0004c1b58 pc=0x654d33 net/http.(*conn).serve(0xc00050e000, {0x10f1e388, 0xc0003c35c0}) /usr/local/go/src/net/http/server.go:2044 +0x75c fp=0xc0004c1fb8 sp=0xc0004c1b78 pc=0x6cae7c net/http.(*Server).Serve.func3() /usr/local/go/src/net/http/server.go:3086 +0x28 fp=0xc0004c1fe0 sp=0xc0004c1fb8 pc=0x6cf648 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004c1fe8 sp=0xc0004c1fe0 pc=0x46e2c1 created by net/http.(*Server).Serve in goroutine 1 /usr/local/go/src/net/http/server.go:3086 +0x5cb goroutine 76 [IO wait]: runtime.gopark(0x10f29a40?, 0xb?, 0x0?, 0x0?, 0x10?) /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000508da0 sp=0xc000508d80 pc=0x43e7ee runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?) /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000508dd8 sp=0xc000508da0 pc=0x437277 internal/poll.runtime_pollWait(0x7f709be979a8, 0x72) /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000508df8 sp=0xc000508dd8 pc=0x468a05 internal/poll.(*pollDesc).wait(0xc000734280?, 0xc0003f82e1?, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000508e20 sp=0xc000508df8 pc=0x4efd67 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc000734280, {0xc0003f82e1, 0x1, 0x1}) /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000508eb8 sp=0xc000508e20 pc=0x4f105a net.(*netFD).Read(0xc000734280, {0xc0003f82e1?, 0xc000508f40?, 0x46a990?}) /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000508f00 sp=0xc000508eb8 pc=0x569e05 net.(*conn).Read(0xc0000de328, {0xc0003f82e1?, 0x1?, 0xc0003a0cd0?}) /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000508f48 sp=0xc000508f00 pc=0x5780a5 net.(*TCPConn).Read(0xc00010a4b0?, {0xc0003f82e1?, 0xc0003a0cd0?, 0xc00044ae80?}) <autogenerated>:1 +0x25 fp=0xc000508f78 sp=0xc000508f48 pc=0x589fa5 net/http.(*connReader).backgroundRead(0xc0003f82d0) /usr/local/go/src/net/http/server.go:683 +0x37 fp=0xc000508fc8 sp=0xc000508f78 pc=0x6c4b97 net/http.(*connReader).startBackgroundRead.func2() /usr/local/go/src/net/http/server.go:679 +0x25 fp=0xc000508fe0 sp=0xc000508fc8 pc=0x6c4ac5 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000508fe8 sp=0xc000508fe0 pc=0x46e2c1 created by net/http.(*connReader).startBackgroundRead in goroutine 75 /usr/local/go/src/net/http/server.go:679 +0xba goroutine 72 [IO wait]: runtime.gopark(0x45cb3f2c0deee1c4?, 0xb?, 0x0?, 0x0?, 0xf?) 
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00013b5c8 sp=0xc00013b5a8 pc=0x43e7ee runtime.netpollblock(0x47f078?, 0x4092a6?, 0x0?) /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc00013b600 sp=0xc00013b5c8 pc=0x437277 internal/poll.runtime_pollWait(0x7f709be97aa0, 0x72) /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc00013b620 sp=0xc00013b600 pc=0x468a05 internal/poll.(*pollDesc).wait(0xc000038280?, 0xc000443500?, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00013b648 sp=0xc00013b620 pc=0x4efd67 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc000038280, {0xc000443500, 0x1500, 0x1500}) /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc00013b6e0 sp=0xc00013b648 pc=0x4f105a net.(*netFD).Read(0xc000038280, {0xc000443500?, 0xc00013b7b0?, 0x12?}) /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc00013b728 sp=0xc00013b6e0 pc=0x569e05 net.(*conn).Read(0xc0000de138, {0xc000443500?, 0xc0000a3380?, 0x27?}) /usr/local/go/src/net/net.go:179 +0x45 fp=0xc00013b770 sp=0xc00013b728 pc=0x5780a5 net.(*TCPConn).Read(0xc00013b808?, {0xc000443500?, 0xc0005b6060?, 0x18?}) <autogenerated>:1 +0x25 fp=0xc00013b7a0 sp=0xc00013b770 pc=0x589fa5 crypto/tls.(*atLeastReader).Read(0xc0005b6060, {0xc000443500?, 0xc0005b6060?, 0x0?}) /usr/local/go/src/crypto/tls/conn.go:805 +0x3b fp=0xc00013b7e8 sp=0xc00013b7a0 pc=0x617ddb bytes.(*Buffer).ReadFrom(0xc000138d28, {0x10f1a0c0, 0xc0005b6060}) /usr/local/go/src/bytes/buffer.go:211 +0x98 fp=0xc00013b840 sp=0xc00013b7e8 pc=0x4a2f18 crypto/tls.(*Conn).readFromUntil(0xc000138a80, {0x10f192c0?, 0xc0000de138}, 0xc00013b948?) /usr/local/go/src/crypto/tls/conn.go:827 +0xde fp=0xc00013b880 sp=0xc00013b840 pc=0x617fbe crypto/tls.(*Conn).readRecordOrCCS(0xc000138a80, 0x0) /usr/local/go/src/crypto/tls/conn.go:625 +0x250 fp=0xc00013bc20 sp=0xc00013b880 pc=0x615590 crypto/tls.(*Conn).readRecord(...) /usr/local/go/src/crypto/tls/conn.go:587 crypto/tls.(*Conn).Read(0xc000138a80, {0xc0000c4000, 0x1000, 0x10f21318?}) /usr/local/go/src/crypto/tls/conn.go:1369 +0x158 fp=0xc00013bc90 sp=0xc00013bc20 pc=0x61b858 bufio.(*Reader).Read(0xc000528ea0, {0xc0001b2120, 0x9, 0x6f064e?}) /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc00013bcc8 sp=0xc00013bc90 pc=0x655137 io.ReadAtLeast({0x10f19600, 0xc000528ea0}, {0xc0001b2120, 0x9, 0x9}, 0x9) /usr/local/go/src/io/io.go:335 +0x90 fp=0xc00013bd10 sp=0xc00013bcc8 pc=0x49ac50 io.ReadFull(...) 
/usr/local/go/src/io/io.go:354 net/http.http2readFrameHeader({0xc0001b2120, 0x9, 0x6b6472?}, {0x10f19600?, 0xc000528ea0?}) /usr/local/go/src/net/http/h2_bundle.go:1635 +0x65 fp=0xc00013bd60 sp=0xc00013bd10 pc=0x68e905 net/http.(*http2Framer).ReadFrame(0xc0001b20e0) /usr/local/go/src/net/http/h2_bundle.go:1899 +0x85 fp=0xc00013be08 sp=0xc00013bd60 pc=0x68f045 net/http.(*http2clientConnReadLoop).run(0xc00013bf98) /usr/local/go/src/net/http/h2_bundle.go:9338 +0x11f fp=0xc00013bf60 sp=0xc00013be08 pc=0x6b1eff net/http.(*http2ClientConn).readLoop(0xc000002000) /usr/local/go/src/net/http/h2_bundle.go:9233 +0x65 fp=0xc00013bfc8 sp=0xc00013bf60 pc=0x6b1485 net/http.(*http2Transport).newClientConn.func3() /usr/local/go/src/net/http/h2_bundle.go:7905 +0x25 fp=0xc00013bfe0 sp=0xc00013bfc8 pc=0x6aa365 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013bfe8 sp=0xc00013bfe0 pc=0x46e2c1 created by net/http.(*http2Transport).newClientConn in goroutine 71 /usr/local/go/src/net/http/h2_bundle.go:7905 +0xcbe rax 0x6 rbx 0x7f703bdfee70 rcx 0x7f709c234387 rdx 0x6 rdi 0x1 rsi 0x15 rbp 0x0 rsp 0x7f703bdfed40 r8 0x0 r9 0x7f703bdfec90 r10 0x8 r11 0x202 r12 0x7f6fec8f23b0 r13 0x7f6fec8f0eb0 r14 0x7f703bdfefa8 r15 0x7f703bdff2e8 rip 0x7f709c235bc7 rflags 0x10246 cs 0x33 fs 0x0 gs 0x0 ``` </details>

@Venefilyn commented on GitHub (Feb 16, 2024):

Never mind my above comment; I think it's because I run a GPU that doesn't support ROCm, the Radeon RX 5600 XT.


@askareija commented on GitHub (Feb 17, 2024):

Okay, so after many rebuilds I've got my AMD GPU detected on ROCm.

time=2024-02-17T10:55:35.266+07:00 level=INFO source=routes.go:1014 msg="Listening on [::]:11434 (version 0.0.0)"
time=2024-02-17T10:55:35.266+07:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-02-17T10:55:35.350+07:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [rocm_v1 cpu cpu_avx2]"
time=2024-02-17T10:55:35.350+07:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-17T10:55:35.350+07:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-17T10:55:35.354+07:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-17T10:55:35.354+07:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-17T10:55:35.355+07:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.1.0]"
time=2024-02-17T10:55:35.359+07:00 level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-17T10:55:35.359+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-17T10:55:35.360+07:00 level=INFO source=gpu.go:199 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
[GIN] 2024/02/17 - 10:55:52 | 200 |      40.358µs |      172.17.0.3 | GET      "/api/version"
[GIN] 2024/02/17 - 10:55:52 | 200 |     525.262µs |      172.17.0.3 | GET      "/api/tags"
time=2024-02-17T10:55:59.164+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-17T10:55:59.164+07:00 level=INFO source=gpu.go:199 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-02-17T10:55:59.164+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-17T10:55:59.164+07:00 level=INFO source=gpu.go:199 msg="ROCm integrated GPU detected - ROCR_VISIBLE_DEVICES=0"
time=2024-02-17T10:55:59.164+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama1475251691/rocm_v1/libext_server.so
time=2024-02-17T10:55:59.194+07:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama1475251691/rocm_v1/libext_server.so"
time=2024-02-17T10:55:59.194+07:00 level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6500M, compute capability 10.3, VMM: no
llama_model_loader: loaded meta data with 20 key-value pairs and 363 tensors from /home/aden/.ollama/models/blobs/sha256:e73cc17c718156e5ad34b119eb363e2c10389a503673f9c36144c42dfde8334c (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.

Now the weird thing: when I use codellama:7b, it's still using my CPU instead of my GPU. My GPU usage is only about 9%, as shown below (monitoring with amdgpu_top and nvtop).

![screenshot](https://github.com/ollama/ollama/assets/21377617/68d1cd23-1a97-4624-9393-55bb6c93774c)

But if I'm using deepseek-coder:1.3b, it's blazingly fast and my GPU usage goes up to 50% or more.

![screenshot](https://github.com/ollama/ollama/assets/21377617/180befde-bd02-4996-a5f7-0cb6307a2be0)

![screenshot](https://github.com/ollama/ollama/assets/21377617/319290af-fed1-4781-b4c8-43228b95c9cb)

So... what seems to be the problem here?


@mkesper commented on GitHub (Feb 19, 2024):

Hi Aden,

It would be interesting to know what you changed between the rebuilds.
As for the different models: did you check the VRAM requirements of each? If a model doesn't fit in VRAM, it can probably only run on the CPU.
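
As a rough rule of thumb (an illustration, not something stated in this thread): for full GPU offload, the model weights on disk plus the KV cache and compute buffers must fit in VRAM. Two quick checks, assuming a standard ROCm install and ollama's default model directory:

```
# Size of the downloaded model weights (the GGUF blobs ollama stores on disk)
du -h ~/.ollama/models/blobs/sha256*

# Total and used VRAM as seen by ROCm
rocm-smi --showmeminfo vram
```

A 7B model at q4 quantization is on the order of 4 GB of weights alone, which already exceeds the 4 GB of VRAM on a card like the RX 6500M mentioned above, hence the fallback to CPU for codellama:7b while the much smaller deepseek-coder:1.3b fits.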


@askareija commented on GitHub (Feb 19, 2024):

@mkesper

> Hi Aden,
>
> It would be interesting to know what you changed between the rebuilds. As for the different models: did you check the VRAM requirements of each? If a model doesn't fit in VRAM, it can probably only run on the CPU.

Hi Michael,

Here are the steps I went through to make it work on my laptop (on Arch Linux); the steps are collected into a single script after the list:

  1. Clone ollama: `git clone --recursive https://github.com/ollama/ollama.git`
  2. Go to the directory: `cd ollama`
  3. Install dependencies: `sudo pacman -S rocm-hip-sdk rocm-opencl-sdk clblast go`
  4. Build ollama with env: `AMDGPU_TARGET=gfx1030 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast go generate -tags rocm ./...`
  5. Compile: `go build -tags rocm`
  6. Run it: `HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve`
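
For convenience, the same steps as one script (a sketch only; gfx1030 and the HSA override value match the RDNA2 card used above and would differ for other GPUs):

```
#!/usr/bin/env bash
set -euo pipefail

# Build ollama with ROCm support on Arch Linux (RDNA2 / gfx1030 example)
sudo pacman -S --needed rocm-hip-sdk rocm-opencl-sdk clblast go

git clone --recursive https://github.com/ollama/ollama.git
cd ollama

# Generate the ROCm runner and build the binary
AMDGPU_TARGET=gfx1030 ROCM_PATH=/opt/rocm CLBlast_DIR=/usr/lib/cmake/CLBlast \
    go generate -tags rocm ./...
go build -tags rocm

# Run the freshly built binary, overriding the reported gfx version for ROCm
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./ollama serve
```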

Now I have ollama working with my GPU, and yes, after checking the VRAM it turns out the model doesn't fit, so it was falling back to my CPU. My solution was to make my own model:

Modelfile

FROM codellama:7b
PARAMETER num_gpu 22
PARAMETER num_thread 6

and then create the custom model:
./ollama create codellama:7b-22 -f ./Modelfile
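
Here `num_gpu` is the number of model layers to offload to the GPU, so this variant loads 22 of codellama:7b's layers into VRAM and leaves the rest on the CPU. One way to confirm how many layers actually landed on the GPU (the `offloaded ...` and `ROCm0 buffer` lines also appear in logs quoted later in this thread) is to filter the server output while the model loads:

```
./ollama serve 2>&1 | grep -E "offload|ROCm0 buffer"
```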

Now my GPU is being utilized:
![screenshot](https://github.com/ollama/ollama/assets/21377617/30a74107-71a0-427c-a117-158bb8aad262)

Finally, it's working.


@badverybadboy commented on GitHub (Feb 20, 2024):

> I have a working docker setup for my RX6700XT on Debian testing/unstable. I am using the latest ROCm libraries for Debian. I run the container like this:
>
> ```
> docker run -d -p 11434:11434 --name ollama -e HSA_OVERRIDE_GFX_VERSION='10.3.0' --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v ollama:/root/.ollama ollama/ollama:0.1.24-rocm
> ```
>
> Some logs from the container:
>
> ```
> llm_load_tensors: ggml ctx size =    0.76 MiB
> llm_load_tensors: offloading 12 repeating layers to GPU
> llm_load_tensors: offloaded 12/33 layers to GPU
> llm_load_tensors:      ROCm0 buffer size =  9391.12 MiB
> llm_load_tensors:        CPU buffer size = 25215.87 MiB
> ....................................................................................................
> llama_new_context_with_model: n_ctx      = 2048
> llama_new_context_with_model: freq_base  = 1000000.0
> llama_new_context_with_model: freq_scale = 1
> llama_kv_cache_init:      ROCm0 KV buffer size =    96.00 MiB
> llama_kv_cache_init:  ROCm_Host KV buffer size =   160.00 MiB
> llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
> llama_new_context_with_model:  ROCm_Host input buffer size   =    12.01 MiB
> llama_new_context_with_model:      ROCm0 compute buffer size =   211.21 MiB
> llama_new_context_with_model:  ROCm_Host compute buffer size =   198.03 MiB
> llama_new_context_with_model: graph splits (measure): 5
> ```

Is it possible for you to share the docker image, or share detailed instructions? Does the host OS need to have ROCm support?


@sid-cypher commented on GitHub (Feb 20, 2024):

> Does the host OS need to have ROCm support?

You'll need the amdgpu DKMS drivers from the ROCm package on the host OS.
As for the image building, is the Dockerfile not enough?
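
If the published image is not an option, a local build from that Dockerfile would look roughly like this (a sketch; the tag name is made up, and the Dockerfile's build stages may differ between versions):

```
git clone https://github.com/ollama/ollama.git
cd ollama
docker build -t ollama:local .
```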


@shanoaice commented on GitHub (Feb 22, 2024):

I do wonder, is this possible on Windows? AMD has recently added HIP SDK support for Windows; or does this require something else that's not currently possible?


@dhiltgen commented on GitHub (Feb 26, 2024):

@shanoaice I've opened #2598 to track Radeon native Windows support


@mkesper commented on GitHub (Feb 27, 2024):

No luck with the iGPU and ollama:0.1.27-rocm.
It does not find /sys/module/amdgpu/version and reports negative available memory.

```
time=2024-02-27T15:20:37.354Z level=INFO source=images.go:710 msg="total blobs: 8"
time=2024-02-27T15:20:37.354Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-02-27T15:20:37.355Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.27)"
time=2024-02-27T15:20:37.355Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-02-27T15:20:39.398Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [rocm_v6 cpu_avx2 cuda_v11 cpu rocm_v5 cpu_avx]"
time=2024-02-27T15:20:39.398Z level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-02-27T15:20:39.398Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-27T15:20:39.398Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-27T15:20:39.398Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /opt/rocm/lib/libnvidia-ml.so* /usr/local/lib/libnvidia-ml.so* /opt/rh/devtoolset-7/root/libnvidia-ml.so*]"
time=2024-02-27T15:20:39.399Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-27T15:20:39.399Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-27T15:20:39.399Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/opt/rocm*/lib*/librocm_smi64.so* /opt/rocm/lib/librocm_smi64.so* /usr/local/lib/librocm_smi64.so* /opt/rh/devtoolset-7/root/librocm_smi64.so*]"
time=2024-02-27T15:20:39.400Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
wiring rocm management library functions in /opt/rocm/lib/librocm_smi64.so.5.0.50701
dlsym: rsmi_init
dlsym: rsmi_shut_down
dlsym: rsmi_dev_memory_total_get
dlsym: rsmi_dev_memory_usage_get
dlsym: rsmi_version_get
dlsym: rsmi_num_monitor_devices
dlsym: rsmi_dev_id_get
dlsym: rsmi_dev_name_get
dlsym: rsmi_dev_brand_get
dlsym: rsmi_dev_vendor_name_get
dlsym: rsmi_dev_vram_vendor_get
dlsym: rsmi_dev_serial_number_get
dlsym: rsmi_dev_subsystem_name_get
dlsym: rsmi_dev_vbios_version_get
time=2024-02-27T15:20:39.401Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-27T15:20:39.401Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-27T15:20:39.401Z level=DEBUG source=gpu.go:158 msg="error looking up amd driver version: %s" !BADKEY="amdgpu file stat error: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-02-27T15:20:39.401Z level=DEBUG source=amd.go:76 msg="malformed gfx_target_version 0"
discovered 1 ROCm GPU Devices
[0] ROCm device name: Rembrandt [Radeon 680M]
[0] ROCm brand: Rembrandt [Radeon 680M]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: unknown
rsmi_dev_serial_number_get failed: 2
[0] ROCm subsystem name: 0x50b4
[0] ROCm vbios version: 113-REMBRANDT-X37
[0] ROCm totalMem 1073741824
[0] ROCm usedMem 925331456
time=2024-02-27T15:20:39.404Z level=DEBUG source=gpu.go:254 msg="rocm detected 1 devices with -882M available memory"
```

Output of `dmesg | grep amdgpu`:
```
[ 2.451312] [drm] amdgpu kernel modesetting enabled.
[ 2.461887] amdgpu: Virtual CRAT table created for CPU
[ 2.461910] amdgpu: Topology: Add CPU node
[ 2.462061] amdgpu 0000:33:00.0: enabling device (0006 -> 0007)
[ 2.465847] amdgpu 0000:33:00.0: amdgpu: Fetched VBIOS from VFCT
[ 2.465850] amdgpu: ATOM BIOS: 113-REMBRANDT-X37
[ 2.465893] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_toc.bin
[ 2.466021] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_ta.bin
[ 2.466207] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_dmcub.bin
[ 2.466384] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_pfp.bin
[ 2.466569] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_me.bin
[ 2.466755] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_ce.bin
[ 2.466925] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_rlc.bin
[ 2.467074] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_mec.bin
[ 2.467265] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_mec2.bin
[ 2.467598] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_vcn.bin
[ 2.468072] amdgpu 0000:33:00.0: vgaarb: deactivate vga console
[ 2.468075] amdgpu 0000:33:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 2.468122] amdgpu 0000:33:00.0: amdgpu: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
[ 2.468124] amdgpu 0000:33:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 2.468126] amdgpu 0000:33:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[ 2.469069] [drm] amdgpu: 1024M of VRAM memory ready
[ 2.469075] [drm] amdgpu: 15429M of GTT memory ready.
[ 2.470296] amdgpu 0000:33:00.0: firmware: direct-loading firmware amdgpu/yellow_carp_sdma.bin
[ 2.470358] amdgpu 0000:33:00.0: amdgpu: Will use PSP to load VCN firmware
[ 2.658417] amdgpu 0000:33:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 2.670602] amdgpu 0000:33:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 2.670605] amdgpu 0000:33:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 2.673063] amdgpu 0000:33:00.0: amdgpu: SMU is initialized successfully!
[ 2.867007] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 2.867028] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 2.867223] amdgpu: Virtual CRAT table created for GPU
[ 2.867381] amdgpu: Topology: Add dGPU node [0x1681:0x1002]
[ 2.867383] kfd kfd: amdgpu: added device 1002:1681
[ 2.867396] amdgpu 0000:33:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
[ 2.867583] amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 2.867585] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 2.867587] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 2.867588] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 2.867589] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 2.867590] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 2.867591] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 2.867592] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 2.867593] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 2.867594] amdgpu 0000:33:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 2.867595] amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 2.867597] amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 2.867598] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 2.867599] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 2.867600] amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 2.876353] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:33:00.0 on minor 0
[ 2.886076] fbcon: amdgpudrmfb (fb0) is primary device
[ 3.809581] amdgpu 0000:33:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 16.739507] snd_hda_intel 0000:33:00.1: bound 0000:33:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 9128.244511] amdgpu 0000:33:00.0: amdgpu: SMU is resuming...
[ 9128.246334] amdgpu 0000:33:00.0: amdgpu: SMU is resumed successfully!
[ 9129.206014] amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 9129.206017] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 9129.206019] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 9129.206021] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 9129.206022] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 9129.206024] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 9129.206025] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 9129.206027] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 9129.206028] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 9129.206030] amdgpu 0000:33:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 9129.206032] amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 9129.206034] amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 9129.206035] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 9129.206036] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 9129.206038] amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
```


@mkesper commented on GitHub (Feb 27, 2024):

> > Does the host OS need to have ROCm support?
>
> You'll need the amdgpu DKMS drivers from the ROCm package on the host OS. As for the image building, is the Dockerfile not enough?

I for one wasn't able to build an image with it yet:

~/g/u/ollama (main)> ./scripts/build_docker.sh

[+] Building 0.0s (0/0)                                                                                                    docker:default
ERROR: docker exporter does not currently support exporting manifest lists
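
That error typically comes from a multi-platform buildx build being exported to the local docker store, which only accepts single images. A possible workaround (an assumption about what the script does, not something verified in this thread) is to build just one platform and load it:

```
docker buildx build --platform linux/amd64 --load -t ollama:local .
```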

@dhiltgen commented on GitHub (Mar 2, 2024):

I know everyone's eager for a more stable AMD GPU setup for Ollama, so I wanted to give a quick update on where we're at and the current plan.

I just got Radeon cards working on Windows, so I should have a PR up in the next day or two adding support for Windows ROCm (tracked via #2598).

I know a lot of folks have been seeing crashes with ROCm v5 on Linux. Based on feedback from folks at AMD, they've recommended we focus on the latest major version of ROCm for the best experience, so I'm going to pivot to only supporting v6 on Linux in our official builds. We'll set this up so we can auto-detect v6 if already installed in the system, and if not detected, the install script will download an artifact from our releases page if we see an AMD GPU. The goal is that you should only need to have the driver installed, and Ollama will take care of the library dependencies.

On Windows, v6 has not yet shipped, so we'll use v5 for now, but I believe the v6 release is imminent, so we'll switch to that once it's available. Again, we'll aim to carry the library in the installer to streamline the user experience, although it will result in a much larger installer due to the size of the ROCm tensor data files.

For folks with older GPUs that aren't supported by v6 (e.g. RX 580 #2453) what I'm hoping to do is refine our build process so you could install an older ROCm library version that does support it, and build from source locally to get a working setup. That will take some more work as workarounds are required, but that's our goal.

We're going to focus on Discrete GPUs first and get those stable, then we'll come back to add support for iGPUs with #2637

Thanks everyone for your patience as we work through the best approach to support Radeon GPUs.


@mnn commented on GitHub (Mar 3, 2024):

Based on feedback from folks at AMD, they've recommended we focus on the latest major version of ROCm for the best experience

As far as I know that is not supported by other AI stuff like automatic1111. Just yesterday I tried to update ROCm to 6 and that ended in a frozen PC. So until at least a1111 supports ROCm 6, I am staying on stable 5.6.1 and text-generation-webui. To be frank, ollama even on CPU was still far from usable (it kept unloading the model after a few seconds, making it painfully slow to use). Kinda disappointed I wasted hours on this and it is still so far away...


@sid-cypher commented on GitHub (Mar 3, 2024):

So until at least a1111 supports ROCm 6

A1111 relies on Torch, and using A1111 with

export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7"

has always worked well for me (with an RX 7900 XTX, on both ROCm 5.7 and 6.0.x).
There is also an option to build Torch for the newer ROCm, but in my experience the Torch "nightly/rocm5.7" build works just fine with 6.0.2. Your mileage may vary.


@Th3Rom3 commented on GitHub (Mar 5, 2024):

One quick comment on that (hopefully it does not stray too far off-topic):

Yeah, I agree; unless we're talking about running it on Windows, I did not have problems running Stable Diffusion with different ROCm/Torch combinations.

BTW, the Torch build is no longer nightly-only; you can now use
`--index-url https://download.pytorch.org/whl/rocm5.7`
though I'm not sure if it redirects to the same repo.
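
Spelled out, that stable-channel install would be (assuming a typical A1111 virtualenv):

```
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7
```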

However, I agree that ROCm is a bit of a mess on consumer cards. Hopefully they can finally find a stable base to build upon, with longer-term support for different card generations, without dropping support (official or non-official) at the next minor version.


@frostworx commented on GitHub (Mar 7, 2024):

Thank you very much!


@dhiltgen commented on GitHub (Mar 11, 2024):

The pre-release for 0.1.29 is live and ready for broader testing.

To install on Linux, you'll need to use an updated install script from my branch (we'll merge this once we wrap up testing and mark the release as latest). This new version of the install script detects Radeon cards (via the presence of the amdgpu driver) and sets up ROCm v6 for ollama if it's not already present on the host.

curl -fsSL https://raw.githubusercontent.com/dhiltgen/ollama/rocm_install/scripts/install.sh  | OLLAMA_VERSION="0.1.29" sh

Windows users can install the OllamaSetup.exe from the 0.1.29 release page, which includes ROCm v5.7 (the latest at this time).

We've updated our troubleshooting docs to include some pointers on Radeon GPU compatibility.

Please let us know if you run into any problems with the pre-release (on this issue, or open a new issue).
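
One quick way to verify the GPU was picked up after installing (this assumes the systemd service the Linux install script sets up):

```
journalctl -u ollama | grep -iE "rocm|radeon|gpu"
```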


@JorySeverijnse commented on GitHub (Mar 12, 2024):

@dhiltgen I've tried to get my 6700 XT working with the install script above and have exported the env variable HSA_OVERRIDE_GFX_VERSION="10.3.0" because my card is gfx1031. I also installed rocblas and rocm-core on Arch, btw, so I actually have /opt/rocm/lib/librocblas.so. After running ollama serve I get:

```
time=2024-03-12T23:34:26.569+01:00 level=INFO source=images.go:806 msg="total blobs: 0"
time=2024-03-12T23:34:26.569+01:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-12T23:34:26.569+01:00 level=INFO source=routes.go:1082 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-12T23:34:26.569+01:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama3641506161/runners ..."
time=2024-03-12T23:34:28.405+01:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu cpu_avx cuda_v11 cpu_avx2 rocm_v60000]"
time=2024-03-12T23:34:28.405+01:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-12T23:34:28.405+01:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-12T23:34:28.411+01:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-12T23:34:28.411+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-12T23:34:28.411+01:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-12T23:34:28.411+01:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1031]"
time=2024-03-12T23:34:28.411+01:00 level=WARN source=amd_linux.go:339 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-03-12T23:34:28.412+01:00 level=WARN source=amd_linux.go:96 msg="unable to verify rocm library, will use cpu: no suitable rocm found, falling back to CPU"
time=2024-03-12T23:34:28.412+01:00 level=INFO source=routes.go:1105 msg="no GPU detected"
```

@dhiltgen commented on GitHub (Mar 13, 2024):

@JorySeverijnse the error `amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at` is why it didn't use the GPU. If you run with `OLLAMA_DEBUG=1` you'll be able to see more information about where it's searching for ROCm. What is probably most relevant for your setup is to either set `HIP_PATH` or make sure `LD_LIBRARY_PATH` contains it. We also need v6, so if you have v5 installed, that won't work.
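
For reference, a minimal sketch of what that could look like with a stock /opt/rocm layout (the exact paths are assumptions; adjust them to wherever your distro installs ROCm v6):

```
# Hypothetical invocation: turn on debug logging and point ollama at an
# existing ROCm v6 install so the library search can find it.
OLLAMA_DEBUG=1 \
  HIP_PATH=/opt/rocm \
  LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH \
  ollama serve
```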

@askareija commented on GitHub (Mar 13, 2024):

@dhiltgen I've tried pre-release version 0.1.29; it's working well and detected my GPUs. The problem is I always get:

```
CUDA error: out of memory
  current device: 1, in function ggml_cuda_pool_malloc_leg at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:8584
  hipMalloc((void **) &ptr, look_ahead_size)
```

I think this is because I have an integrated GPU:

```
ggml_init_cublas: found 2 ROCm devices:
  Device 0: AMD Radeon RX 6500M, compute capability 10.3, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 10.3, VMM: no
```

And I don't know how to make it use only device 0 instead of all the GPUs.

Full logs:

```
HSA_OVERRIDE_GFX_VERSION=10.3.0 AMDGPU_TARGET=gfx1034 ollama serve
time=2024-03-13T10:22:06.982+07:00 level=INFO source=images.go:806 msg="total blobs: 36"
time=2024-03-13T10:22:06.982+07:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-13T10:22:06.983+07:00 level=INFO source=routes.go:1082 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-13T10:22:06.983+07:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama1709569959/runners ..."
time=2024-03-13T10:22:09.085+07:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu_avx2 cpu rocm_v60000 cuda_v11 cpu_avx]"
time=2024-03-13T10:22:09.085+07:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-13T10:22:09.085+07:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-13T10:22:09.089+07:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-13T10:22:09.089+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-13T10:22:09.089+07:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-13T10:22:09.090+07:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1034 gfx1035]"
time=2024-03-13T10:22:09.090+07:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 4278190080"
time=2024-03-13T10:22:09.090+07:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  4278190080"
time=2024-03-13T10:22:09.090+07:00 level=INFO source=amd_linux.go:235 msg="[2] amdgpu totalMemory 536870912"
time=2024-03-13T10:22:09.090+07:00 level=INFO source=amd_linux.go:236 msg="[2] amdgpu freeMemory  536870912"
time=2024-03-13T10:22:15.510+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-13T10:22:15.510+07:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-13T10:22:15.510+07:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1034 gfx1035]"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:235 msg="[2] amdgpu totalMemory 536870912"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:236 msg="[2] amdgpu freeMemory  536870912"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 4278190080"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  4278190080"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-13T10:22:15.511+07:00 level=WARN source=amd_linux.go:50 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx1035 gfx1034]"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:235 msg="[2] amdgpu totalMemory 536870912"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:236 msg="[2] amdgpu freeMemory  536870912"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 4278190080"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  4278190080"
time=2024-03-13T10:22:15.511+07:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama1709569959/runners/rocm_v60000/libext_server.so
time=2024-03-13T10:22:15.552+07:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama1709569959/runners/rocm_v60000/libext_server.so"
time=2024-03-13T10:22:15.552+07:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 ROCm devices:
  Device 0: AMD Radeon RX 6500M, compute capability 10.3, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 10.3, VMM: no
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /home/aden/.ollama/models/blobs/sha256:3a43f93b78ec50f7c4e4dc8bd1cb3fff5a900e7d574c51a6f7495e48486e0dac (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = codellama
llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32016]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32016]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32016]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: mismatch in special tokens definition ( 264/32016 vs 259/32016 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32016
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 16384
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW) 
llm_load_print_meta: general.name     = codellama
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.33 MiB
llm_load_tensors: offloading 10 repeating layers to GPU
llm_load_tensors: offloaded 10/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =   977.35 MiB
llm_load_tensors:      ROCm1 buffer size =   108.59 MiB
llm_load_tensors:        CPU buffer size =  3647.95 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =   288.00 MiB
llama_kv_cache_init:      ROCm1 KV buffer size =    32.00 MiB
llama_kv_cache_init:  ROCm_Host KV buffer size =   704.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:  ROCm_Host input buffer size   =    13.02 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   164.00 MiB
llama_new_context_with_model:      ROCm1 compute buffer size =   164.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =   168.00 MiB
llama_new_context_with_model: graph splits (measure): 4
{"function":"initialize","level":"INFO","line":434,"msg":"initializing slots","n_slots":1,"tid":"131049149433536","timestamp":1710300138}
{"function":"initialize","level":"INFO","line":446,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"131049149433536","timestamp":1710300138}
time=2024-03-13T10:22:18.319+07:00 level=INFO source=dyn_ext_server.go:162 msg="Starting llama main loop"
{"function":"update_slots","level":"INFO","line":1584,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"131027991267008","timestamp":1710300138}
{"function":"launch_slot_with_data","level":"INFO","line":827,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"131027991267008","timestamp":1710300138}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1822,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":273,"slot_id":0,"task_id":0,"tid":"131027991267008","timestamp":1710300138}
{"function":"update_slots","level":"INFO","line":1846,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":0,"tid":"131027991267008","timestamp":1710300138}
CUDA error: out of memory
  current device: 1, in function ggml_cuda_pool_malloc_leg at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:8584
  hipMalloc((void **) &ptr, look_ahead_size)
GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:256: !"CUDA error"
[1]    28755 IOT instruction (core dumped)  HSA_OVERRIDE_GFX_VERSION=10.3.0 AMDGPU_TARGET=gfx1034 ollama serve
```

@shanoaice commented on GitHub (Mar 13, 2024):

@askareija You probably need to set `ROCR_VISIBLE_DEVICES`. For example, if you only want to use device 0, set `ROCR_VISIBLE_DEVICES=0`. If you are manually running the ollama server, then `ROCR_VISIBLE_DEVICES=0 ollama serve` will do the trick. Take a look at the [docs](https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux) if you want to modify the systemd service.
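
For the systemd route, a sketch of what that override could look like (assuming the stock ollama.service unit name):

```
# Hypothetical drop-in, created e.g. with `sudo systemctl edit ollama.service`,
# restricting ollama to ROCm device 0:
[Service]
Environment="ROCR_VISIBLE_DEVICES=0"
```

followed by `sudo systemctl daemon-reload` and `sudo systemctl restart ollama`.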

@JorySeverijnse commented on GitHub (Mar 13, 2024):

Thanks for the fast reply. I had a look at the manual installation, but that wasn't for building from source or for the Arch package. I've found the problem: I also had to install two more packages to get ROCm to work: `rocm-hip-sdk` and `rocm-opencl-sdk`.
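
For anyone else on Arch hitting the same thing, that presumably amounts to:

```
# Hypothetical: pull in the HIP and OpenCL SDK packages mentioned above
sudo pacman -S rocm-hip-sdk rocm-opencl-sdk
```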

@MorrisLu-Taipei commented on GitHub (Mar 13, 2024):

Hi, thanks to all for providing an easier way to use the RX 7900 with ollama. The ollama Docker image is running well, BUT the performance is terrible (compared to dual 3090s) with some prompts and base models. Can anyone help? Thanks in advance.

![image](https://github.com/ollama/ollama/assets/22585297/2442a53b-0d0a-4b91-aa00-ab15815d681f)
![image](https://github.com/ollama/ollama/assets/22585297/d6b810c9-7607-44cc-9cd4-37697bee546c)

@dhiltgen commented on GitHub (Mar 13, 2024):

@askareija you hit a bug around iGPU detection that I fixed yesterday. We haven't pushed updated builds to GitHub with that fix yet, as we're chasing down a couple of other release-blocker bugs, but we hope to have an updated build later today.

@windblade89 commented on GitHub (Apr 17, 2024):

How do I make it support an AMD Radeon RX 580 8GB card? Does anyone have a guide?

@shanoaice commented on GitHub (Apr 18, 2024):

> How do I make it support an AMD Radeon RX 580 8GB card? Does anyone have a guide?

I don't believe ROCm supports the older GCN-architecture cards any more. You will need an RDNA card for this; the RX 580 will not work.

@unoexperto commented on GitHub (Apr 26, 2024):

Hi folks, sorry for reviving an old thread. I'm on ollama 0.1.32 and have the ROCm 6 drivers installed. I modified `/etc/systemd/system/ollama.service` so that it contains the following overrides:

```
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Environment="AMD_SERIALIZE_KERNEL=3"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
```

After this I see in the log that ollama uses the "GPU", but the caveat is that I don't have a dedicated GPU. I'm on a Lenovo T14 Gen 4, which has an integrated GPU (AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics).

As a result, ollama reports in the log that the GPU has 1 GB of memory, which is obviously too little. Is there anything I could do to make it allocate all the available shared VRAM?

@xyproto commented on GitHub (Apr 29, 2024):

Hi, I just packaged `ollama-rocm` for Arch Linux.

I don't have access to an AMD graphics card right now. Please test if it works.

Are these environment variables required for `ollama-rocm` to work for as many users as possible?

```
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Environment="AMD_SERIALIZE_KERNEL=3"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
```

@badverybadboy commented on GitHub (Apr 29, 2024):

> Hi, I just packaged `ollama-rocm` for Arch Linux.
>
> I don't have access to an AMD graphics card right now. Please test if it works.
>
> Are these environment variables required for `ollama-rocm` to work for as many users as possible?
>
> ```
> Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
> Environment="AMD_SERIALIZE_KERNEL=3"
> Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
> ```

I could test it out, but the issue is I only have a Vega 64, and according to the documentation at https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html it seems it's no longer supported. Let me know if that's not the case and I'll give it a try.

@badverybadboy commented on GitHub (Apr 29, 2024):

> I know everyone's eager for a more stable AMD GPU setup for Ollama, so I wanted to give a quick update on where we're at and the current plan.
>
> I just got Radeon cards working in windows, so I should have a PR up in the next day or two adding support for Windows ROCm (tracked via #2598)
>
> I know a lot of folks have been seeing crashes with ROCm v5 on linux. Based on feedback from folks at AMD, they've recommended we focus on the latest major version of ROCm for the best experience, so I'm going to pivot to only supporting v6 on Linux in our official builds. We'll set this up so we can auto-detect v6 if already installed in the system, and if not detected, the install script will download an artifact from our releases page if we see an AMD GPU. The goal is you should only need to have the driver installed and Ollama will take care of the library dependencies.
>
> On windows, v6 has not yet shipped, so we'll use v5 for now, but I believe the v6 release is imminent, so we'll switch to that once it's available. Again, we'll aim to carry the library in the installer to streamline the user experience, although it will result in a much larger installer due to the size of the ROCm tensor data files.
>
> For folks with older GPUs that aren't supported by v6 (e.g. RX 580 #2453) what I'm hoping to do is refine our build process so you could install an older ROCm library version that does support it, and build from source locally to get a working setup. That will take some more work as workarounds are required, but that's our goal.
>
> We're going to focus on Discrete GPUs first and get those stable, then we'll come back to add support for iGPUs with #2637
>
> Thanks everyone for your patience as we work through the best approach to support Radeon GPUs.

I saw that you referenced the RX 580 (gfx803-gfx805) in the post, but there is no mention of the Vega (gfx900-gfx906) architecture. Is it already working with some workaround? I could not get it to work with ROCm 6.x on my Ubuntu install. ROCm actually caused the graphics card to fail and things to stop working, so I could not proceed with the ROCm drivers and gave up. If there is a way to get it working with ROCm, I would really appreciate it.

@sfxworks commented on GitHub (Apr 29, 2024):

> ```
> Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
> Environment="AMD_SERIALIZE_KERNEL=3"
> Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
> ```

```
Apr 29 14:10:12 epyc7713 ollama[1163713]: time=2024-04-29T14:10:12.646-04:00 level=WARN source=amd_linux.go:49 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/mo>
Apr 29 14:10:12 epyc7713 ollama[1163713]: time=2024-04-29T14:10:12.646-04:00 level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=0 total="30704.0 MiB"
Apr 29 14:10:12 epyc7713 ollama[1163713]: time=2024-04-29T14:10:12.646-04:00 level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=0 available="30704.0 MiB"
Apr 29 14:10:12 epyc7713 ollama[1163713]: time=2024-04-29T14:10:12.646-04:00 level=WARN source=amd_linux.go:321 msg="amdgpu detected, but no compatible rocm library found.  Either install rocm v6, or follow manual install instructions >
Apr 29 14:10:12 epyc7713 ollama[1163713]: time=2024-04-29T14:10:12.646-04:00 level=WARN source=amd_linux.go:253 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"
```

This is on my W6800 with those additions in the systemd unit. I have the ROCm lib in /opt/rocm; I'm assuming it's looking for /opt/rocm/hip/lib?

@sfxworks commented on GitHub (Apr 29, 2024):

`OLLAMA_LLM_LIBRARY=rocm_v60002 ROCR_VISIBLE_DEVICES=0 OLLAMA_DEBUG=1 ROCM_PATH="/opt/rocm" CLBlast_DIR="/usr/lib/cmake/CLBlast" HIP_PATH="/opt/rocm/hip/lib" HSA_OVERRIDE_GFX_VERSION="10.3.0" ollama serve` and still no idea where to go from here.

@sfxworks commented on GitHub (Apr 30, 2024):

Well, I also have this machine as a k8s node, so I installed the ROCm k8s device plugin (https://github.com/ROCm/k8s-device-plugin) and ran the rocm build:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-rocm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-rocm
  template:
    metadata:
      labels:
        app: ollama-rocm
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:rocm
        ports:
        - containerPort: 11434
          name: ollama
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama
        resources:
          requests:
            memory: "32Gi"
            cpu: "64"
          limits:
            memory: "100Gi"
            cpu: "64"
            amd.com/gpu: 1
      volumes:
      - name: ollama-data
        hostPath:
          path: /var/lib/ollama/.ollama
          type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service-rocm
spec:
  selector:
    app: ollama-rocm
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
    name: ollama
```
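
(Assuming the manifest is saved as `ollama-rocm.yaml`, deploying is just `kubectl apply -f ollama-rocm.yaml`; the `amd.com/gpu: 1` limit is what lets the ROCm device plugin schedule the pod onto a GPU node.)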

It seems to be detected and works OK:

```
time=2024-04-30T17:59:07.816Z level=INFO source=images.go:817 msg="total blobs: 5"
time=2024-04-30T17:59:07.816Z level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-04-30T17:59:07.816Z level=INFO source=routes.go:1143 msg="Listening on [::]:11434 (version 0.1.32)"
time=2024-04-30T17:59:07.817Z level=INFO source=payload.go:28 msg="extracting embedded files" dir=/tmp/ollama968791869/runners
time=2024-04-30T17:59:10.402Z level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]"
time=2024-04-30T17:59:10.402Z level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-30T17:59:10.402Z level=INFO source=gpu.go:268 msg="Searching for GPU management library libcudart.so*"
time=2024-04-30T17:59:10.403Z level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [/tmp/ollama968791869/runners/cuda_v11/libcudart.so.11.0]"
time=2024-04-30T17:59:10.403Z level=INFO source=gpu.go:343 msg="Unable to load cudart CUDA management library /tmp/ollama968791869/runners/cuda_v11/libcudart.so.11.0: your nvidia driver is too old or missing, please upgrade to run ollama"
time=2024-04-30T17:59:10.403Z level=INFO source=gpu.go:268 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-04-30T17:59:10.405Z level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-04-30T17:59:10.405Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-30T17:59:10.405Z level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-04-30T17:59:10.405Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1030]"
time=2024-04-30T17:59:10.407Z level=INFO source=amd_linux.go:121 msg="amdgpu [0] gfx1030 is supported"
time=2024-04-30T17:59:10.408Z level=INFO source=amd_linux.go:263 msg="[0] amdgpu totalMemory 30704M"
time=2024-04-30T17:59:10.408Z level=INFO source=amd_linux.go:264 msg="[0] amdgpu freeMemory  30704M"
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloaded 22/57 layers to GPU
llm_load_tensors:      ROCm0 buffer size = 29686.59 MiB
llm_load_tensors:        CPU buffer size = 47319.40 MiB
```

@sfxworks commented on GitHub (Apr 30, 2024):

I do wish it would move just one layer off the GPU, since there's some overhead here; I keep getting OOM with the GPU/CPU split. But hey, it tries. Switching to 8x7b.
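
For what it's worth, you can cap the number of offloaded layers yourself instead of relying on the automatic split; a sketch, assuming a Mixtral-style `8x7b` tag and an arbitrary layer count:

```
# Hypothetical Modelfile: offload fewer layers to leave some VRAM headroom
FROM mixtral:8x7b
PARAMETER num_gpu 20
```

then `ollama create mixtral-lowvram -f Modelfile` and run the new tag.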

@sfxworks commented on GitHub (Apr 30, 2024):

Yep, works perfectly. Who knows what leftovers I have on this machine from switching between nvidia and amd drivers on this Arch build. Containers rule.

```
Hello! I'm an AI assistant and I'm here to help you. I'm functioning as expected. The information you provided about the current date, mermaid rendering, PlantUML rendering, SVG in markdown rendering, and data presentation preference will be useful for formatting our conversation in this markdown environment. If you have any questions or need assistance with something, feel free to ask!
```

Reference: github-starred/ollama#62384