[GH-ISSUE #2453] Add support for older AMD GPU gfx803, gfx802, gfx805 (e.g. Radeon RX 580, FirePro W7100) #1434

Open
opened 2026-04-12 11:18:32 -05:00 by GiteaMirror · 220 comments

Originally created by @dhiltgen on GitHub (Feb 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2453

Originally assigned to: @dhiltgen on GitHub.

Officially, ROCm no longer supports these cards, but it looks like other projects have found workarounds. Let's explore whether that's possible here. Best case, support is built into our binaries. The fallback, if that's not feasible, is to document how to build from source with the appropriate older ROCm library and AMD drivers installed on your system, producing a local binary that works.

GiteaMirror added the feature request, build, amd labels 2026-04-12 11:18:32 -05:00

@dhiltgen commented on GitHub (Feb 12, 2024):

One interesting observation: I managed to get my gfx803 card not to crash with the invalid free by uninstalling the ROCm libs on the host and copying the exact libs over from the build container. However, when running models on the card, the responses were gibberish, so clearly it's more than just library dependencies and will require compile-time changes.


@Todd-Fulton commented on GitHub (Feb 20, 2024):

I'm trying to get this working on an RX 580.
With the 6.0.0-2 ROCm packages on Arch, I was getting free(): invalid pointer from clinfo (maybe a related issue).

After sending a "prompt" (not sure of the lingo?), this showed up in the logs:

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx803

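I noticed in the rocBLAS CMake file (https://github.com/ROCm/rocBLAS/blob/d0ae2751477bfa109a754f1e05875920df9a125b/CMakeLists.txt#L113) that they removed support for gfx803 in the 6.0.X builds, so I downgraded to the 5.7.1 packages and rebuilt ollama using the PKGBUILD from #2473 (https://github.com/ollama/ollama/issues/2473#issuecomment-1948524969).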
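
A quick way to tell whether an installed rocBLAS build carries anything for gfx803 is to look at the file names in its library directory. The sketch below is only a rough check: it assumes that per-architecture files include the architecture name in their file name, and the helper name `rocblas_files_for_arch` is made up for illustration. The directory is the one from the error message above.

```python
from pathlib import Path


def rocblas_files_for_arch(library_dir: str, arch: str) -> list[str]:
    """Return sorted names of files in library_dir whose name mentions arch.

    Assumption: per-architecture rocBLAS/Tensile files carry the
    architecture string (e.g. "gfx803") somewhere in their file name.
    """
    root = Path(library_dir)
    if not root.is_dir():
        # Missing directory: nothing to report.
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file() and arch in p.name)


if __name__ == "__main__":
    # Directory taken from the rocBLAS error message in this comment.
    matches = rocblas_files_for_arch("/opt/rocm/lib/rocblas/library", "gfx803")
    if matches:
        print(f"found {len(matches)} gfx803 file(s), e.g. {matches[0]}")
    else:
        print("no gfx803 files found; this rocBLAS build may lack gfx803 kernels")
```

An empty result would be consistent with the "Illegal seek for GPU arch : gfx803" error, but it is not proof on its own, since the naming assumption may not hold for every rocBLAS version.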

Then, when I sent the prompt, I got this error:

Feb 19 19:43:16 tokyo ollama[130295]: /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665: void std::discrete_distribution<>::param_type::_M_initialize() [_IntType = int]: Assertion '__sum > 0' failed.

The assertion is coming from libstdc++ here, so maybe changing the PKGBUILD to build a different version of ollama will fix it. I'll try that next.
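
For context on what that assertion checks: `std::discrete_distribution` needs its weights to sum to a positive number. One possible explanation, not confirmed anywhere in this thread, is that the GPU path hands back NaN or all-zero values, so the sampling weights fail that precondition. The Python sketch below only mirrors the check to show how NaN inputs trip it; the function names are made up and this is not ollama or llama.cpp code.

```python
import math


def weights_pass_sum_check(weights: list[float]) -> bool:
    """Mirror the `__sum > 0` precondition from the assertion message."""
    total = sum(weights)
    # False for 0.0, and also for NaN, because any comparison with NaN is False.
    return total > 0


def softmax(logits: list[float]) -> list[float]:
    """Plain softmax, used here only to turn logits into sampling weights."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    healthy = softmax([1.0, 2.0, 3.0])
    broken = softmax([float("nan"), 2.0, 3.0])  # a single NaN poisons every weight
    print("healthy weights pass:", weights_pass_sum_check(healthy))
    print("NaN weights pass:", weights_pass_sum_check(broken))
```

If this reading is right, rebuilding against a different libstdc++ would only hide the symptom, and the bad values coming out of the GPU would still need fixing.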

Not sure how much help I can be here, but I can test things out if needed.

This is the full output in the logs:

Feb 19 19:38:10 tokyo systemd[1]: Started Ollama Service.
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.155-06:00 level=INFO source=images.go:863 msg="total blobs: 6"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.155-06:00 level=INFO source=images.go:870 msg="total unused blobs removed: 0"
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Feb 19 19:38:10 tokyo ollama[130295]:  - using env:        export GIN_MODE=release
Feb 19 19:38:10 tokyo ollama[130295]:  - using code:        gin.SetMode(gin.ReleaseMode)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] POST   /v1/chat/completions      --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: [GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.155-06:00 level=INFO source=routes.go:999 msg="Listening on 127.0.0.1:11434 (version 0.1.24)"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.155-06:00 level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.367-06:00 level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [rocm_v5 cpu cpu_avx cpu_avx2]"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.367-06:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.367-06:00 level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.374-06:00 level=INFO source=gpu.go:288 msg="Discovered GPU libraries: []"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.374-06:00 level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.374-06:00 level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0]"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.378-06:00 level=INFO source=gpu.go:109 msg="Radeon GPU detected"
Feb 19 19:38:10 tokyo ollama[130295]: time=2024-02-19T19:38:10.378-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 19 19:43:06 tokyo ollama[130295]: [GIN] 2024/02/19 - 19:43:06 | 200 |      41.169µs |       127.0.0.1 | HEAD     "/"
Feb 19 19:43:06 tokyo ollama[130295]: [GIN] 2024/02/19 - 19:43:06 | 200 |     498.618µs |       127.0.0.1 | POST     "/api/show"
Feb 19 19:43:06 tokyo ollama[130295]: time=2024-02-19T19:43:06.221-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 19 19:43:06 tokyo ollama[130295]: time=2024-02-19T19:43:06.221-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 19 19:43:06 tokyo ollama[130295]: time=2024-02-19T19:43:06.221-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 19 19:43:06 tokyo ollama[130295]: time=2024-02-19T19:43:06.256-06:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama460181430/rocm_v5/libext_server.so"
Feb 19 19:43:06 tokyo ollama[130295]: time=2024-02-19T19:43:06.256-06:00 level=INFO source=dyn_ext_server.go:145 msg="Initializing llama server"
Feb 19 19:43:06 tokyo ollama[130295]: ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
Feb 19 19:43:06 tokyo ollama[130295]: ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
Feb 19 19:43:06 tokyo ollama[130295]: ggml_init_cublas: found 1 ROCm devices:
Feb 19 19:43:06 tokyo ollama[130295]:   Device 0: AMD Radeon RX 580 Series, compute capability 8.0, VMM: no
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /var/lib/ollama/.ollama/models/blobs/sha256:3a43f93b78ec50f7c4e4dc8bd1cb3fff5a900e7d574c51a6f7495e48486e0dac (version GGUF V2)
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   1:                               general.name str              = codellama
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32016]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32016]   = [0.000000, 0.000000, 0.000000, 0.0000...
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32016]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - kv  19:               general.quantization_version u32              = 2
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - type  f32:   65 tensors
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - type q4_0:  225 tensors
Feb 19 19:43:06 tokyo ollama[130295]: llama_model_loader: - type q6_K:    1 tensors
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_vocab: mismatch in special tokens definition ( 264/32016 vs 259/32016 ).
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: format           = GGUF V2
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: arch             = llama
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: vocab type       = SPM
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_vocab          = 32016
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_merges         = 0
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_ctx_train      = 16384
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_embd           = 4096
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_head           = 32
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_head_kv        = 32
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_layer          = 32
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_rot            = 128
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_embd_head_k    = 128
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_embd_head_v    = 128
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_gqa            = 1
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_embd_k_gqa     = 4096
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_embd_v_gqa     = 4096
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_ff             = 11008
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_expert         = 0
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_expert_used    = 0
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: rope scaling     = linear
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: freq_base_train  = 1000000.0
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: freq_scale_train = 1
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: n_yarn_orig_ctx  = 16384
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: rope_finetuned   = unknown
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: model type       = 7B
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: model ftype      = Q4_0
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: model params     = 6.74 B
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: general.name     = codellama
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: BOS token        = 1 '<s>'
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: EOS token        = 2 '</s>'
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: UNK token        = 0 '<unk>'
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_tensors: ggml ctx size =    0.22 MiB
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_tensors: offloading 32 repeating layers to GPU
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_tensors: offloading non-repeating layers to GPU
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_tensors: offloaded 33/33 layers to GPU
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_tensors:      ROCm0 buffer size =  3577.61 MiB
Feb 19 19:43:06 tokyo ollama[130295]: llm_load_tensors:        CPU buffer size =    70.35 MiB
Feb 19 19:43:07 tokyo ollama[130295]: .................................................................................................
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model: n_ctx      = 2048
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model: freq_base  = 1000000.0
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model: freq_scale = 1
Feb 19 19:43:07 tokyo ollama[130295]: llama_kv_cache_init:      ROCm0 KV buffer size =  1024.00 MiB
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model:  ROCm_Host input buffer size   =    12.01 MiB
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model:      ROCm0 compute buffer size =   171.60 MiB
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model:  ROCm_Host compute buffer size =     8.80 MiB
Feb 19 19:43:07 tokyo ollama[130295]: llama_new_context_with_model: graph splits (measure): 3
Feb 19 19:43:07 tokyo ollama[130295]: time=2024-02-19T19:43:07.868-06:00 level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
Feb 19 19:43:07 tokyo ollama[130295]: loading library /tmp/ollama460181430/rocm_v5/libext_server.so
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"start_loop","line":266,"message":"have new task"}
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"start_loop","line":281,"message":"callback_all_task_finished"}
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"start_loop","line":302,"message":"wait for new task"}
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"start_loop","line":266,"message":"have new task"}
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"start_loop","line":278,"message":"callback_new_task"}
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"start_loop","line":281,"message":"callback_all_task_finished"}
Feb 19 19:43:07 tokyo ollama[130295]: {"timestamp":1708393387,"level":"VERBOSE","function":"update_slots","line":1623,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" [INST] <<SYS>><</SYS>>\n\nWrite me a function that outputs the fibonacci sequence in C. [/INST]\n"}
Feb 19 19:43:16 tokyo ollama[130295]: /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665: void std::discrete_distribution<>::param_type::_M_initialize() [_IntType = int]: Assertion '__sum > 0' failed.
Feb 19 19:43:17 tokyo systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT
Feb 19 19:43:17 tokyo systemd[1]: ollama.service: Failed with result 'core-dump'.
Feb 19 19:43:17 tokyo systemd[1]: ollama.service: Consumed 5.707s CPU time.
Feb 19 19:43:20 tokyo systemd[1]: ollama.service: Scheduled restart job, restart counter is at 2.
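
As a sanity check on the log above, the reported KV cache size lines up with the model parameters printed earlier (n_ctx = 2048, n_layer = 32, n_embd_k_gqa = n_embd_v_gqa = 4096, f16 entries). The arithmetic below assumes one f16 value (2 bytes) per embedding element, per layer, per context position. That formula is my reading of the numbers, not something stated in the log, and the helper name is made up.

```python
def kv_cache_mib(n_ctx: int, n_layer: int, n_embd_gqa: int, bytes_per_element: int) -> float:
    """Size in MiB of one half (K or V) of the cache under the stated assumption."""
    return n_ctx * n_layer * n_embd_gqa * bytes_per_element / (1024 * 1024)


if __name__ == "__main__":
    # Values copied from the log: n_ctx, n_layer, n_embd_k_gqa / n_embd_v_gqa, f16 = 2 bytes.
    k_mib = kv_cache_mib(2048, 32, 4096, 2)
    v_mib = kv_cache_mib(2048, 32, 4096, 2)
    print(f"K: {k_mib:.2f} MiB, V: {v_mib:.2f} MiB, total: {k_mib + v_mib:.2f} MiB")
```

This gives 512 MiB each for K and V, 1024 MiB in total, matching the "KV self size" line, so the model load and buffer sizing look normal and the failure seems to come later, during compute.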

@Todd-Fulton commented on GitHub (Feb 20, 2024):

I ended up disabling `_GLIBCXX_ASSERTIONS` in `/etc/makepkg.conf` and I am starting to get some responses, but they are gibberish, at least sometimes. I think the problem is in `llama.cpp`, perhaps some sort of UB in the use of `std::discrete_distribution` that was triggering the assert. [This](https://github.com/ggerganov/llama.cpp/blob/633782b8d949f24b619e6c68ee37b5cc79167173/llama.cpp#L9909) is the only place I could find it being used. There is also a [discussion](https://github.com/ggerganov/llama.cpp/discussions/2421) which seems to resemble what's going on.

This is where libstdc++ was asserting in c++/13.2.1/bits/random.tcc on line 2665:

  template<typename _IntType>
    void
    discrete_distribution<_IntType>::param_type::
    _M_initialize()
    {
      ...
      const double __sum = std::accumulate(_M_prob.begin(),
					   _M_prob.end(), 0.0);
      __glibcxx_assert(__sum > 0);
      // Now normalize the probabilites.
      ...
    }

So it seems like the sum should be greater than 0. I don't know what the implications are, but that seems to be one of the preconditions of using this type, and `llama.cpp` is violating it. It may have some impact on the maths involved (which I am totally oblivious to).
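
To make that precondition concrete, here is a minimal sketch of a check one could run on the weights before they reach `std::discrete_distribution`. The helper name `weights_are_valid` is hypothetical and is not part of llama.cpp or libstdc++. It only mirrors the `__sum > 0` condition quoted above and, as an extra assumption on my part, also rejects negative or non-finite weights.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not llama.cpp code): returns true only when every
// weight is finite and non-negative and the total is strictly positive,
// which is the condition the libstdc++ assertion checks on the sum.
bool weights_are_valid(const std::vector<float>& weights) {
    double sum = 0.0;
    for (float w : weights) {
        if (!std::isfinite(w) || w < 0.0f) {
            return false;  // NaN, +/-inf, or a negative weight
        }
        sum += w;
    }
    return sum > 0.0;  // false for an empty or all-zero vector
}
```

A caller could log and bail out when this returns false, rather than constructing the distribution and hitting the assertion.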

I tried this:
ollama run codellama "Write me a function that outputs the fibonacci sequence in C."
and it just output a bunch of
############################## forever until I ctrl-c

Running the llama2 model:

>>> Why is the sky blue?
OOOlatooOwnGootUNSIreetOreoooohatГ GovernGBUNootIadruo delegladred EderGAootOALOO rangrehojection byther sywn�OOOLmoootGC�regnhatiloonoOWooAINTegruophonOalkOreouтоSIO го nobody.

I don't know if it's just messing with me, or if the bug is random.
Next try using codellama example:

➜  ollama-rocm git:(rocm) ✗ ollama run codellama                                                                
>>> Write me a function that outputs the fibonacci sequence
<details>
  <summary>Solution</summary>
  
``
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
``

This function uses a recursive approach to calculate the `n`th number in the Fibonacci sequence. The base case is when `n <= 1`, in which case the function returns `n` directly. Otherwise,
it calculates the result by adding the previous two values in the sequence.

For example:
``
fibonacci(0) -> 0
fibonacci(1) -> 1
fibonacci(2) -> fibonacci(1) + fibonacci(0) -> 1 + 0 -> 1
fibonacci(3) -> fibonacci(2) + fibonacci(1) -> (1 + 0) + 1 -> 2
fibonacci(4) -> fibonacci(3) + fibonacci(2) -> (2 + 1) + 1 -> 4
fibonacci(5) -> fibonacci(4) + fibonacci(3) -> (4 + 2) + 1 -> 7
``

</details>
>>> Now do the same in C++
<details>
  <summary>Solution</summary>
  
``
int fibonacci(int n) {
    if (n <= 1) {
        return n;
    } else {
        return fibonacci(n-1) + fibonacci(n-2);
    }
}
``
This function uses a recursive approach to calculate the `n`th number in the Fibonacci sequence. The base case is when `n <= 1`, in which case the function returns `n` directly. Otherwise,
it calculates the result by adding the previous two values in the sequence.

For example:
``
fibonacci(0) -> 0
fibonacci(1) -> 1
fibonacci(2) -> fibonacci(1) + fibonacci(0) -> 1 + 0 -> 1
fibonacci(3) -> fibonacci(2) + fibonacci(1) -> (1 + 0) + 1 -> 2
fibonacci(4) -> fibonacci(3) + fibonacci(2) -> (2 + 1) + 1 -> 4
fibonacci(5) -> fibonacci(4) + fibonacci(3) -> (4 + 2) + 1 -> 7
``

</details>

ollama run llama2 "Why is the sky blue?"
 pedkO NO NO  O-   a                          dark in  a  .     to  in   of’ in  gar  a  of     in aj            a                           .  for  dark    in  a   at a   in   a a in     
a     a   of   to     the    IN of aj    very  gall    dark   in    and in     in     gar      in   dark         a  ap  for    the a  theaj  pro      .  in   in  gall IN  will  a of   and 
’aj gar to  in O in  in  to a no    no of   ofaj a can  to  dark a by for the    in   pro gall of  ap a   dark in  and have   f of a  for are   ’  that   a in by  under only’ a. in  rock  
the aaj a gar in  no due as in  to gar   in  friendly  a in     a   a  dark a  in     a in       for  aj to a  in dark in   have   a ofaj   a by a         a    were     a   in  a   at dark
 very the gall a gar  . ap ano a a CL a aj  a  the a  aj  to   as  gallaj     in     aj   aaj of  and   gar. of in f  a  in  at  dark    will    dark    a                for  in     of a  
in    to     following the  a   to   in a   aj CL  dark    in a       aj     gall a    a  gar aj O a          aj for  of in  a   a  friendly   and  a    a the dark  in     to  a   dark    
a in a   in    gar    to gar   as ajaj   a  that   for        the of  in   at       friendly  O     gall       previous  a in     a a  ’     dark  back        aj in                   the  
 a   a  to    a     a    a         a              in     aj   a    at  in aj aj a  of   a  gar           a              in        the       for IN     aj         in  a           ^V     in a 
 in   in^C
➜  ollama-rocm git:(rocm) ✗ ollama run codellama "Write me a function that outputs the fibonacci sequence in C."  

Here's an example of how you could calculate the Fibonacci sequence in C:
``
#include <stdio.h>

int main() {
  int n, i;
  long long int fib[n];

  // Ask user for input
  printf("Enter a number: ");
  scanf("%d", &n);

  // Initialize first two elements of the sequence
  fib[0] = 0;
  fib[1] = 1;

  // Calculate remaining elements of the sequence
  for (i = 2; i < n; i++) {
    fib[i] = fib[i-1] + fib[i-2];
  }

  // Output the calculated sequence
  printf("The Fibonacci sequence is: ");
  for (i = 0; i < n; i++) {
    printf("%lld ", fib[i]);
  }

  return 0;
}
``
This program will ask the user to input a number `n`, and then calculate the first `n` elements of the Fibonacci sequence. The output will be the calculated sequence, with each element 
separated by a space.

For example, if the user inputs `5`, the output will be:
``
The Fibonacci sequence is: 0 1 1 2 3 5
``
Note that this program uses an array to store the elements of the sequence, and loops through the elements to calculate them. The `long long int` type is used to avoid overflowing the 
integer range when calculating larger Fibonacci numbers.

@wilkensgomes commented on GitHub (Feb 20, 2024):

@Todd-Fulton Same error here. Do you know how to fix this?


@Todd-Fulton commented on GitHub (Feb 21, 2024):

@wilkensgomes
For the error
rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx803

I downgraded to the ROCm 5.7.1 packages using [downgrade](https://github.com/archlinux-downgrade/downgrade) on Arch Linux and then added them to Ignore at the end of the installation so that they don't get upgraded to the 6.x packages.

For the error:
Feb 19 19:43:16 tokyo ollama[130295]: /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665: void std::discrete_distribution<>::param_type::_M_initialize() [_IntType = int]: Assertion '__sum > 0' failed.

I turned off _GLIBCXX_ASSERTIONS when building ollama, in /etc/makepkg.conf

# CXXFLAGS="$CFLAGS -Wp,-D_GLIBCXX_ASSERTIONS"
CXXFLAGS="$CFLAGS"

There might be a better way to disable this in the PKGBUILD file just for building ollama/llama.cpp, but I haven't bothered with it and just disabled the assertions globally.

Reading over the [discussion](https://github.com/ggerganov/llama.cpp/discussions/2421) for the second error: the gibberish happens after disabling the asserts. Since the initialize method of `std::discrete_distribution<>` requires that the sum of the probabilities is greater than 0, this makes sense. AFAIK it doesn't make sense for a probability to be negative or NaN, or for all of them to be 0, which are the cases I can think of that would trigger the assertion after summing the probabilities.

So as far as I can tell, the gibberish is a result of certain models combined with short input prompts, as said in the conversation. Somewhere between the model and the calculation of the probabilities, either some of them are negative, all are zero, or there is a NaN in there. For example, if for some reason a probability is the result of `p = x / y` where both `x` and `y` are 0.0, then `p = NaN` (a nonzero `x` divided by 0.0 gives `inf` instead, which can still turn into NaN later, e.g. via `inf - inf`). Then, when `llama.cpp` calls `llama_sample_token()` and `std::discrete_distribution` calls `std::accumulate`, the result will be `NaN`, and `NaN > 0` is false, so the assertion fires. I can only imagine how that would mess up the LLM when trying to figure out the next word to use. At least this is as far as my understanding goes.
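
The propagation described above can be sketched in a few lines. This is only an illustration and assumes IEEE 754 floating-point behaviour (strictly speaking the C++ standard leaves division by zero undefined, but typical x86-64 toolchains follow IEEE 754 here). The function names `divide` and `sum_probs` are made up for the example and are not llama.cpp code.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Plain float division. Under IEEE 754, 0/0 yields NaN, while a nonzero
// value divided by 0 yields +inf or -inf.
float divide(float x, float y) {
    return x / y;
}

// Sums the probabilities with std::accumulate and a double accumulator,
// like the libstdc++ snippet quoted earlier in the thread. A single NaN
// entry makes the whole sum NaN.
double sum_probs(const std::vector<float>& probs) {
    return std::accumulate(probs.begin(), probs.end(), 0.0);
}
```

Because every ordered comparison with NaN is false, `sum_probs(...) > 0` fails as soon as one entry is NaN, which is consistent with the `__sum > 0` assertion firing.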

Apart from some of the smaller models and short input prompts that produce gibberish, everything has been working for me since yesterday. I'm not even sure if the gibberish is particular to Polaris GPUs. I spent a few hours using llama2:13b as a Dungeon Master yesterday, and it was mind-blowing.


@Todd-Fulton commented on GitHub (Feb 26, 2024):

I'm still getting familiar with these code bases, but I did some print debugging in `llama_sample_softmax` and `llama_sample_token` and sure enough, there are NaNs everywhere on short prompts. It's fairly reproducible on my end.

I built both ollama and llama.cpp from their respective main branches, but took out the check for AMD version > 9 in ollama.

This is `llama_sample_softmax` in the file `llama.cpp`, with the logging that I put in:

void llama_sample_softmax(struct llama_context * ctx, llama_token_data_array * candidates) {
    //...
    //...
    float max_l = candidates->data[0].logit;
    float cum_sum = 0.0f;
    std::stringstream plogs;
    for (size_t i = 0; i < candidates->size; ++i) {
        float p = expf(candidates->data[i].logit - max_l);
        candidates->data[i].p = p;
        cum_sum += p;
    }
    for (size_t i = 0; i < candidates->size - 1; ++i) {
        candidates->data[i].p /= cum_sum;
        plogs << "{ token: " << candidates->data[i].id
            << ", probability: " << candidates->data[i].p
            << ", logit: " << candidates->data[i].logit
            << "},\n";
    }
    candidates->data[candidates->size - 1].p /= cum_sum;
    plogs << "{ token: " << candidates->data[candidates->size - 1].id
        << ", probability: " << candidates->data[candidates->size - 1].p
        << ", logit: " << candidates->data[candidates->size - 1].logit
        << " }\n";

    std::string plogs_string = plogs.str();

    LLAMA_LOG_INFO("Probabilities: [%s]\n", plogs_string.data());
    //...
}
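
For anyone who wants to reproduce the check outside of llama.cpp, here is a minimal stand-alone sketch of the same softmax with an up-front guard. Everything in it is an assumption for illustration: `softmax_checked` is a hypothetical name, it works on a plain `std::vector<float>` instead of `llama_token_data_array`, and it looks up the maximum logit itself rather than reading it from the first candidate.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical stand-alone softmax, loosely modelled on the loop above.
// Returns false (leaving probs empty) if the input is empty or contains a
// NaN or infinite logit, so the caller can log the problem instead of
// sampling from garbage.
bool softmax_checked(const std::vector<float>& logits, std::vector<float>& probs) {
    probs.clear();
    if (logits.empty()) {
        return false;
    }
    for (float l : logits) {
        if (!std::isfinite(l)) {
            return false;
        }
    }
    // Subtract the largest logit before exponentiating to avoid overflow.
    const float max_l = *std::max_element(logits.begin(), logits.end());
    float cum_sum = 0.0f;
    probs.reserve(logits.size());
    for (float l : logits) {
        const float p = std::exp(l - max_l);
        probs.push_back(p);
        cum_sum += p;
    }
    // cum_sum is at least 1 here, because the max logit contributes exp(0).
    for (float& p : probs) {
        p /= cum_sum;
    }
    return true;
}
```

With the all-NaN logits shown in the short-prompt log, a guard like this would return false before any probability is computed.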

I'll do my best to track down where the NaNs are coming from. It might be the GPU, which I have little experience with. I might try building ROCm 6.x from source if I can find an option to enable gfx803 support in the CMake files, and then build against that in case it's a bug in the ROCm 5.7.1 that I have installed.

Short prompt, nans, nans everywhere:

➜  ollama-rocm ollama run llama2
>>> Why is the sky blue?
####################################################################################^C

[Server side]
{"cached":"","function":"update_slots","level":"VERB","line":1876,"msg":"prompt ingested","n_past":0,"tid":"139442878678720","timestamp":1708984989,
"to_eval":" [INST] <<SYS>><</SYS>>\n\nWhy is the sky blue? [/INST]\n"}
Probabilities: [{ token: 38, probability: nan, logit: nan},
{ token: 22, probability: nan, logit: nan},
{ token: 10, probability: nan, logit: nan},
{ token: 34, probability: nan, logit: nan},
{ token: 26, probability: nan, logit: nan},
{ token: 18, probability: nan, logit: nan},
{ token: 20, probability: nan, logit: nan},
{ token: 4, probability: nan, logit: nan},
{ token: 24, probability: nan, logit: nan},
{ token: 12, probability: nan, logit: nan},
{ token: 32, probability: nan, logit: nan},
{ token: 28, probability: nan, logit: nan},
{ token: 16, probability: nan, logit: nan},
{ token: 36, probability: nan, logit: nan},
{ token: 8, probability: nan, logit: nan},
{ token: 39, probability: nan, logit: nan},
{ token: 9, probability: nan, logit: nan},
{ token: 21, probability: nan, logit: nan},
{ token: 1, probability: nan, logit: nan},
{ token: 23, probability: nan, logit: nan},
{ token: 11, probability: nan, logit: nan},
{ token: 25, probability: nan, logit: nan},
{ token: 5, probability: nan, logit: nan},
{ token: 27, probability: nan, logit: nan},
{ token: 13, probability: nan, logit: nan},
{ token: 31, probability: nan, logit: nan},
{ token: 29, probability: nan, logit: nan},
{ token: 15, probability: nan, logit: nan},
{ token: 33, probability: nan, logit: nan},
{ token: 7, probability: nan, logit: nan},
{ token: 35, probability: nan, logit: nan},
{ token: 17, probability: nan, logit: nan},
{ token: 37, probability: nan, logit: nan},
{ token: 3, probability: nan, logit: nan},
{ token: 19, probability: nan, logit: nan},
{ token: 0, probability: nan, logit: nan},
{ token: 2, probability: nan, logit: nan},
{ token: 6, probability: nan, logit: nan},
{ token: 14, probability: nan, logit: nan},
{ token: 30, probability: nan, logit: nan }
]

A little bit longer prompt, the calculations look right here:

>>> Why is the sky blue? Please explain it like I'm 5 years old. Use colorful language, but try
... to also explain the science.

Oh my goodness, let me tell you a secret about the sky! *winks* It's so cool! *excited
tone* The sky is blue because of something called light. *giggles* You know how things can
look different colors when the light hits them from different angles? Like how a red apple
looks red when the sun shines on it, but green when it's in shadow? Well, the sky does
that too! *excited nod*

So, when the sun shines on the Earth, it sends out all sorts of different colored lights.
*giggles* Like, did you know that light can be red, orange, yellow, green, blue, and
purple? Yep! And when these colors hit the Earth's atmosphere, they bounce around and mix
together to make the sky look blue! It's like a big ol' party in the sky! *giggles*
....
....

[Server Side]
{"cached":"","function":"update_slots","level":"VERB","line":1876,"msg":"prompt ingested","n_past":0,"tid":"137691662386880","timestamp":1708985350,
"to_eval":" [INST] <<SYS>><</SYS>>\n\nWhy is the sky blue? Please explain it like I'm 5 years old.
 Use colorful language, but try to also explain the science. [/INST]\n"}
Probabilities: [{ token: 13, probability: 0.999558, logit: 24.8022},
{ token: 9048, probability: 0.000197175, logit: 16.2713},
{ token: 6439, probability: 0.000159607, logit: 16.0599},
{ token: 243, probability: 2.25609e-05, logit: 14.1034},
{ token: 23170, probability: 1.64754e-05, logit: 13.789},
{ token: 18527, probability: 9.54511e-06, logit: 13.2432},
{ token: 11284, probability: 5.65038e-06, logit: 12.7189},
{ token: 29956, probability: 4.85538e-06, logit: 12.5673},
{ token: 7030, probability: 4.57228e-06, logit: 12.5072},
{ token: 9360, probability: 3.22113e-06, logit: 12.1569},
{ token: 2, probability: 3.0808e-06, logit: 12.1124},
{ token: 17565, probability: 2.19157e-06, logit: 11.7718},
{ token: 18637, probability: 2.17322e-06, logit: 11.7634},
{ token: 5674, probability: 1.88993e-06, logit: 11.6237},
{ token: 3611, probability: 1.08913e-06, logit: 11.0725},
{ token: 3257, probability: 8.80308e-07, logit: 10.8597},
{ token: 29930, probability: 7.47415e-07, logit: 10.696},
{ token: 9070, probability: 6.84936e-07, logit: 10.6087},
{ token: 1148, probability: 6.06698e-07, logit: 10.4874},
{ token: 29979, probability: 5.32776e-07, logit: 10.3575},
{ token: 9806, probability: 4.99208e-07, logit: 10.2924},
{ token: 2776, probability: 4.0466e-07, logit: 10.0825},
{ token: 8187, probability: 3.9383e-07, logit: 10.0553},
{ token: 29949, probability: 3.60271e-07, logit: 9.96626},
{ token: 10994, probability: 3.32183e-07, logit: 9.88509},
{ token: 29898, probability: 2.46243e-07, logit: 9.58573},
{ token: 1068, probability: 2.04658e-07, logit: 9.40075},
{ token: 27269, probability: 1.98792e-07, logit: 9.37167},
{ token: 827, probability: 1.87496e-07, logit: 9.31317},
{ token: 5872, probability: 1.8679e-07, logit: 9.30939},
{ token: 5634, probability: 1.83938e-07, logit: 9.29401},
{ token: 22110, probability: 1.67096e-07, logit: 9.19798},
{ token: 1532, probability: 1.5426e-07, logit: 9.11805},
{ token: 229, probability: 1.44642e-07, logit: 9.05367},
{ token: 9800, probability: 1.24852e-07, logit: 8.90654},
{ token: 399, probability: 1.23796e-07, logit: 8.89804},
{ token: 14962, probability: 1.16326e-07, logit: 8.83581},
{ token: 8851, probability: 1.08058e-07, logit: 8.76208},
{ token: 29909, probability: 1.06776e-07, logit: 8.75014},
{ token: 8879, probability: 9.68724e-08, logit: 8.6528 }
]
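
As a quick sanity check that these numbers really are a consistent softmax: for any two tokens, the ratio of their probabilities should equal `exp(logit_i - logit_j)`, independent of the normalising sum. The sketch below encodes that identity; `expected_prob_ratio` is a hypothetical helper name, not something from llama.cpp.

```cpp
#include <cmath>

// For a softmax, p_i / p_j = exp(logit_i - logit_j), because the
// normalising sum cancels out of the ratio. Hypothetical helper for
// checking logged values against each other.
double expected_prob_ratio(double logit_i, double logit_j) {
    return std::exp(logit_i - logit_j);
}
```

Plugging in the first two entries of the log, `expected_prob_ratio(16.2713, 24.8022)` comes out at about 1.97e-4, and the logged ratio 0.000197175 / 0.999558 is about 1.97e-4 as well, so the longer-prompt output is internally consistent.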

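More detailed logs:
[llama.cpp.good.log](https://github.com/ollama/ollama/files/14411946/llama.cpp.good.log)
[llama.cpp.nan.log](https://github.com/ollama/ollama/files/14411947/llama.cpp.nan.log)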


@ianlacerda commented on GitHub (Feb 27, 2024):

Is it not possible to create a docker image that supports gfx803? It would be easier than doing trial and error. Two weeks ago I was trying to install Ollama for my RX580 and I was only able to use the CPU due to conflicting dependencies on Arch Linux and Ubuntu 22.04.


@Todd-Fulton commented on GitHub (Mar 1, 2024):

This [issue](https://github.com/ggerganov/llama.cpp/issues/5355) on llama.cpp seems to be the same bug.

I'm currently going through the ROCm stack, building it from source from the main branches, to find out whether I can reintroduce RX 580 "support", with patches if needed. I will put up a script and patches if I'm successful and it solves the problem. We could create a Docker image from that script, or just use the script to create binary packages, or PKGBUILDs if it comes to that. Various parts of the stack still seem to "support" gfx803 (RX 580), while others seem to have at least officially dropped it, like rocBLAS (though it might still work if I just patch up the build scripts).

I don't think this is a bug in ollama, but further down the stack. For example, [clr](https://github.com/ROCm/clr) introduced a `free(): invalid pointer` bug somewhere between the 6.0.0 (unreleased) and 6.0.2 tags, which is why I downgraded to 5.7.1. So it's a matter of finding which commit introduced that bug.

As for the gibberish, I think that's a result of NaNs coming from somewhere. It seems to be specific to gfx803, otherwise a lot more users would be reporting it, and that bug also occurs in ROCm 5.7.1.

It might be worth trying even older versions of ROCm than 5.7.1, if ollama and llama.cpp are still compatible with those, at least in the meantime. Adding support for older GPUs without requiring a ROCm downgrade doesn't seem possible if ROCm isn't going to support older GPUs in the first place: users would still have to install older versions, or that functionality would have to be re-implemented.

If the gibberish is coming from CLBlast, then that narrows it down and ROCm support for older GPUs is just a side issue. I think users will either have to work on support in the open source code, or just use older packages.
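
As an illustration of the point above that NaNs would explain the gibberish, here is a minimal Python sketch. It is not code from llama.cpp or ollama; `softmax` and `first_non_finite` are illustrative names. It shows that a single NaN logit is enough to make every probability NaN, which is consistent with the all-`nan` probability dumps posted earlier in this thread.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)  # becomes NaN as soon as one term is NaN
    return [e / total for e in exps]


def first_non_finite(values):
    """Return the index of the first NaN or infinity, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


healthy = softmax([2.0, 1.0, 0.0])
poisoned = softmax([2.0, float("nan"), 0.0])

print(healthy)   # three probabilities that sum to 1
print(poisoned)  # every entry is nan
print(first_non_finite([2.0, float("nan"), 0.0]))  # index of the bad logit
```

A check along the lines of `first_non_finite`, applied to intermediate values, is one possible way to narrow down where the NaNs first appear.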

@nphalem commented on GitHub (Mar 21, 2024):

Any progress on this... ROCm successfully detects my gfx803 and it should work but ollama is blocking the card :/


@wreckdump commented on GitHub (Mar 26, 2024):

Could this also be applied to gfx804?


@eorisis commented on GitHub (Mar 30, 2024):

Support for Radeon RX 580/590 (I have a 590) would be super nice. I tried the Ollama 0.1.30 update and it is not possible yet.

@siavashmohammady66 commented on GitHub (Apr 2, 2024):

Please add support for older GPUs like the RX 580, as llama.cpp already supports those GPUs.

@6b6279 commented on GitHub (Apr 22, 2024):

@Todd-Fulton That's a regression with ROCm versions 6.0.* (see https://github.com/rocm-arch/rocm-arch/issues/981). Downgrading to 5.7.1 will enable support for, e.g., Polaris cards again.


@manuelpaulo commented on GitHub (Apr 25, 2024):

> Please add support for older GPUs like the RX 580, as llama.cpp already supports those GPUs.

True, using CLBlast.

@DerRehberg commented on GitHub (Apr 26, 2024):

@6b6279 Can you give me detailed instructions on how to downgrade to 5.7.1 on Arch? I have an RX 580.

@6b6279 commented on GitHub (Apr 26, 2024):

@DerRehberg Try `downgrade rocm-opencl-runtime` and choose 5.7.1 as the target version. Don't forget to add the package to `IgnorePkg` to pin that version until you manually update.

(`downgrade` is available on the AUR: https://aur.archlinux.org/packages/downgrade)

ollama won't use the GPU regardless, but it'll enable support for, e.g., the RX 580, while using darktable.

@DerRehberg commented on GitHub (Apr 26, 2024):

@6b6279 And now give me detailed instructions on how to run Stable Diffusion on an RX 580.

@6b6279 commented on GitHub (Apr 26, 2024):

@DerRehberg No idea. I use rocm only for image processing.


@janstadt commented on GitHub (May 15, 2024):

Is there any update to this? I have a 580 and would like to use it in addition to another gpu.


@jiriks74 commented on GitHub (May 15, 2024):

Hello. I'm a user of a Radeon RX 580 8GB, and the statement that

> Officially ROCm no longer supports these cards

is not entirely true. While it is not *officially* supported anymore, you don't really need any workarounds to make ROCm work with these GPUs. I've been using OpenCL through ROCm for quite some time in Blender without any issues at all. All I needed to do was set an environment variable, `ROC_ENABLE_PRE_VEGA=1`, and the GPU just worked.

I've tried doing so with Ollama, but it seems that it manually disables the GPU as unsupported even if ROCm is able to run on it.

From [ArchWiki](https://wiki.archlinux.org/title/GPGPU#AMD/ATI)

> unofficial and partial support for Navi10 based cards. To support cards older than Vega, you need to set the runtime variable ROC_ENABLE_PRE_VEGA=1.

> [!Note]
> I haven't used Blender for some time and I switched to NixOS, so I didn't test it right now. But if someone wants me to, I'll look into it and see whether I can run ROCm on the card without any additional setup.
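
The environment-variable approach described above can be sketched in Python. This is only an illustration: `pre_vega_env` and `run_with_pre_vega` are hypothetical helper names, and, as the comment notes, Ollama may still reject the card even when ROCm itself accepts it.

```python
import os
import subprocess


def pre_vega_env(base=None):
    """Return a copy of the environment with ROC_ENABLE_PRE_VEGA=1 set."""
    env = dict(os.environ if base is None else base)
    env["ROC_ENABLE_PRE_VEGA"] = "1"
    return env


def run_with_pre_vega(command):
    """Run `command` (a list of arguments) with the variable set for that process only."""
    return subprocess.run(command, env=pre_vega_env(), check=False)


# Example usage (assumes the program is installed and on PATH):
#   run_with_pre_vega(["blender"])
print(pre_vega_env({"PATH": "/usr/bin"}))
```

Setting the variable per process, rather than globally, keeps the change limited to the program being tested.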

@Darin755 commented on GitHub (Jun 15, 2024):

I think this is really exciting, as an RX 580 on eBay is much more affordable. If support could be added, it should be possible to build an AI machine for under 200 USD.

@jidaojiuyou commented on GitHub (Jun 27, 2024):

An RX 580/RX 590 goes for 200-300 yuan on Xianyu. It would be great if ollama could support the RX 580. Alternatively, some way for users to build ollama themselves would also work.

@mgielissen commented on GitHub (Jul 4, 2024):

For example: GPT4All 3.0 (Vulkan support) works great with RX580 (8GB). Tested with Windows 10 and latest AMD driver.


@manuelpaulo commented on GitHub (Jul 8, 2024):

> For example: GPT4All 3.0 (Vulkan support) works great with RX580 (8GB). Tested with Windows 10 and latest AMD driver.

I also tested GPT4All 3.0 on an old PC with an RX580 (8GB), and it works nicely on Windows 10 with the `ROC_ENABLE_PRE_VEGA` environment variable set to 1 and `HSA_OVERRIDE_GFX_VERSION` set to "10.3.0". This was with AMD HIP SDK version 5.5.1 only, not 5.7.1.

@yourchanges commented on GitHub (Jul 18, 2024):

For ollama running on gfx803 on Windows, check https://github.com/likelovewant/ollama-for-amd and https://github.com/likelovewant/ollama-for-amd/wiki

I tested ollama 0.2.5/0.2.7 on an RX 570 with 4GB of memory on 64-bit Windows 10. `ollama run phi3` runs well, but sometimes it fails with

```
llm_load_tensors:      ROCm0 buffer size =  2021.84 MiB
llm_load_tensors:        CPU buffer size =    52.84 MiB
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =   768.00 MiB
llama_new_context_with_model: KV self size  =  768.00 MiB, K (f16):  384.00 MiB, V (f16):  384.00 MiB
ggml_cuda_host_malloc: failed to allocate 0.13 MiB of pinned memory: hipErrorOutOfMemory
llama_new_context_with_model:        CPU  output buffer size =     0.13 MiB
ggml_cuda_host_malloc: failed to allocate 10.01 MiB of pinned memory: hipErrorOutOfMemory
llama_new_context_with_model:      ROCm0 compute buffer size =   168.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    10.01 MiB
```

See https://github.com/likelovewant/ollama-for-amd/issues/8#issuecomment-2238714971
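
To put the numbers in the log above in context, the following Python sketch sums the buffer sizes reported for the `ROCm0` device. The regular expression and the helper name `rocm0_total_mib` are illustrative, and the 4096 MiB figure is an assumption based on the "4GB" stated in the comment. Note that the failing allocations in the log are for pinned host memory, so this total does not by itself explain them; it only shows how much of the card the model occupies.

```python
import re

# Matches lines such as "ROCm0 KV buffer size =   768.00 MiB" but not "ROCm_Host ...".
ROCM0_BUFFER = re.compile(r"ROCm0\b.*?buffer size\s*=\s*([\d.]+) MiB")

LOG = """\
llm_load_tensors:      ROCm0 buffer size =  2021.84 MiB
llm_load_tensors:        CPU buffer size =    52.84 MiB
llama_kv_cache_init:      ROCm0 KV buffer size =   768.00 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   168.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    10.01 MiB
"""


def rocm0_total_mib(log_text):
    """Sum every buffer size reported for the ROCm0 device, in MiB."""
    return sum(float(size) for size in ROCM0_BUFFER.findall(log_text))


total = rocm0_total_mib(LOG)
CARD_MIB = 4096  # assumed nominal size of a 4 GB card
print(f"{total:.2f} MiB on ROCm0, {CARD_MIB - total:.2f} MiB nominally left")
```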

@KazeLiu commented on GitHub (Jul 25, 2024):

I hope support for the RX580 can be provided.

@yourchanges commented on GitHub (Jul 26, 2024):

@KazeLiu The RX 580 and RX 570 are equivalent here: they are both gfx803.

@KazeLiu commented on GitHub (Jul 26, 2024):

@yourchanges
Thank you. This is my first time installing a large model, and I still don't quite understand the process. My environment is an offline internal network. My current process is as follows. First, from the ollama-for-amd project, I download `ollama-windows-amd64.7z` and `OllamaSetup.exe` from version 0.2.8. After transferring them to the internal network, I install OllamaSetup, then extract `ollama-windows-amd64.7z` and replace the files in the Ollama folder. Next, from the ROCmLibs-for-gfx1103-AMD780M-APU project, I download `rocm.gfx803.optic.vega10.logic.hip.sdk.6.1.2.7z` from version 0.6.1.2 and use it to replace the files in the rocm folder within the Ollama folder. Then I run the glm4 model that I downloaded before going offline. When I check with `ollama ps`, the CPU is still at 100%. I want to know which step I got wrong.

@yourchanges commented on GitHub (Jul 26, 2024):

@KazeLiu You should check the logs of `ollama serve`:

```
D:\ollama_windows-amd64>ollama serve
2024/07/19 16:57:05 routes.go:965: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:C:\\Users\\Administrator\\.ollama\\models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR:D:\\ollama_windows-amd64\\ollama_runners OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-19T16:57:05.791+08:00 level=INFO source=images.go:760 msg="total blobs: 10"
time=2024-07-19T16:57:05.794+08:00 level=INFO source=images.go:767 msg="total unused blobs removed: 0"
time=2024-07-19T16:57:05.795+08:00 level=INFO source=routes.go:1012 msg="Listening on 127.0.0.1:11434 (version 0.2.5-0-gc7e2f88)"
time=2024-07-19T16:57:05.809+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm_v5.7]"
time=2024-07-19T16:57:05.810+08:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-19T16:57:06.902+08:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx803 driver=5.2 name="Radeon RX 570 Series" total="4.0 GiB" available="3.9 GiB"
```

The key info is `library=rocm compute=gfx803 driver=5.2 name="Radeon RX 570 Series" total="4.0 GiB" available="3.9 GiB"`. If you get something like this, your LLM runs on your GPU; otherwise it does not.

You can carefully follow https://github.com/likelovewant/ollama-for-amd/issues/8 as well; maybe there is something wrong with your driver or HIP SDK.
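
The log check described above can be sketched as a small Python helper. This is an illustration, not part of ollama: `parse_inference_compute` and `gpu_detected` are hypothetical names, and the rule "a line with `library=rocm` means the GPU is in use" is taken from the comment rather than from ollama's documentation.

```python
import re

# key=value pairs, where the value is either "quoted text" or a run of non-spaces.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

SAMPLE = (
    'time=2024-07-19T16:57:06.902+08:00 level=INFO source=types.go:105 '
    'msg="inference compute" id=0 library=rocm compute=gfx803 driver=5.2 '
    'name="Radeon RX 570 Series" total="4.0 GiB" available="3.9 GiB"'
)


def parse_inference_compute(line):
    """Return the key=value fields of an "inference compute" log line, or None."""
    if 'msg="inference compute"' not in line:
        return None
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def gpu_detected(log_text):
    """Return the fields of the first line reporting library=rocm, or None."""
    for line in log_text.splitlines():
        fields = parse_inference_compute(line)
        if fields and fields.get("library") == "rocm":
            return fields
    return None


fields = gpu_detected(SAMPLE)
print(fields["compute"], fields["name"], fields["total"])
```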

@jeromechungmf commented on GitHub (Aug 6, 2024):

Maybe try koboldcpp-rocm (https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.71.1.yr0-ROCm); it works fine for me. My GPU is an AMD RX 580 and my OS is AlmaLinux 9. If ollama does not work, try koboldcpp-rocm instead. Before installing koboldcpp-rocm, the ROCm 5.7.1 environment must be ready. Follow this video (https://www.youtube.com/watch?v=ljXdEih2GOI&ab_channel=LinuxMadeEZ) step by step to get the ROCm 5.7.1 environment ready. Good luck.

@zuli12-dev commented on GitHub (Aug 10, 2024):

I came across this as I am in the same situation, with an old RX580, no budget to get something new, and unable to get it running: neither ollama (CPU only) nor ComfyUI works. I'm trying koboldcpp, but not all models are supported there...

Very sad. Please let me know if you need any information, debug output, etc.

Hope you can implement it some day 👍

Cheers

@jeromechungmf commented on GitHub (Aug 16, 2024):

> I came across this as I am in the same situation, with an old RX580, no budget to get something new, and unable to get it running: neither ollama (CPU only) nor ComfyUI works. I'm trying koboldcpp, but not all models are supported there...
>
> Very sad. Please let me know if you need any information, debug output, etc.
>
> Hope you can implement it some day 👍
>
> Cheers

Hi,

Maybe try https://discuss.linuxcontainers.org/t/llama-cpp-and-ollama-servers-plugins-for-vs-code-vs-codium-and-intellij-ai/19744 to recompile the ollama code for the RX 580.

Following https://blog.lyric.im/p/using-llamacpp-to-run-llama-2-using-amd-radeon-rx-6900-for-gpu-acceleration you can recompile the llama.cpp code for the RX 580.

You are right, koboldcpp does not support all models, and it even has stability issues.

llama.cpp or ollama may be the better choice.

The repo https://github.com/robertrosenbusch/gfx803_rocm61_pt24/tree/main?tab=readme-ov-file can build an environment image for running the RX 580. In my experience, Stable Diffusion and ComfyUI both run in this environment.

Good luck.

PS: If you need to set things up for the RX 580, remember to install ROCm 5.7; the gfx803 dat file is in there.

Jerome

@KhazAkar commented on GitHub (Aug 17, 2024):

As an alternative, maybe pushing Vulkan support would be reasonable? It's here:
https://github.com/ollama/ollama/issues/2033

@jeromechungmf commented on GitHub (Aug 17, 2024):

> As an alternative, maybe pushing Vulkan support would be reasonable? It's here: #2033

LM Studio supports Vulkan, but it still has stability issues. I have previous experience using LM Studio: after running several LLM models on it, it suddenly stopped working properly, and koboldcpp also started experiencing issues.

So I am currently rebuilding with Docker and ROCm, reinstalling and compiling Ollama in that environment so that Ollama can run LLM models in Docker on the RX580. Since running Stable Diffusion in the ROCm environment seems very stable (I've generated dozens of images over an entire day without any issues), I'm confident this setup will work well.

Perhaps these experiences might be useful as a reference.

Good luck.

Jerome

@KhazAkar commented on GitHub (Aug 18, 2024):

@jeromechungmf have you tried alternative to LM Studio, called gpt4all? They're having AFAIK best Vulkan support for LLM acceleration. I'm using RX 5500M and it does wonders, where ROCm installation for this GPU, on Ubuntu 24.04, is a nightmare or hackfest, or boht.


@jeromechungmf commented on GitHub (Aug 18, 2024):

> @jeromechungmf have you tried alternative to LM Studio, called gpt4all? They're having AFAIK best Vulkan support for LLM acceleration. I'm using RX 5500M and it does wonders, where ROCm installation for this GPU, on Ubuntu 24.04, is a nightmare or hackfest, or both.

Finally, I found a Docker image that supports the RX580 for Ollama: https://hub.docker.com/r/bergutman/ollama-rocm .
If you still use an RX580, pull the bergutman/ollama-rocm image with docker pull, follow the overview on that page, and use docker-compose to start a container from the image. You can then run 'docker logs containerID' to see that Ollama detects the RX580's 8 GB of VRAM.

I will likely continue testing the RX580 paired with Ollama to run various large language models (LLMs).

Hope these experiences might be useful as a reference.

Good Luck

Jerome


@jeromechungmf commented on GitHub (Aug 19, 2024):

> > @jeromechungmf have you tried alternative to LM Studio, called gpt4all? They're having AFAIK best Vulkan support for LLM acceleration. I'm using RX 5500M and it does wonders, where ROCm installation for this GPU, on Ubuntu 24.04, is a nightmare or hackfest, or both.
>
> Finally, i found the docker image can support the rx580 gpu card for ollama, https://hub.docker.com/r/bergutman/ollama-rocm . if you still use rx580 gpu card,docker pull the bergutman/ollama-rocm image,follow the overview ,use the docker-compose,run the bergutman/ollama-rocm image build to docker container.you can use 'docker logs containerID' command,see the ollama can get rx580 vram 8G.
>
> I will likely continue testing the RX580 paired with Ollama to run various LLM large language models.
>
> Hope these experiences might be useful as a reference.
>
> Good Luck
>
> Jerome

After trying to download the llama3 model with bergutman/ollama-rocm, I get this error: template: :7:3: executing "" at <.response>: can't evaluate field response in type struct { first bool; system string; prompt string } (see #4057). The Ollama version in the image is too old.

It's very sad.

Trying llama.cpp on Docker or LM Studio on Ubuntu 22.04 is my next step.

Jerome


@Tamila-2017 commented on GitHub (Sep 13, 2024):

Hello guys,

I am very interested in using AI models with Ollama for text translation.
I even managed to run Gemma 2 2B and Llama 3.1 8B on an N100 CPU with Debian 12.
I liked their capabilities very much.
Unfortunately, these models are not very fast on the N100,
so I want to speed them up with a graphics card.

I can only afford an old Sapphire Nitro RX480 8 GB graphics card.
Unfortunately, Ollama refuses to work with it.
Yes, I know there is still no official support for this graphics card: https://ollama.com/blog/amd-preview
There are a lot of discussions and attempts to implement this support on the web, for example: [1](https://www.reddit.com/r/ROCm/comments/178tco2/rocm_571_on_polaris_cards_like_the_rx_580_8gb/), [2](https://news.ycombinator.com/item?id=37070293), [3](https://hub.docker.com/r/bergutman/ollama-rocm)
Has a real solution for the RX580 perhaps already been found?

Please tell me about it!


@mnccouk commented on GitHub (Sep 15, 2024):

I have had some success running Ollama with the ROCm 5.7.1 libraries on a Radeon RX 580 GPU (Ubuntu 22.04.4 LTS) from within a Docker container. So far I have only tried the llama3.1 model, but it seems to work OK.

The process I went through was just to fix the errors I encountered while trying to get something working, so this is by no means a proper fix for this issue.

The changes are based on the main Ollama branch from 6th Sept 2024. It would be interesting if somebody else could try it to see if it works for them too.

Repo with the changes - https://github.com/mnccouk/ollama/tree/rx580_gpu

See the top of the readme for some basic build instructions. I only focused on the Docker build, so make sure to install the ROCm libraries (5.7.1) on the Docker host machine first.


@jeromechungmf commented on GitHub (Sep 15, 2024):

Hello @Tamila-2017
I have been trying to run large language models (LLMs) on my PC using an RX580 GPU. I've tried two main approaches so far.
My experience:

Linux with Vulkan:
I installed Ubuntu 24.04 or 22.04, configured it to use the Vulkan drivers, and then installed LM Studio. The goal was to make this setup work with my RX580 GPU for optimal LLM performance. However, attempting to install the Vulkan drivers directly from AMD's website (for Ubuntu 22.04) proved unsuccessful. Instead, I found success by downloading the corresponding Vulkan package from https://vulkan.lunarg.com/sdk/home#linux

This allowed LM Studio to detect my RX580 GPU and use it for LLM calculations.

My installation:
Currently, I am running Vulkan SDK release 1.3.290 on Ubuntu 22.04.
Recommendation:
Before installing anything, make sure you download the correct Vulkan package for your Ubuntu version from https://vulkan.lunarg.com/sdk/home#linux

Windows with Vulkan:
On Windows, just download the AMD driver installer and follow the GUI step by step to install the GPU driver. After that, the Vulkan driver for Windows is ready. Once the AMD driver is installed, you can download LM Studio and run it. LM Studio should show the GPU information with 8.0 GB of VRAM.

This is my setup for running AI language models using an RX580 graphics card. Please take a look and see if you can provide any guidance or suggestions.

Good luck.

Jerome


@Tamila-2017 commented on GitHub (Sep 15, 2024):

Hi guys,

I am very grateful to all of you for your detailed accounts of how to solve my problem with the RX580.
Now I have a great chance to solve this problem and use the RX580 to communicate with AI.

Can you please tell me how much faster AI runs on the RX580 compared to the N100 CPU?
Is it 10x, 100x, or maybe 1000x? :-)


@KhazAkar commented on GitHub (Sep 15, 2024):

You can all try the ollama-for-amd fork. I'm using it myself with my RX5500M card and ROCm 6.2. Works wonders.


@Tamila-2017 commented on GitHub (Sep 15, 2024):

KhazAkar,

Thank you, this is also very interesting information. Could you tell us more about this?
For example, what exactly is the name of this fork, where can I get it, and how do I use it?
And most importantly: you are using the RX 5500M. Will this fork work with the RX580?


@Tamila-2017 commented on GitHub (Sep 16, 2024):

jeromechungmf

I chose your advice first:

> Linux with Vulkan:
> I install Ubuntu 24.04 or 22.04, configuring it to use the Vulkan drivers, and then installing LMStudio. The goal is to make this setup work with my RX580 GPU for optimal LLM performance. However, attempting to install Vulkan drivers directly from AMD's website (for Ubuntu 22.04) proved unsuccessful. Instead, I found success by downloading the corresponding Vulkan driver package from https://vulkan.lunarg.com/sdk/home#linux
>
> This allowed LMStudio to detect my RX580 GPU and use it for LLM calculations.

And it ended in failure. What I did, in detail:

  1. Install Ubuntu-24.04.1 LTS Server Minimal (without X's)
  2. Install the Vulkan SDK:
     wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
     sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
     sudo apt update
     sudo apt install vulkan-sdk
  3. Download LM_Studio-0.3.2.AppImage to /home/ai/.appimage
  4. Run LM Studio: needs FUSE
  5. sudo add-apt-repository universe
     sudo apt install libfuse2t64
  6. Run LM Studio: needs libnss3.so
  7. sudo apt install libnss3
  8. Run LM Studio: needs libasound
  9. sudo apt install libasound2t64
  10. Run LM Studio:

> FATAL:setuid_sandbox_host.cc(158)] The SUID sandbox helper binary was found, but is not configured correctly.
> Rather than run without sandboxing I'm aborting now.
> You need to make sure that /tmp/.mount_LM_StusEylzO/chrome-sandbox is owned by root and has mode 4755.

But this file does not exist!

Next, I began to understand that a graphical environment is required to run LM Studio, and I did not like that at all, because I want to use AI in a console environment, without X's.

That's why I stopped this experiment, and now I want to try KhazAkar's advice.
But unfortunately he's not giving any details yet :-(


@mnccouk commented on GitHub (Sep 16, 2024):

> Hi guys,
>
> I am very grateful to all of you for your detailed account of how to solve my problem with the RX580. Now I appeared a great chance to solve this problem and use RX580 to communicate with AI.
>
> Can you please tell me, what is the speed increase of working with RX580 with AI compared to CPU N100? Is it 10x, 100x, or maybe 1000x? :-)

Here are my performance figures:

CPU - AMD Ryzen 5 3400G with Radeon Vega Graphics (16GB RAM)

"eval_count":342,"eval_duration":66228944000 = 5.1639 tokens a second

GPU - Radeon RX580 8GB VRAM

"eval_count":308,"eval_duration":11708779000 = 26.305 tokens a second

Same question asked in both cases:

curl http://192.168.0.156:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?"
}'
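
For anyone wanting to reproduce the tokens-per-second figures above: Ollama reports `eval_duration` in nanoseconds, so the rate is `eval_count / eval_duration * 1e9`. The following is a minimal sketch, not part of the original post; the helper names `json_number` and `tokens_per_second` are made up for illustration, and the JSON strings are cut down to the two fields quoted above.

```shell
# Extract one numeric field from a single line of JSON.
# Naive pattern match for illustration only, not a real JSON parser.
json_number() {
  printf '%s\n' "$2" | sed -n "s/.*\"$1\":\([0-9][0-9]*\).*/\1/p"
}

# tokens/s = eval_count / (eval_duration in ns / 1e9)
tokens_per_second() {
  awk -v count="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", count / (ns / 1e9) }'
}

# The two fields quoted in the post (CPU run, then GPU run).
cpu='{"eval_count":342,"eval_duration":66228944000}'
gpu='{"eval_count":308,"eval_duration":11708779000}'

for run in "$cpu" "$gpu"; do
  tokens_per_second "$(json_number eval_count "$run")" "$(json_number eval_duration "$run")"
done
```

With the numbers from the post this comes out at roughly 5.16 and 26.31 tokens per second, which matches the figures quoted.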

@Tamila-2017 commented on GitHub (Sep 17, 2024):

mnccouk,
thank you so much for the convincing test! :-)
It turns out that the speed gain is small, about 5 times.
At the same time, the TDP of:
AMD Ryzen 5 3400G = 65 Watts
Radeon Vega 16 = 75 Watts.
AMD RX580 = 185 Watts.
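
The "about 5 times" estimate can be checked from the two tokens-per-second figures reported by mnccouk. A small sketch (the `speedup` helper is hypothetical); note that it compares the RX580 with a Ryzen 5 3400G, not with an N100, so the ratio against an N100 may well differ.

```shell
# Ratio of two throughput figures (tokens per second).
speedup() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

speedup 26.305 5.1639   # RX580 vs Ryzen 5 3400G, figures from the thread
```

This gives roughly 5.1x.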


@jeromechungmf commented on GitHub (Sep 17, 2024):

> jeromechungmf
>
> I chose your advice first:
>
> > Linux with Vulkan:
> > I install Ubuntu 24.04 or 22.04, configuring it to use the Vulkan drivers, and then installing LMStudio. The goal is to make this setup work with my RX580 GPU for optimal LLM performance. However, attempting to install Vulkan drivers directly from AMD's website (for Ubuntu 22.04) proved unsuccessful. Instead, I found success by downloading the corresponding Vulkan driver package from https://vulkan.lunarg.com/sdk/home#linux
> > This allowed LMStudio to detect my RX580 GPU and use it for LLM calculations.
>
> And it ended in failure. What I did, in detail:
>
>   1. Install Ubuntu-24.04.1 LTS Server Minimal (without X's)
>   2. wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list sudo apt update sudo apt install vulkan-sdk
>   3. Download LM_Studio-0.3.2.AppImage to /home/ai/.appimage
>   4. Run LM Studio: needs FUSE
>   5. sudo add-apt-repository universe sudo apt install libfuse2t64
>   6. Run LM Studio: needs libnss3.so
>   7. sudo apt install libnss3
>   8. Run LM Studio: needs libasound
>   9. sudo apt install libasound2t64
>   10. Run LM Studio:
>
> > FATAL:setuid_sandbox_host.cc(158)] The SUID sandbox helper binary was found, but is not configured correctly.
> > Rather than run without sandboxing I'm aborting now.
> > You need to make sure that /tmp/.mount_LM_StusEylzO/chrome-sandbox is owned by root and has mode 4755.
>
> But this file does not exist!
>
> Next, I began to understand that a graphical environment is required to run LM Studio and I did not like it at all, because I want to using AI in a console environment, without X's.
>
> That's why I stopped this experiment, and now I want to try KhazAkar advice. But unfortunately he's not giving any details yet :-(

Hi Tamila-2017

Sorry, I should have shown my install process step by step:

1. Download the Ubuntu 22.04 Desktop ISO file. My file name: ubuntu-22.04.4-desktop-amd64.iso
2. Write the ISO file to a bootable USB disk. On Linux you can use the dd command, e.g.: sudo dd bs=4M if=/path/to/ubuntu-22.04.4-desktop-amd64.iso of=/dev/sdX status=progress oflag=sync
3. Boot from the USB drive and install Ubuntu 22.04 on the computer. You can choose the minimal option; follow the GUI guide step by step.
4. Open Firefox and go to https://vulkan.lunarg.com/sdk/home#linux. In the Linux section, click the Ubuntu Packages tab, then scroll down until you see the Release 1.3.290 link and click it. You will see Ubuntu 22.04 (Jammy Jellyfish) and Ubuntu 24.04 (Noble Numbat). In my case, I clicked the Ubuntu 22.04 (Jammy Jellyfish) item, which shows commands like these:
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.3.290-jammy.list https://packages.lunarg.com/vulkan/1.3.290/lunarg-vulkan-1.3.290-jammy.list
sudo apt update
sudo apt install vulkan-sdk
5. Open the terminal, follow the command and execute.finish to install the Vulkan driver in my Ubuntu 22.04 desktop environment.
6. Reboot.
7. Download LM Studio and follow https://www.tecmint.com/lm-studio-run-llms-linux/ to run it.

In my experience, this should work.

PS: If you want to log in to Ubuntu remotely to run LM Studio, logging in via xrdp seems to cause a privilege issue for LM Studio. Something like RealVNC Viewer is a better option for remote access, as it avoids the permission problems and allows LM Studio to detect the RX580 GPU.

Good luck.

Jerome


@Tamila-2017 commented on GitHub (Sep 17, 2024):

jeromechungmf,
Thank you for your attention and detailed description of the installation.
I'm sorry that I still have little experience with Linux.
So, I'm starting to install the recommended ubuntu-22.04.4-desktop-amd64.


@KhazAkar commented on GitHub (Sep 17, 2024):

For Vulkan, you don't need to downgrade to 22.04; 24.04 is also fine, even better. Ollama does not yet have Vulkan support, but will gain it. For Vulkan in the CLI, you can either build Ollama with Vulkan support from the PR in this repo or build llama.cpp from the ggerganov repository with Vulkan support enabled. If you need a GUI, then GPT4All is the best GUI app available, or alternatively LM Studio, but since I prefer open source, I'd pick GPT4All. In general, I think we should move this discussion somewhere else and focus here on getting Ollama running with older AMD GPUs. So my suggestion to try the Vulkan PR in Ollama matches best.
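
A minimal sketch of the llama.cpp route mentioned above, assuming git, CMake, a C++ toolchain and the Vulkan SDK are already installed. The `GGML_VULKAN` CMake option is the name used in llama.cpp's build documentation around the time of this thread; it may differ in other versions, so check docs/build.md in the checkout. The function name `build_llama_cpp_vulkan` is made up for illustration, and nothing here has been tested on an RX580.

```shell
# Hypothetical helper: clone llama.cpp and build it with the Vulkan backend.
# Assumption: the CMake option is GGML_VULKAN (verify against docs/build.md).
build_llama_cpp_vulkan() {
  git clone https://github.com/ggerganov/llama.cpp &&
    cd llama.cpp &&
    cmake -B build -DGGML_VULKAN=ON &&
    cmake --build build --config Release
}

# Usage (downloads and compiles, so it is not run automatically here):
#   build_llama_cpp_vulkan
```

The function is only defined, not called, so sourcing the snippet has no side effects.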


@Tamila-2017 commented on GitHub (Sep 17, 2024):

So, I successfully installed Ubuntu-22.04.4-desktop-amd64 and followed steps "1-4".
However, unfortunately, action "5" is not executed, because there is no such command: execute.finish

.............
KhazAkar, oh, I'm glad you're here :-)
However, your message is vague and therefore not entirely clear
Could you elaborate on your recommendations, step by step?


@mnccouk commented on GitHub (Sep 17, 2024):

@Tamila-2017

I happen to have a real-time power meter on the server. The results for power consumption:

CPU while processing - "Why is the sky blue?"
Duration = 66.7 seconds with power consumption = 104 Watts
= 0.001926889 kWh
= 1.92 Wh

GPU while processing - "Why is the sky blue?"
Duration = 11.7 seconds with power consumption = 230 Watts
= 0.001661111 kWh
= 1.66 Wh

Server power while GPU/CPU idle is 40 Watts
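
The energy figures follow from energy (Wh) = power (W) × time (s) / 3600. A short sketch to recompute them from the wattages and durations quoted above; the `energy_wh` helper is hypothetical. Recomputing gives about 1.93 Wh for the CPU run, but only about 0.75 Wh for 230 W over 11.7 s. The quoted 1.66 Wh for the GPU would correspond to about 26 s at 230 W, so either the duration or the energy figure in the post appears to be off.

```shell
# Energy in watt-hours from average power (W) and duration (s).
energy_wh() {
  awk -v watts="$1" -v seconds="$2" 'BEGIN { printf "%.2f\n", watts * seconds / 3600 }'
}

energy_wh 104 66.7   # CPU run as quoted
energy_wh 230 11.7   # GPU run as quoted
energy_wh 230 26     # duration that would reproduce the quoted 1.66 Wh
```

None of these figures subtract the 40 W idle draw.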


@Tamila-2017 commented on GitHub (Sep 17, 2024):

mnccouk,
Measurements are very useful, thank you


@Tamila-2017 commented on GitHub (Sep 17, 2024):

jeromechungmf,

I don't understand the command execute.finish; it does not exist.

Therefore, I used this instruction - https://www.tecmint.com/lm-studio-run-llms-linux/
and I managed to launch LM Studio.

Unfortunately, LM Studio also doesn't see the RX580 :-(
Although Ubuntu and nvtop see it perfectly.
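
One way to narrow this down is to check whether the Vulkan loader itself lists the card, independent of LM Studio. A hedged sketch, assuming `vulkaninfo` is available (it normally comes with the Vulkan SDK or the distribution's vulkan-tools package) and that its summary output contains `deviceName` lines; the helper name is hypothetical.

```shell
# Hypothetical helper: list the GPUs that the Vulkan loader can see.
# Assumption: "vulkaninfo --summary" prints one deviceName line per device.
list_vulkan_devices() {
  if ! command -v vulkaninfo >/dev/null 2>&1; then
    echo "vulkaninfo not found; install the Vulkan SDK or vulkan-tools first" >&2
    return 1
  fi
  vulkaninfo --summary | grep -i "deviceName"
}

# Usage:
#   list_vulkan_devices
```

If the RX580 does not show up here, the problem is probably in the Vulkan driver setup and not in LM Studio.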


@Tamila-2017 commented on GitHub (Sep 17, 2024):

KhazAkar

> If you need GUI, then ...

I don't need a GUI; I'm strongly against it because I need to work with Ollama in the console.

> In general I think we should move such discussion somewhere else and here focus on trying to get Ollama running with older AMD GPUs.

I completely agree with you! But for some reason you're stubbornly not sharing the details of this success of yours -

> You can all try ollama-for-amd fork. I'm using it myself with my RX5500M card and ROCm 6.2. works wonders.

even though I've asked you about it before.
Why don't you tell me about it? Is it your big secret, or was it just a bad joke?

Unlike you, jeromechungmf detailed his experience, and I tried to use it, although I'm not thrilled with the graphical environment because I need a console server option that works with Ollama.


@KhazAkar commented on GitHub (Sep 17, 2024):

@Tamila-2017 it's not a joke. The fork is here: https://github.com/likelovewant/ollama-for-amd
But I can't promise anything for cards older than the one I have. The fork uses ROCm, not Vulkan, and I don't know whether your card is supported by ROCm 6.x.
For Vulkan, if you're feeling brave, you can try building Ollama with Vulkan support by cloning the repository and applying this PR:
https://github.com/ollama/ollama/pull/5059

In both cases, building Ollama by hand is unavoidable.
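
For anyone trying the Vulkan route, fetching the PR into a local branch can be sketched as below. This assumes `git` is installed and relies on GitHub's `pull/<number>/head` ref. The function name, target directory, and branch name are placeholders. Note that this checks out the PR branch as its author left it instead of applying it on top of current `main`, and the build itself should follow the development docs in the repository.

```shell
#!/bin/sh
# Hypothetical helper: clone Ollama and check out a pull request locally.
# GIT can be overridden (for example GIT=echo) to print the commands
# instead of running them.
fetch_ollama_pr() {
    pr="$1"                 # pull request number, e.g. 5059
    dir="${2:-ollama}"      # target directory (placeholder)
    "${GIT:-git}" clone https://github.com/ollama/ollama.git "$dir" &&
    "${GIT:-git}" -C "$dir" fetch origin "pull/$pr/head:pr-$pr" &&
    "${GIT:-git}" -C "$dir" checkout "pr-$pr"
}

# Usage: fetch_ollama_pr 5059
```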


@KhazAkar commented on GitHub (Sep 17, 2024):

Also, @Tamila-2017, the only information I've found in this fork is in a language I can't read:
https://github.com/likelovewant/ollama-for-amd/issues/1
You can translate it if you want.
And it would be a great idea for you to join the Ollama Discord server :)


@jeromechungmf commented on GitHub (Sep 18, 2024):

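> jeromechungmf,
>
> I don't understand the command execute.finish; it does not exist.
>
> So I followed these instructions instead - https://www.tecmint.com/lm-studio-run-llms-linux/ and I managed to launch LM Studio.
>
> Unfortunately, LM Studio doesn't see the RX580 either :-( although Ubuntu and nvtop see it perfectly.

Hi Tamila-2017,

![1726621208963](https://github.com/user-attachments/assets/e2bf8df2-bdfe-4cd0-91ae-962bf0fbbe78)
![1726621296912](https://github.com/user-attachments/assets/9128e80c-8f5d-4b38-9c4c-a62e97bf20d5)
![1726621331628](https://github.com/user-attachments/assets/2ec0db74-7ca0-4fa2-9682-5b94a10d15ba)
![1726621374395](https://github.com/user-attachments/assets/5598d839-556b-46bb-bcb8-cf9b7ea72906)

I've attached screenshots showing the commands I ran to install the Vulkan driver; maybe you can use them as a reference.

Good luck.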

Jerome


@jeromechungmf commented on GitHub (Sep 18, 2024):

> jeromechungmf,
>
> I don't understand the command execute.finish; it does not exist.
>
> So I followed these instructions instead - https://www.tecmint.com/lm-studio-run-llms-linux/ and I managed to launch LM Studio.
>
> Unfortunately, LM Studio doesn't see the RX580 either :-( although Ubuntu and nvtop see it perfectly.

![DSC_0532](https://github.com/user-attachments/assets/3399e75c-7c37-4726-91ee-738d460fdac8)
![DSC_0533](https://github.com/user-attachments/assets/535820c8-878d-4dbb-a282-f2d0987ac419)

Hi Tamila-2017,

I've attached screenshots of the LM Studio configuration. In the Developer tab, click LM Runtimes and check that the llama.cpp Vulkan runtime is v0.0.4. Then click the Settings tab, where you should see 8 GB of VRAM.

Jerome


@bellmancity commented on GitHub (Sep 19, 2024):

> > jeromechungmf,
> > I don't understand the command execute.finish; it does not exist.
> > So I followed these instructions instead - https://www.tecmint.com/lm-studio-run-llms-linux/ and I managed to launch LM Studio.
> > Unfortunately, LM Studio doesn't see the RX580 either :-( although Ubuntu and nvtop see it perfectly.
>
> DSC_0532 DSC_0533
>
> Hi Tamila-2017,
>
> I've attached screenshots of the LM Studio configuration. In the Developer tab, click LM Runtimes and check that the llama.cpp Vulkan runtime is v0.0.4. Then click the Settings tab, where you should see 8 GB of VRAM.
>
> Jerome

I tried LM Studio using your method (Vulkan + the LM Studio AppImage) on my Linux Mint 22 machine with an RX580. It worked right away. My RX580 is a "refurbished" 16 GB card from AliExpress.

![Screenshot from 2024-09-19 12-25-05](https://github.com/user-attachments/assets/c9a785d9-6456-444b-93f4-6fdb53ed6954)
10.81 tok/sec, 531 tokens, 0.91 s to first token

Hopefully Ollama will work with the RX580 soon.


@Tamila-2017 commented on GitHub (Sep 19, 2024):

Hello all,

Thanks, everyone, for the tips, but unfortunately I can't handle these software tricks on my own.
If you want to help me remotely, I can give you access to my AI computer so that you can configure the RX580s yourself.


@Tamila-2017 commented on GitHub (Sep 19, 2024):

KhazAkar,

> And it would be a great idea for you to join the Ollama Discord server :)

I tried to register on Discord - https://discord.com
But it first demanded my phone number and then told me that the number was incorrect.
Discord is stupid; it doesn't understand phone numbers.


@mnccouk commented on GitHub (Sep 19, 2024):

I've pushed the Docker image I use to run Ollama on the RX580 to Docker Hub. Hopefully it proves useful to someone.
https://hub.docker.com/r/mnccouk/ollama-gpu-rx580
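
A rough sketch of starting such an image, assuming it is launched the same way as the upstream Ollama ROCm image, with the `/dev/kfd` and `/dev/dri` devices passed through. The flags, volume name, container name, and the `latest` tag are all assumptions, so check the Docker Hub page for the author's actual instructions.

```shell
#!/bin/sh
# Hypothetical launcher. DOCKER can be overridden (for example DOCKER=echo)
# to print the command instead of running it.
run_ollama_rx580() {
    # /dev/kfd is the ROCm compute interface; /dev/dri holds the render nodes.
    # The volume keeps downloaded models; 11434 is Ollama's default API port.
    "${DOCKER:-docker}" run -d \
        --device /dev/kfd \
        --device /dev/dri \
        -v ollama:/root/.ollama \
        -p 11434:11434 \
        --name ollama-rx580 \
        mnccouk/ollama-gpu-rx580:latest
}

# Usage: run_ollama_rx580
```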


@bellmancity commented on GitHub (Sep 21, 2024):

> I've pushed the Docker image I use to run Ollama on the RX580 to Docker Hub. Hopefully it proves useful to someone. https://hub.docker.com/r/mnccouk/ollama-gpu-rx580

![amd_gpu](https://github.com/user-attachments/assets/5a43f495-13f7-4d59-a818-288aa243f8c4)
![result_ollama](https://github.com/user-attachments/assets/6342b2e4-59e5-4689-a62f-b61422c8a047)
I tried your Docker image and it runs successfully.
Eval rate = 15.25 tokens/s. Very impressive compared to LM Studio above.

RX580 2048 with 16GB VRAM
Ryzen 5 4650G with 16GB Memory.

Thank you for your effort.


@mnccouk commented on GitHub (Sep 21, 2024):

@bellmancity, Awesome! Glad it worked for you.


@Tamila-2017 commented on GitHub (Sep 21, 2024):

> I've pushed the Docker image I use to run Ollama on the RX580 to Docker Hub. Hopefully it proves useful to someone.
> https://hub.docker.com/r/mnccouk/ollama-gpu-rx580

I can't install ROCm 5.7.0 because there are obscure errors when compiling.

mnccouk,

please give me the exact download link for the distro you used in this comment: https://github.com/ollama/ollama/issues/2453#issuecomment-2362217923


@eliot-akira commented on GitHub (Sep 21, 2024):

Maybe this link will help:

> Follow this documentation for rocm installation, just substitute the 5.7.0 references to 5.7.1 in the documentation. --https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html

From https://github.com/mnccouk/ollama/tree/rx580_gpu?tab=readme-ov-file#linux-with-rx580-radeon-gpu


@Tamila-2017 commented on GitHub (Sep 21, 2024):

eliot-akira,

Thanks, I've known about these links for a long time.
But I don't know exactly which distribution mnccouk uses.
I'm waiting for mnccouk to post the exact link to download the distro, because a lot depends on the distro.


@KhazAkar commented on GitHub (Sep 21, 2024):

Alternatively, just use the Vulkan driver, either in Ollama via the PR in this repo, or in llama.cpp built from source: https://github.com/ggerganov/llama.cpp


@Tamila-2017 commented on GitHub (Sep 21, 2024):

I've tried using Vulkan before, but nothing good came of it.

So I want to use the Docker image that mnccouk created; it worked for him and also for bellmancity.


@KhazAkar commented on GitHub (Sep 21, 2024):

> I've tried using Vulkan before, but nothing good came of it.
>
> So I want to use the Docker image that mnccouk created; it worked for him and also for bellmancity.

Something feels off to me here. I'm not talking about GUI apps such as LM Studio, which someone suggested above.
Also, I think the fastest way would be Vulkan-based inference, even if that means building Ollama from source, which is well documented here.
ROCm 5.7.x is quite... old. I think 5.7.1 is shipped with Ubuntu 24.04 and Debian 12 under a slightly different name; search for hip. Current versions of Ollama work with at least 6.0.x, and I don't know whether they work correctly with the 5.7.x series. The screenshot above says otherwise, but the RX 580, despite its plentiful VRAM, reached the end of its life after the cryptomining boom.
The best bet for using it for AI is to push for the Vulkan implementation for Ollama to be merged. It is available as a PR here:
https://github.com/ollama/ollama/pull/5059

Either way, I'm signing out. This thread has become quite derailed, IMHO, and 3/4 of it belongs on the official Discord server.


@Tamila-2017 commented on GitHub (Sep 21, 2024):

KhazAkar,
You write a lot, and it is interesting to read.
Unfortunately, your thoughts don't lead to a final result, because they are based only on guesses that haven't been verified in practice.

The only one here with real success, without using a graphical desktop environment or Vulkan, is the modest mnccouk.

I'm waiting for him to give me the exact link to download the distro he uses.
Once again: not the name of the distro, but the EXACT LINK to download.


@mnccouk commented on GitHub (Sep 21, 2024):

@Tamila-2017, I installed Ubuntu 20.04 and then did a distro upgrade to Ubuntu 22.04.4. The only reason I didn't install 22.04 directly was a USB boot issue with the 22.04 installation media, but that's another story.
The direct link to 22.04.4 (https://releases.ubuntu.com/22.04.4/) doesn't work for some reason. However, there is a link for 22.04.5: https://releases.ubuntu.com/jammy/ubuntu-22.04.5-desktop-amd64.iso

Then install the ROCm libraries using the link @eliot-akira provided above.

Sorry, I can't provide a working direct link for 22.04.4, but I can't see why 22.04.5 shouldn't work for you.

Also make sure to install Docker; this can be done using the packages that come with Ubuntu 22.04.x.


@SirDubbins commented on GitHub (Sep 22, 2024):

> I've pushed the Docker image I use to run Ollama on the RX580 to Docker Hub. Hopefully it proves useful to someone. https://hub.docker.com/r/mnccouk/ollama-gpu-rx580

This worked for me using an RX480


@Tamila-2017 commented on GitHub (Sep 22, 2024):

Dear mnccouk,

Here is where you can download any build of Ubuntu 22.04.4 -

https://old-releases.ubuntu.com/releases/jammy/

But there are 2 options here:

  • ubuntu-22.04.4-desktop-amd64.iso (Xorg)
  • ubuntu-22.04.4-live-server-amd64.iso (console)

Can I use the version without Xorg and GNOME?

I.e. ubuntu-22.04.4-live-server-amd64.iso ?


@mnccouk commented on GitHub (Sep 22, 2024):

Ah nice, glad you found the link.
I installed the GUI version on my server as I have a monitor connected to it and wanted to use GUI tools to do stuff. If you have a monitor plugged into your server maybe this is a good option to go with in the beginning.

I'm sure ubuntu-22.04.4-live-server-amd64.iso should also work fine but can't validate that personally.

Maybe get it working first with the desktop version, then switch to the server version once you know all the steps. Of course, the choice is yours.


@SirDubbins commented on GitHub (Sep 22, 2024):

Has anyone been successful using multiple GPUs? I was trying to add a couple of 8 GB RX 470s, but I can't get them to show up in rocminfo.
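
One way to narrow this down is to compare what the PCI bus reports with what ROCm reports. The sketch below counts GPU agents in `rocminfo` output, assuming each agent prints a `Device Type:` line as in the CPU-only listings posted in this thread; `count_gpu_agents` is a made-up helper name.

```shell
#!/bin/sh
# count_gpu_agents: read rocminfo output on stdin and print how many
# agents report "Device Type: GPU" (hypothetical helper).
count_gpu_agents() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c 'Device Type:[[:space:]]*GPU' || true
}

# PCI view of the cards, independent of ROCm (needs lspci from pciutils):
#   lspci | grep -Ei 'vga|display|3d'
# ROCm view:
#   /opt/rocm/bin/rocminfo | count_gpu_agents
```

If `lspci` lists more cards than `rocminfo` counts, the problem is likely on the driver or ROCm side and not in Ollama.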


@Tamila-2017 commented on GitHub (Sep 22, 2024):

This nightmarishly unfinished manual -
https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html
is not a set of instructions but a mockery of common sense.

This is the third time I have installed Ubuntu 22.04.4 Server "Jammy" and followed these instructions, remembering to replace 5.7.0 with 5.7.1, but it fails every time:

  • Only the CPU is visible, but the graphics card is not.

I don't understand why this is the case :-(

Dear mnccouk,

could you do me the courtesy of writing a linear bash script to replace these nightmarish instructions, one that I could run without having to think about unnecessary complications, to get a working ROCm 5.7.1?


@mnccouk commented on GitHub (Sep 22, 2024):

I've just been through the docs and back through my notes. This should get you close, if not up and running. I haven't tested it, but hopefully it puts you on the right track.

sudo mkdir --parents --mode=0755 /etc/apt/keyrings

wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
    gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
    
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7.1/ubuntu jammy main" \
    | sudo tee /etc/apt/sources.list.d/amdgpu.list
    
sudo apt update

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600

sudo apt install amdgpu-dkms

sudo reboot

for ver in 5.7.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
    | sudo tee --append /etc/apt/sources.list.d/rocm.list
done

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
    | sudo tee /etc/apt/preferences.d/rocm-pin-600
    
sudo apt update

sudo apt install rocm-hip-sdk #if that didn't work try -> sudo apt install rocm-hip-sdk5.7.1

#Think these may already be installed through deps, but no harm in trying 
sudo apt install rocm-utils #if that didn't work again try-> sudo apt install rocm-utils5.7.1

sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF

sudo ldconfig

#You'll need to add this to your user profile; as written it only lasts for the current terminal session
export PATH="$PATH:/opt/rocm-5.7.1/bin:/opt/rocm-5.7.1/opencl/bin"

#Group changes only take effect after you log out and back in
sudo usermod -a -G render your_linux_username_here

#I still had a permissions issue, so I brute-forced access
sudo chmod 666 /dev/kfd

#To verify that the libraries and driver are working
/opt/rocm/bin/rocminfo
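
Before resorting to `chmod 666`, it may be worth checking whether the current session actually has the group membership. The sketch below is an assumption-laden check, not part of the script above: `in_group` and `check_rocm_access` are made-up names, and the `video` group is included because AMD's install guide adds users to both `render` and `video`.

```shell
#!/bin/sh
# in_group GROUP "LIST": succeed if GROUP is one of the space-separated
# names in LIST (hypothetical helper).
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_rocm_access: report group membership for this session and the
# permissions on the device nodes ROCm uses.
check_rocm_access() {
    groups_now=$(id -nG)
    for g in render video; do
        if in_group "$g" "$groups_now"; then
            echo "in group: $g"
        else
            echo "NOT in group: $g"
        fi
    done
    ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null ||
        echo "no /dev/kfd or render nodes found"
}

# Usage: check_rocm_access
```

Because `usermod` only affects new login sessions, a shell opened before the change will still report the old groups, which may be why access failed until the permissions were forced.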


@Tamila-2017 commented on GitHub (Sep 22, 2024):

Wow! Thank you so much, dear mnccouk, for your quick and creative help!

I'm proceeding immediately with my 4th install of ubuntu-22.04.4-live-server-amd64 :-)


@kth8 commented on GitHub (Sep 22, 2024):

I did a fresh install of Ubuntu 22.04.5, then updated and rebooted, but I'm getting stuck at the sudo apt install amdgpu-dkms step with this error:

Setting up amdgpu-dkms-firmware (1:6.2.4.50701-1664922.22.04) ...
Setting up dctrl-tools (2.24-3build2) ...
Setting up dkms (2.8.7-2ubuntu2.2) ...
Setting up amdgpu-dkms (1:6.2.4.50701-1664922.22.04) ...
Loading new amdgpu-6.2.4-1664922.22.04 DKMS files...
Building for 6.8.0-45-generic
Building for architecture x86_64
Building initial module for 6.8.0-45-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/amdgpu-dkms.0.crash'
Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64)
Consult /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Processing triggers for man-db (2.10.2-1) ...
Errors were encountered while processing:
 amdgpu-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)

from that make.log: https://termbin.com/hjob
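
The log shows DKMS building the module for kernel 6.8.0-45-generic. A first diagnostic step could look like the sketch below, which prints the running kernel series, the DKMS state, and the end of the build log named in the error. `kernel_series` and `dkms_build_report` are made-up helper names, and whether this amdgpu-dkms release supports a given kernel series is something to check against AMD's compatibility notes; the sketch does not decide it.

```shell
#!/bin/sh
# kernel_series RELEASE: reduce e.g. "6.8.0-45-generic" to "6.8"
# (hypothetical helper).
kernel_series() {
    echo "$1" | cut -d. -f1,2
}

# dkms_build_report: show the kernel in use, the DKMS module state, and
# the tail of the make log referenced in the error message.
dkms_build_report() {
    echo "running kernel series: $(kernel_series "$(uname -r)")"
    dkms status || true
    tail -n 40 /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log
}

# Usage: dkms_build_report
```

If the kernel series turns out to be newer than the driver release expects, booting an older kernel is one thing to try.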


@Tamila-2017 commented on GitHub (Sep 23, 2024):

Dear mnccouk,

Your magical script executed without errors; I only had to split it into two parts at the `sudo reboot` step.

However, I got the same result as before, in which I don't see a graphics card, only a Celeron CPU:

~$ /opt/rocm/bin/rocminfo 
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Celeron(R) CPU G1620 @ 2.70GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Celeron(R) CPU G1620 @ 2.70GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2700                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            2                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    3970100(0x3c9434) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    3970100(0x3c9434) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    3970100(0x3c9434) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*** Done *** 

What could be the reason for this failure?


@SirDubbins commented on GitHub (Sep 23, 2024):

> I did a fresh install of ubuntu 22.04.5 then updated and rebooted but getting stuck at the `sudo apt install amdgpu-dkms` part with this error
>
> Setting up amdgpu-dkms-firmware (1:6.2.4.50701-1664922.22.04) ...
> Setting up dctrl-tools (2.24-3build2) ...
> Setting up dkms (2.8.7-2ubuntu2.2) ...
> Setting up amdgpu-dkms (1:6.2.4.50701-1664922.22.04) ...
> Loading new amdgpu-6.2.4-1664922.22.04 DKMS files...
> Building for 6.8.0-45-generic
> Building for architecture x86_64
> Building initial module for 6.8.0-45-generic
> ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/amdgpu-dkms.0.crash'
> Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64)
> Consult /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log for more information.
> dpkg: error processing package amdgpu-dkms (--configure):
>  installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
> Processing triggers for man-db (2.10.2-1) ...
> Errors were encountered while processing:
>  amdgpu-dkms
> E: Sub-process /usr/bin/dpkg returned an error code (1)
>
> from that make.log: https://termbin.com/hjob

I think I encountered a similar problem. I wound up deleting the 6.8.0-45-generic kernel and am using the older 5.19.0-32-generic kernel instead.
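
For anyone hitting the same DKMS failure, a rough sketch of how to see which kernel the build is targeting and where it failed. `kernel_series` is a hypothetical helper name, and the make.log path is simply the one printed in the error output quoted above; adjust it to whatever your own error message shows.

```shell
#!/bin/sh
# Hypothetical helper: reduce a kernel release string to "major.minor".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Report the series of the kernel that is currently running.
echo "running kernel series: $(kernel_series "$(uname -r)")"

# List the build state of DKMS modules per kernel, if dkms is installed.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi

# Show the first few error lines from the failed build log, if it exists.
# The path is taken from the dpkg error output and may differ on your system.
log=/var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log
if [ -r "$log" ]; then
    grep -n -m 5 -i 'error' "$log" || true
fi
```

If the running series is newer than what the amdgpu-dkms package was validated against, booting an older installed kernel (as described above) is one way to get the module to build.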


@Tamila-2017 commented on GitHub (Sep 23, 2024):

SirDubbins,

I used to get this error all the time too.
But when I used the magical script by mnccouk, I didn't notice it, because the lines on the screen were scrolling by very fast.
Maybe it didn't occur?


@mnccouk commented on GitHub (Sep 23, 2024):

@Tamila-2017
It's probably best to run that script step by step, one command at a time, and to validate after each command that there are no errors. Apologies for the confusion. You may have had a similar issue to @kth8 but not noticed it, due to the lack of error trapping when running all the commands at once.

@SirDubbins proposed a fix: try again after downgrading your kernel, or reinstall Ubuntu again (sorry!).

Looking at the doc - https://rocm.docs.amd.com/en/docs-5.7.1/release/gpu_os_support.html - Ubuntu 22.04.3 has been validated with kernel 6.2. Try that version (22.04.3) of Ubuntu instead, and don't install any updates as part of the Ubuntu installation.

I started from an earlier install of Ubuntu 20.04 and then upgraded through to Ubuntu 22.04.4; the whole process wasn't smooth for me either. It's quite possible my kernel drivers were compiled against an earlier kernel version.

Just for info - I'm actually on kernel version 6.8.0-40-generic now.

Sorry, but you may have to target an install of Ubuntu 22.04.3 or try downgrading your kernel, though that may introduce other issues. I'd opt for the fresh install again if you have the luxury of being able to.


@Tamila-2017 commented on GitHub (Sep 23, 2024):

Dear mnccouk,

I've reinstalled again, this time from ubuntu-22.04.4-live-server-amd64.iso

uname -a
Linux ai-server 5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Then, carefully step by step, I executed each command of your magic script separately, watching for mistakes.
There were no mistakes.

However, I got the same result without the GPU -

/opt/rocm/bin/rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Celeron(R) CPU G1620 @ 2.70GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Celeron(R) CPU G1620 @ 2.70GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2700                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            2                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    3970100(0x3c9434) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    3970100(0x3c9434) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    3970100(0x3c9434) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*** Done ***             

What do I do next?


@mnccouk commented on GitHub (Sep 23, 2024):

What's the output of

cat /var/log/dmesg | grep -e amdgpu -e drm

@Tamila-2017 commented on GitHub (Sep 23, 2024):

# cat /var/log/dmesg | grep -e amdgpu -e drm
cat: /var/log/dmesg: No such file or directory
# ls
alternatives.log  btmp                   dist-upgrade  installer  private              wtmp
apt               cloud-init-output.log  dpkg.log      journal    rocm_smi_lib
bootstrap.log     cloud-init.log         faillog       lastlog    unattended-upgrades


@mnccouk commented on GitHub (Sep 23, 2024):

How about?

sudo dmesg | grep -e amdgpu -e drm

@mnccouk commented on GitHub (Sep 23, 2024):

And...

lsmod | grep amdgpu

@Tamila-2017 commented on GitHub (Sep 23, 2024):

sudo dmesg | grep -e amdgpu -e drm
[    5.127309] [drm] amdgpu kernel modesetting enabled.
[    5.156647] [drm] amdgpu version: 6.2.4
[    5.185611] [drm] OS DRM version: 5.15.0
[    5.244324] amdgpu: CRAT table not found
[    5.305139] amdgpu: Virtual CRAT table created for CPU
[    5.368709] amdgpu: Topology: Add CPU node
[    5.448129] amdgpu: PeerDirect support was initialized successfully
[    5.480922] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE366 0xE7).
[    5.513856] [drm] register mmio base: 0xF7E00000
[    5.546704] [drm] register mmio size: 262144
[    5.579527] [drm] add ip block number 0 <vi_common>
[    5.612338] [drm] add ip block number 1 <gmc_v8_0>
[    5.645021] [drm] add ip block number 2 <tonga_ih>
[    5.677367] [drm] add ip block number 3 <gfx_v8_0>
[    5.709378] [drm] add ip block number 4 <sdma_v3_0>
[    5.741213] [drm] add ip block number 5 <powerplay>
[    5.772408] [drm] add ip block number 6 <dm>
[    5.802951] [drm] add ip block number 7 <uvd_v6_0>
[    5.833281] [drm] add ip block number 8 <vce_v3_0>
[    5.925618] amdgpu 0000:01:00.0: No more image in the PCI ROM
[    5.956693] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    5.987774] amdgpu: ATOM BIOS: 113-1E3660U-O51
[    6.018348] [drm] UVD is enabled in VM mode
[    6.048418] [drm] UVD ENC is enabled in VM mode
[    6.078313] [drm] VCE enabled in VM mode
[    6.108367] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    6.108372] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    6.108376] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[    6.108593] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    6.108638] amdgpu 0000:01:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[    6.108643] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    6.108734] [drm] Detected VRAM RAM=8192M, BAR=256M
[    6.108738] [drm] RAM width 256bits GDDR5
[    6.108788] [drm] amdgpu: 8192M of VRAM memory ready
[    6.108792] [drm] amdgpu: 1938M of GTT memory ready.
[    6.108862] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    6.109304] [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
[    6.109428] [drm] Chained IB support enabled!
[    6.110196] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[    6.110341] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    6.110955] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[    6.179076] [drm] Display Core v3.2.241 initialized on DCE 11.2
[    6.274702] [drm] UVD and UVD ENC initialized successfully.
[    6.384641] [drm] VCE initialized successfully.
[    6.385053] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
[    6.385072] amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[    6.388350] amdgpu: legacy kernel without apple_gmux_detect()
[    6.388356] amdgpu 0000:01:00.0: amdgpu: Using BACO for runtime pm
[    6.389279] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 0
[    6.403933] fbcon: amdgpudrmfb (fb0) is primary device
[    6.479039] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    7.845748] systemd[1]: Starting Load Kernel Module drm...
[    9.367404] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])

lsmod | grep amdgpu
amdgpu              13590528  0
amddrm_ttm_helper      16384  1 amdgpu
amdttm                 94208  2 amdgpu,amddrm_ttm_helper
amdxcp                 16384  1 amdgpu
iommu_v2               24576  1 amdgpu
amddrm_buddy           20480  1 amdgpu
amd_sched              49152  1 amdgpu
amdkcl                 32768  3 amd_sched,amdttm,amdgpu
i2c_algo_bit           16384  1 amdgpu
drm_kms_helper        311296  3 amdgpu
drm                   622592  9 drm_kms_helper,amd_sched,amdttm,amdgpu,amddrm_buddy,amdkcl,amddrm_ttm_helper,amdxcp


@mnccouk commented on GitHub (Sep 23, 2024):

I'm running out of ideas here, but I don't like the look of:

[    6.385053] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0

I think it may be to do with the PCIe spec of your hardware; see this link for a related topic - https://www.reddit.com/r/ROCm/comments/ba6tvq/atomics_on_my_hardware/
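
A small sketch for spotting this condition without reading the whole log. `kfd_atomics_skips` is a hypothetical helper name, and the pattern is an assumption based only on the wording of the log line quoted above; other driver versions may phrase it differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print only the
# lines where the KFD (compute) driver skipped a device over PCIe atomics.
kfd_atomics_skips() {
    grep -i 'kfd.*skipped device.*atomics'
}

# Usage on a live system (dmesg may need root):
#   sudo dmesg | kfd_atomics_skips
```

If this prints a line for your card, the display driver can still load normally (as in the log above) while the compute side never registers the GPU, which would match `rocminfo` listing only the CPU agent.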


@Tamila-2017 commented on GitHub (Sep 23, 2024):

It's incredible! You are doing well, but I am not, even though I follow your instructions exactly

So I offer you remote access via ssh, and you personally can install this cranky ROCm 5.7.1

PS. I used to be able to mine Bitcoin, Litecoin and Ethereum Classic on the same motherboard MSI H61MU-E35 (B3) and AMD RX580 without any problems.
I don't understand why they refuse to work together with ROCm :-o


@mnccouk commented on GitHub (Sep 23, 2024):

I can't be certain of this, but from what I'm reading, ROCm requires the PCIe 3.0 AtomicOp feature. Looking at the spec of your CPU - https://www.intel.com/content/www/us/en/products/sku/71073/intel-celeron-processor-g1620-2m-cache-2-70-ghz/specifications.html -
it appears to support only PCIe revision 2.0, which unfortunately is not compatible with ROCm. The motherboard also seems to be PCIe 2.x.

I'm sorry to say I don't think ROCm will work for you with that CPU/motherboard/GPU combination. I may be wrong, but that's what I think. Open to any other thoughts.
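
One way to check this from the running system, rather than from spec sheets, is to look at the AtomicOps fields that `lspci -vvv` prints in the PCIe capability section. This is only a sketch: `atomicops_lines` is a hypothetical helper name, and the exact field layout varies between pciutils versions, so the pattern just matches the `AtomicOps` substring.

```shell
#!/bin/sh
# Hypothetical helper: keep only the AtomicOps capability/control lines
# from `lspci -vvv` output supplied on stdin.
atomicops_lines() {
    grep -i 'AtomicOps'
}

# Usage (root is usually needed to read the full capability list):
#   sudo lspci -vvv | atomicops_lines
```

In lspci output a trailing `+` on a flag (for example `64bit+`) means the capability is present and `-` means it is absent. If nothing is printed for the slot or root port the GPU sits behind, that would be consistent with the "PCI rejects atomics" message.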


@Tamila-2017 commented on GitHub (Sep 23, 2024):

Dear mnccouk 💓

You have no need to apologize :-) On the contrary, I am very grateful to you for your courtesy and your great work you have done to analyze my situation and create a magic script to build ROCm.
I think you are right about my outdated hardware.
I will consider buying a more modern CPU and Motherboard, and I would be grateful if you could tell me the system requirements for them.


@mnccouk commented on GitHub (Sep 24, 2024):

@Tamila-2017

I can only forward the motherboard and CPU I'm using as I know that would work:
Motherboard - B450M PRO-VDH MAX - https://www.msi.com/Motherboard/B450M-PRO-VDH-MAX/support
CPU - AMD Ryzen 5 3400G with Radeon Vega Graphics
And of course the RX580

There is a discussion here - https://github.com/ROCm/ROCm/issues/237 about a similar issue to what you have, except in this instance it's just one of the PCIe ports that is failing when attempting to use two graphics cards, even though it has PCIe 3.0 specification.

Maybe others would share what hardware(motherboard\cpu combination) they have successfully used with ROCm to give you some alternatives?


@Tamila-2017 commented on GitHub (Sep 24, 2024):

Ok. Guys, how do you measure speed in tokens/second? I want to measure the speed of CPU N100.

<!-- gh-comment-id:2371717486 --> @Tamila-2017 commented on GitHub (Sep 24, 2024): Ok. Guys, how do you measure speed in tokens/second? I want to measure the speed of CPU N100.
Author
Owner

@bellmancity commented on GitHub (Sep 24, 2024):

Ok. Guys, how do you measure speed in tokens/second? I want to measure the speed of CPU N100.
I think this is the command, in ollama prompt.
/set verbose

<!-- gh-comment-id:2371803620 --> @bellmancity commented on GitHub (Sep 24, 2024): > Ok. Guys, how do you measure speed in tokens/second? I want to measure the speed of CPU N100. I think this is the command, in ollama prompt. /set verbose
Author
Owner

@Tamila-2017 commented on GitHub (Sep 24, 2024):

/set verbose

So I need to run some model to measure the speed?
I ran the Llama 3.1 model and executed your command, but it did not report the speed:

>>> /set verbose
Set 'verbose' mode.

<!-- gh-comment-id:2371862239 --> @Tamila-2017 commented on GitHub (Sep 24, 2024): > /set verbose So I need to run some model to measure the speed? I ran the Llama 3.1 model and executed your command, but it did not report the speed: ``` >>> /set verbose Set 'verbose' mode. ```
Author
Owner

@mnccouk commented on GitHub (Sep 24, 2024):

Then you make a prompt:-
Screenshot from 2024-09-24 20-36-45

<!-- gh-comment-id:2372223901 --> @mnccouk commented on GitHub (Sep 24, 2024): Then you make a prompt:- ![Screenshot from 2024-09-24 20-36-45](https://github.com/user-attachments/assets/ea587bb0-3028-4d17-86fa-61e69ac34afd)
Author
Owner

@Tamila-2017 commented on GitHub (Sep 24, 2024):

Surprisingly, I got a different answer:

~$ ollama run llama3.1
/set verbose
Set 'verbose' mode.
why is the sky blue
The sky appears blue to us because of a phenomenon called scattering which occurs when sunlight interacts with the tiny molecules of gases in the atmosphere. Here's a simplified explanation:

  1. Sunlight enters Earth's atmosphere: When the sun shines, it emits a wide range of electromagnetic radiation, including visible light, ultraviolet (UV) light, and infrared (IR) radiation.
  2. Light scatters off gas molecules: As sunlight travels through the atmosphere, it encounters tiny molecules of gases like nitrogen (N2) and oxygen (O2). These molecules are much smaller than the wavelength of light, so they scatter the shorter (blue) wavelengths more efficiently than the longer (red) wavelengths.
  3. Blue light is scattered in all directions: The scattering of blue light by gas molecules causes it to be distributed throughout the atmosphere in all directions. This means that we see this scattered blue light from almost everywhere around us.
  4. Red light continues straight: On the other hand, the longer wavelengths (like red and infrared) continue traveling in a more direct path, with less scattering occurring. As a result, these colors are not as prominently visible to our eyes.

Why doesn't the sky always appear blue?

  1. Time of day: During sunrise and sunset, the sun's light travels through more of the Earth's atmosphere, which scatters shorter wavelengths like blue even further. This results in the appearance of reds and oranges.
  2. Clouds and dust: When clouds or atmospheric particles are present, they can absorb or scatter light in different ways, affecting the apparent color of the sky.
  3. Atmospheric conditions: Weather events, pollution, or other factors can alter the way light interacts with the atmosphere, making the sky appear more hazy, gray, or even greenish.

Additional fun facts:

  • The sky appears blue to us because our eyes are more sensitive to the shorter (blue) wavelengths of light.
  • If we were to live on a planet with a different atmospheric composition or no atmosphere at all, the sky might appear differently colored or even not blue at all!
  • Some people claim that the sky can appear violet under certain conditions. However, this is relatively rare and usually requires specific optical phenomena.

I hope you enjoyed learning about why the sky appears blue!

And the performance of Llama 3.1 on CPU N100 turned out like this:

CPU_N100

<!-- gh-comment-id:2372332377 --> @Tamila-2017 commented on GitHub (Sep 24, 2024): Surprisingly, I got a different answer: > ~$ ollama run llama3.1 > /set verbose > Set 'verbose' mode. > why is the sky blue > The sky appears blue to us because of a phenomenon called scattering which occurs when sunlight interacts with the tiny molecules of gases in the atmosphere. Here's a simplified explanation: > > 1. **Sunlight enters Earth's atmosphere**: When the sun shines, it emits a wide range of electromagnetic radiation, including visible light, ultraviolet (UV) light, and infrared (IR) radiation. > 2. **Light scatters off gas molecules**: As sunlight travels through the atmosphere, it encounters tiny molecules of gases like nitrogen (N2) and oxygen (O2). These molecules are much smaller than the wavelength of light, so they scatter the shorter (blue) wavelengths more efficiently than the longer (red) wavelengths. > 3. **Blue light is scattered in all directions**: The scattering of blue light by gas molecules causes it to be distributed throughout the atmosphere in all directions. This means that we see this scattered blue light from almost everywhere around us. > 4. **Red light continues straight**: On the other hand, the longer wavelengths (like red and infrared) continue traveling in a more direct path, with less scattering occurring. As a result, these colors are not as prominently visible to our eyes. > > **Why doesn't the sky always appear blue?** > > 1. **Time of day**: During sunrise and sunset, the sun's light travels through more of the Earth's atmosphere, which scatters shorter wavelengths > like blue even further. This results in the appearance of reds and oranges. > 2. **Clouds and dust**: When clouds or atmospheric particles are present, they can absorb or scatter light in different ways, affecting the apparent color of the sky. > 3. 
**Atmospheric conditions**: Weather events, pollution, or other factors can alter the way light interacts with the atmosphere, making the sky appear more hazy, gray, or even greenish. > > **Additional fun facts:** > > * The sky appears blue to us because our eyes are more sensitive to the shorter (blue) wavelengths of light. > * If we were to live on a planet with a different atmospheric composition or no atmosphere at all, the sky might appear differently colored or even not blue at all! > * Some people claim that the sky can appear violet under certain conditions. However, this is relatively rare and usually requires specific optical phenomena. > > I hope you enjoyed learning about why the sky appears blue! And the performance of Llama 3.1 on CPU N100 turned out like this: ![CPU_N100](https://github.com/user-attachments/assets/83b294db-3931-4eff-8047-1661f251b8b8)
Author
Owner

@Tamila-2017 commented on GitHub (Sep 24, 2024):

The performance of Llama 3.1 on an Intel i3-4330 CPU (3.50 GHz):

Intel-i3-4330-3 50GHz

<!-- gh-comment-id:2372426266 --> @Tamila-2017 commented on GitHub (Sep 24, 2024): The performance of Llama 3.1 on CPU Intel-i3-4330-3 50GHz: ![Intel-i3-4330-3 50GHz](https://github.com/user-attachments/assets/8dc5f36a-969c-4a03-b0e9-748cf958e0d6)
Author
Owner

@Tamila-2017 commented on GitHub (Sep 24, 2024):

Dear mnccouk,

You have achieved a performance of ~26 tokens/sec.
Let me clarify: this is the total performance AMD Ryzen 5 3400G + Radeon Vega Graphics + RX580 ?
Or is the performance of only one RX580 ?

<!-- gh-comment-id:2372453286 --> @Tamila-2017 commented on GitHub (Sep 24, 2024): Dear **mnccouk**, You have achieved a performance of ~26 tokens/sec. Let me clarify: this is the total performance AMD Ryzen 5 3400G **+** Radeon Vega Graphics **+** RX580 ? Or is the performance of only one RX580 ?
Author
Owner

@AustinPowers1935 commented on GitHub (Sep 25, 2024):

docker exec -it ollama_gpu ollama run llama3.1

/set verbose
Set 'verbose' mode.
Fix grammar and polish message:
...
... @mnccouk Thank you! Works great for models which fit into GPU memory.
...
... For those struggling with installation on Ubuntu:
... just skip "2. Add the AMDGPU Repository and Install the Kernel-mode Driver" from this instruction: https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html.
... You do not need it: recent Ubuntu releases already contain amdgpu driver.
...
... I got it worked without any problems within less than half an hour.
... 22.04.5 LTS
... 6.8.0-40-generic
... RX590
Here is the revised message with grammar and polish:

"@mnccouk Thank you! Works great for models that fit into GPU memory.

For those struggling to install on Ubuntu, here's a tip:
You can skip step 2 ("Add the AMDGPU Repository and Install the Kernel-mode Driver") from this instruction set: https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html.
You don't need it, as recent Ubuntu releases already contain the amdgpu driver.

I was able to get it working without any issues within less than half an hour on:
22.04.5 LTS
6.8.0-40-generic
RX590"

Note: I made some minor changes for clarity and readability, such as changing "those struggling with installation" to "those struggling to install", and rephrasing the sentence about skipping step 2
to make it more concise. Let me know if you'd like any further adjustments!

total duration: 31.579130243s
load duration: 28.163993ms
prompt eval count: 137 token(s)
prompt eval duration: 24.679448s
prompt eval rate: 5.55 tokens/s
eval count: 202 token(s)
eval duration: 6.829664s
eval rate: 29.58 tokens/s
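
The rates in the verbose output above are simply token counts divided by durations. A minimal sketch that recomputes them from the figures in this run; the function name `tokens_per_second` is made up for illustration and is not part of Ollama:

```python
def tokens_per_second(token_count: int, duration_seconds: float) -> float:
    """Return throughput in tokens/s for a given count and wall-clock duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return token_count / duration_seconds


# Figures copied from the verbose output above.
prompt_rate = tokens_per_second(137, 24.679448)  # prompt eval count / prompt eval duration
eval_rate = tokens_per_second(202, 6.829664)     # eval count / eval duration

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate: {eval_rate:.2f} tokens/s")
```

The "eval rate" line is the generation speed people usually quote when comparing cards; the "prompt eval rate" covers only the processing of the input prompt.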

<!-- gh-comment-id:2372669758 --> @AustinPowers1935 commented on GitHub (Sep 25, 2024): > docker exec -it ollama_gpu ollama run llama3.1 > >>> /set verbose > Set 'verbose' mode. > >>> Fix grammar and polish message: > ... > ... @mnccouk Thank you! Works great for models which fit into GPU memory. > ... > ... For those struggling with installation on Ubuntu: > ... just skip "2. Add the AMDGPU Repository and Install the Kernel-mode Driver" from this instruction: https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html. > ... You do not need it: recent Ubuntu releases already contain amdgpu driver. > ... > ... I got it worked without any problems within less than half an hour. > ... 22.04.5 LTS > ... 6.8.0-40-generic > ... RX590 > Here is the revised message with grammar and polish: > > "@mnccouk Thank you! Works great for models that fit into GPU memory. > > For those struggling to install on Ubuntu, here's a tip: > You can skip step 2 ("Add the AMDGPU Repository and Install the Kernel-mode Driver") from this instruction set: https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html. > You don't need it, as recent Ubuntu releases already contain the amdgpu driver. > > I was able to get it working without any issues within less than half an hour on: > 22.04.5 LTS > 6.8.0-40-generic > RX590" > > Note: I made some minor changes for clarity and readability, such as changing "those struggling with installation" to "those struggling to install", and rephrasing the sentence about skipping step 2 > to make it more concise. Let me know if you'd like any further adjustments! > > total duration: 31.579130243s > load duration: 28.163993ms > prompt eval count: 137 token(s) > prompt eval duration: 24.679448s > prompt eval rate: 5.55 tokens/s > eval count: 202 token(s) > eval duration: 6.829664s > eval rate: 29.58 tokens/s >
Author
Owner

@AustinPowers1935 commented on GitHub (Sep 25, 2024):

@kth8 @Tamila-2017 ^^^

<!-- gh-comment-id:2372671405 --> @AustinPowers1935 commented on GitHub (Sep 25, 2024): @kth8 @Tamila-2017 ^^^
Author
Owner

@kth8 commented on GitHub (Sep 25, 2024):

After doing a new install of Ubuntu 22.04 I followed the guide except step 2 of installing amdgpu-dkms but after rebooting and running rocminfo it seems my GPU is not discovered.

Here are my system logs: https://termbin.com/svjq

I am using RX 470.

<!-- gh-comment-id:2374377959 --> @kth8 commented on GitHub (Sep 25, 2024): After doing a new install of Ubuntu 22.04 I followed the guide except step 2 of installing `amdgpu-dkms` but after rebooting and running `rocminfo` it seems my GPU is not discovered. Here are my system logs: https://termbin.com/svjq I am using RX 470.
Author
Owner

@mnccouk commented on GitHub (Sep 25, 2024):

Dear mnccouk,

You have achieved a performance of ~26 tokens/sec. Let me clarify: this is the total performance AMD Ryzen 5 3400G + Radeon Vega Graphics + RX580 ? Or is the performance of only one RX580 ?

This is just with 1 RX580 + CPU. The CPU is still utilised to some degree, but the GPU is performing all the heavy lifting.

As far as I'm aware, the Radeon Vega Graphics embedded in the CPU is not utilised.

<!-- gh-comment-id:2375210546 --> @mnccouk commented on GitHub (Sep 25, 2024): > Dear **mnccouk**, > > You have achieved a performance of ~26 tokens/sec. Let me clarify: this is the total performance AMD Ryzen 5 3400G **+** Radeon Vega Graphics **+** RX580 ? Or is the performance of only one RX580 ? This is just with 1 RX580 + CPU, the CPU is still utilised to some degree but the GPU is performing all the heavy lifting. As far as I'm aware the embedded Radeon Vega Graphics(embedded in the CPU) in not utilised.
Author
Owner

@mnccouk commented on GitHub (Sep 25, 2024):

@kth8
It looks like you have a similar problem to @Tamila-2017.
From your log:

[   24.636266] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0

<!-- gh-comment-id:2375227097 --> @mnccouk commented on GitHub (Sep 25, 2024): @kth8 looks like you have similar problem to @Tamila-2017 from your log ``` [ 24.636266] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0 ```
Author
Owner

@kth8 commented on GitHub (Sep 25, 2024):

@kth8 looks like you have similar problem to @Tamila-2017 from your log

[   24.636266] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0

Do you know what that means? Is there something wrong with my motherboard or graphics card? Do I need to change a BIOS setting or set a kernel parameter or use a different kernel or something else?

<!-- gh-comment-id:2375234383 --> @kth8 commented on GitHub (Sep 25, 2024): > @kth8 looks like you have similar problem to @Tamila-2017 from your log > > ``` > [ 24.636266] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0 > ``` Do you know what that means? Is there something wrong with my motherboard or graphics card? Do I need to change a BIOS setting or set a kernel parameter or use a different kernel or something else?
Author
Owner

@mnccouk commented on GitHub (Sep 25, 2024):

I believe it's to do with PCI specification - see chat from earlier - https://github.com/ollama/ollama/issues/2453#issuecomment-2369372398
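
The kernel line quoted above says the kfd driver skipped the GPU because the PCI path rejected atomics. A rough sketch for spotting that message in a saved kernel log; the regular expression is derived only from the one sample line in this thread, so other kernel versions may word it differently, and `find_rejected_devices` is a hypothetical helper name:

```python
import re

# Pattern modelled on the single log line quoted in this thread (an assumption,
# not an exhaustive description of what the kernel can print).
KFD_SKIP = re.compile(
    r"kfd kfd: amdgpu: skipped device (?P<pci_id>[0-9a-f]{4}:[0-9a-f]{4}), "
    r"PCI rejects atomics"
)


def find_rejected_devices(kernel_log: str) -> list[str]:
    """Return vendor:device IDs of GPUs that kfd skipped over PCIe atomics."""
    return [match.group("pci_id") for match in KFD_SKIP.finditer(kernel_log)]


sample = (
    "[   24.636266] kfd kfd: amdgpu: skipped device 1002:67df, "
    "PCI rejects atomics 730<0"
)
print(find_rejected_devices(sample))
```

To use it on a real machine, save the kernel log to a file first (for example the output of `dmesg`) and pass the file contents to the function. An empty list only means this particular message was not found.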

<!-- gh-comment-id:2375241373 --> @mnccouk commented on GitHub (Sep 25, 2024): I believe it's to do with PCI specification - see chat from earlier - https://github.com/ollama/ollama/issues/2453#issuecomment-2369372398
Author
Owner

@kth8 commented on GitHub (Sep 25, 2024):

I believe it's to do with PCI specification - see chat from earlier - #2453 (comment)

If this is a hardware issue then I guess there is no way to solve this?

<!-- gh-comment-id:2375249613 --> @kth8 commented on GitHub (Sep 25, 2024): > I believe it's to do with PCI specification - see chat from earlier - [#2453 (comment)](https://github.com/ollama/ollama/issues/2453#issuecomment-2369372398) If this is a hardware issue then I guess there is no way to solve this?
Author
Owner

@mnccouk commented on GitHub (Sep 25, 2024):

Only by upgrading motherboard and CPU, I think. (I'm no expert, only deduced this from what I've read, so I'm open to other thoughts.)

Your logs list your system with detected chipset - https://www.intel.com/content/www/us/en/products/sku/66416/intel-c216-chipset/specifications.html
https://www.intel.com/content/www/us/en/products/sku/65693/intel-core-i33220-processor-3m-cache-3-30-ghz/specifications.html

Both are PCIe v2.0.

<!-- gh-comment-id:2375282403 --> @mnccouk commented on GitHub (Sep 25, 2024): Only by upgrading motherboard and CPU, I think. (I'm no expert, only deduced this from what I've read, so, i'm open to other thoughts.) Your logs list your system with detected chipset - https://www.intel.com/content/www/us/en/products/sku/66416/intel-c216-chipset/specifications.html https://www.intel.com/content/www/us/en/products/sku/65693/intel-core-i33220-processor-3m-cache-3-30-ghz/specifications.html Which both are PCIe v2.0,
Author
Owner

@Tamila-2017 commented on GitHub (Sep 26, 2024):

Guys, let me express my humble opinion, perhaps erroneous.
Yes, outdated hardware can cause ROCm to fail at run time.
But it should not affect building ROCm, i.e. ordinary compilation, imho.

<!-- gh-comment-id:2376842239 --> @Tamila-2017 commented on GitHub (Sep 26, 2024): Guys, let me express my humble opinion, perhaps erroneous. Yes, outdated hardware can cause and failure ROCm to at it work. But it should not affect ROCm assembly, i.e. ordinary compilation, imho.
Author
Owner

@Tamila-2017 commented on GitHub (Sep 27, 2024):

About Atomics and Motherboards: https://github.com/ROCm/ROCm/issues/1146#issuecomment-758624560

Very sad information.... It turns out that ROCm is a capricious and unreliable project :-(

<!-- gh-comment-id:2380289874 --> @Tamila-2017 commented on GitHub (Sep 27, 2024): About Atomics and Motherboards: https://github.com/ROCm/ROCm/issues/1146#issuecomment-758624560 Very sad information.... It turns out that ROCm is a capricious and unreliable project :-(
Author
Owner

@Tamila-2017 commented on GitHub (Sep 29, 2024):

My Test NVIDIA Graphics Cards Performance:

RTX-2080-TI RTX-3080

Of course, modern NVIDIA graphics cards work fast.
But modern AMD cards also have good performance

AMD's problem lies somewhere completely different, in its uncaring attitude: the disgusting support for video cards.
As a result, the ROCm project is in a terribly moody state. It requires special motherboards with an undocumented Atomics option for the PCI bus, which their manufacturers usually do not report in their specifications. Therefore, it is impossible to guess in advance whether any motherboard will work or not.

The Ollama project also doesn't give a damn about old AMD graphics cards, paying attention only to new models.

As a result, I completely wasted a lot of time trying to get ROCm to work on the RX580, but I didn't get any of it.

I am completely disappointed in AMD and ROCm for it.
Therefore, I stop any attempts to work with the terrible AMD platform and completely switch to NVIDIA, which does not have these problems and everything is configured in a few minutes and works stably.

Good luck to you, guys! 💓

<!-- gh-comment-id:2381593347 --> @Tamila-2017 commented on GitHub (Sep 29, 2024): **My Test NVIDIA Graphics Cards Performance:** ![RTX-2080-TI](https://github.com/user-attachments/assets/ec6650c8-9ce6-4f6c-9db6-ae5a27b99e3e) ![RTX-3080](https://github.com/user-attachments/assets/fba34ede-7774-4fd6-8fa4-f37a10af5718) Of course, modern NVIDIA graphics cards work fast. But modern AMD cards also have good performance AMD's problem lies in a completely different, non-caring attitude: the disgusting support for video cards. As a result, the ROCm project is in a terribly moody state. It requires special motherboards with an undocumented Atomics option for the PCI bus, which their manufacturers usually do not report in their specifications. Therefore, it is impossible to guess in advance whether any motherboard will work or not. The Ollama project also doesn't give a damn about old AMD graphics cards, paying attention only to new models. As a result, I completely wasted a lot of time trying to get ROCm to work on the RX580, but I didn't get any of it. I am completely disappointed in AMD and ROCm for it. Therefore, I stop any attempts to work with the terrible AMD platform and completely switch to NVIDIA, which does not have these problems and everything is configured in a few minutes and works stably. Good luck to you, guys! 💓
Author
Owner

@T-Shilov commented on GitHub (Oct 8, 2024):

Hello everyone,

I have read this topic carefully, and I want to buy a PRIME X370-PRO motherboard.
Do you think it will be able to work with two AMD RX580 graphics cards?

<!-- gh-comment-id:2399612651 --> @T-Shilov commented on GitHub (Oct 8, 2024): Hello everyone, I have read this [topic](https://github.com/ROCm/ROCm/issues/1146) carefully, and I want to buy a [PRIME X370-PRO](https://www.asus.com/supportonly/prime%20x370-pro/helpdesk_cpu/) motherboard. Do you think it will be able to work with **two** AMD RX580 graphics cards?
Author
Owner

@Darin755 commented on GitHub (Oct 8, 2024):

Hello everyone,

I have read this topic carefully, and I want to buy a PRIME X370-PRO motherboard. Do you think it will be able to work with two AMD RX580 graphics cards?

It should work; the big question is what kind of performance you will get. That will depend heavily on the CPU and motherboard. For optimal performance you want two x16 slots, but the board you have is either a single x16 or dual x8. That might be fine for your needs, but it is something you should keep in mind.

If you want something that can handle two cards you want a server board with a server CPU which can be pretty expensive. The other option would be to just build two systems. Either way this issue is not really the place for this so I would recommend that you open up a new discussion.
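
For a sense of scale on the x16 versus x8 point, here is a back-of-the-envelope sketch of theoretical PCIe 3.0 link bandwidth. The 8 GT/s per lane and 128b/130b encoding figures come from the PCIe 3.0 specification, not from this thread, and the result is only the raw one-direction link rate; it says nothing by itself about tokens/s.

```python
PCIE3_GIGATRANSFERS_PER_LANE = 8.0   # GT/s per lane for PCIe 3.0
PCIE3_ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie3_bandwidth_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    gigabits = lanes * PCIE3_GIGATRANSFERS_PER_LANE * PCIE3_ENCODING_EFFICIENCY
    return gigabits / 8  # bits -> bytes


for lanes in (16, 8):
    print(f"x{lanes}: {pcie3_bandwidth_gb_per_s(lanes):.2f} GB/s")
```

Halving the lanes halves the link rate, which mostly affects how fast data can be moved between host memory and the cards.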

<!-- gh-comment-id:2399975936 --> @Darin755 commented on GitHub (Oct 8, 2024): > Hello everyone, > > I have read this [topic](https://github.com/ROCm/ROCm/issues/1146) carefully, and I want to buy a [PRIME X370-PRO](https://www.asus.com/supportonly/prime%20x370-pro/helpdesk_cpu/) motherboard. Do you think it will be able to work with **two** AMD RX580 graphics cards? It should work the big question is what kind of performance you will get. That will depend heavily on the CPU and motherboard. For optimal performance you want two x16 slots but the board you have is either a single x16 or dual x8. That might be fine for your needs but it is something you should keep in mind. If you want something that can handle two cards you want a server board with a server CPU which can be pretty expensive. The other option would be to just build two systems. Either way this issue is not really the place for this so I would recommend that you open up a new discussion.
Author
Owner

@T-Shilov commented on GitHub (Oct 8, 2024):

@Darin755,

For optimal performance you want two x16 slots but the board you have is either a single x16 or dual x8.

2 x PCI Express 3 0

Thank you for your important observation!
So, the PRIME X370-PRO is not suitable for two RX580s :-(
I will need another motherboard, and I would appreciate it if you could tell me its model.

That will depend heavily on the CPU

I'm sorry, but is AI performance dependent on the CPU?
After all, graphics cards are used here, which provide the main performance of AI.
Take a look at this, please

Regarding the question of the other thread. The use of RX580 is closely related to the type of motherboard, and it was already discussed above. So I also have the right to discuss here the choice of motherboard for the RX580, Isn't that right?
However, I took your advice into account and asked my question here as well.

<!-- gh-comment-id:2400275829 --> @T-Shilov commented on GitHub (Oct 8, 2024): @Darin755, > For optimal performance you want two x16 slots but the board you have is either a single x16 or dual x8. ![2 x PCI Express 3 0](https://github.com/user-attachments/assets/75769e3a-d371-4688-b742-da2d79262cf2) Thank you for your important observation, thank you! So, the PRIME X370-PRO for the two RX580s is not suitable :-( Will need another motherboard, and I would appreciate it if you could tell me its model. > That will depend heavily on the CPU I'm sorry, but is AI performance dependent on the CPU? After all, graphics cards are used here, which provide the main performance of AI. Take a look at [this]( https://github.com/ollama/ollama/issues/2453#issuecomment-2375210546), please Regarding the question of the other thread. The use of RX580 is closely related to the type of motherboard, and it was already discussed [above](https://github.com/ollama/ollama/issues/2453#issuecomment-2371070242). So I also have the right to discuss here the choice of motherboard for the RX580, Isn't that right? However, I took your advice into account and asked my question [here](https://github.com/ROCm/ROCm/issues/1146#issuecomment-2400184705) as well.
Author
Owner

@T-Shilov commented on GitHub (Oct 10, 2024):

mnccouk,
I used Ubuntu 24.04 and your handy script: https://github.com/ollama/ollama/issues/2453#issuecomment-2366982275
However, when executing this script, several errors occurred and the script stopped:

<skip>
update-alternatives: using /usr/lib/shim/shimx64.efi.signed.latest to provide /usr/lib/shim/shimx64.efi.signed (shimx64.e
fi.signed) in auto mode
Generating a new Secure Boot signing key:
Can't load /var/lib/shim-signed/mok/.rnd into RNG
40474B4D84740000:error:12000079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:106:Fil
ename=/var/lib/shim-signed/mok/.rnd
<skip>

EFI variables are not supported on this system
/sys/firmware/efi/efivars not found, aborting.
Setting up libgprofng0:amd64 (2.42-4ubuntu2) ...
<skip>

Setting up dkms (3.0.11-1ubuntu13) ...
Setting up amdgpu-dkms (1:6.2.4.50701-1664922.22.04) ...
Loading new amdgpu-6.2.4-1664922.22.04 DKMS files...
Building for 6.8.0-45-generic
Building for architecture x86_64
Building initial module for 6.8.0-45-generic
Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64)
Consult /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Setting up g++-x86-64-linux-gnu (4:13.2.0-7ubuntu1) ...
Setting up g++-13 (13.2.0-23ubuntu4) ...
Setting up g++ (4:13.2.0-7ubuntu1) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.10ubuntu1) ...
Processing triggers for libc-bin (2.39-0ubuntu8.3) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for install-info (7.1-3build2) ...
Errors were encountered while processing:
 amdgpu-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)

Then I used an older Ubuntu 22.04.3 LTS, but other errors occurred and the script stopped again:

<skip>
Loading new amdgpu-6.2.4-1664922.22.04 DKMS files...
Building for 6.8.0-45-generic
Building for architecture x86_64
Building initial module for 6.8.0-45-generic
configure: error: cannot detect CFLAGS...
Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64)
Consult /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Setting up g++ (4:11.2.0-1ubuntu1) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.9ubuntu3) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for install-info (6.8-4build1) ...
Errors were encountered while processing:
 amdgpu-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)

Can you please advise me on how this unfortunate situation can be remedied?

<!-- gh-comment-id:2406145190 --> @T-Shilov commented on GitHub (Oct 10, 2024): mnccouk, I used Ubuntu 24.04 and your handy script: https://github.com/ollama/ollama/issues/2453#issuecomment-2366982275 However, when executing this script, several errors occurred and the script stopped: ``` <skip> update-alternatives: using /usr/lib/shim/shimx64.efi.signed.latest to provide /usr/lib/shim/shimx64.efi.signed (shimx64.e fi.signed) in auto mode Generating a new Secure Boot signing key: Can't load /var/lib/shim-signed/mok/.rnd into RNG 40474B4D84740000:error:12000079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:106:Fil ename=/var/lib/shim-signed/mok/.rnd <skip> EFI variables are not supported on this system /sys/firmware/efi/efivars not found, aborting. Setting up libgprofng0:amd64 (2.42-4ubuntu2) ... <skip> Setting up dkms (3.0.11-1ubuntu13) ... Setting up amdgpu-dkms (1:6.2.4.50701-1664922.22.04) ... Loading new amdgpu-6.2.4-1664922.22.04 DKMS files... Building for 6.8.0-45-generic Building for architecture x86_64 Building initial module for 6.8.0-45-generic Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64) Consult /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log for more information. dpkg: error processing package amdgpu-dkms (--configure): installed amdgpu-dkms package post-installation script subprocess returned error exit status 10 Setting up g++-x86-64-linux-gnu (4:13.2.0-7ubuntu1) ... Setting up g++-13 (13.2.0-23ubuntu4) ... Setting up g++ (4:13.2.0-7ubuntu1) ... update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode Setting up build-essential (12.10ubuntu1) ... Processing triggers for libc-bin (2.39-0ubuntu8.3) ... Processing triggers for man-db (2.12.0-4build2) ... Processing triggers for install-info (7.1-3build2) ... 
Errors were encountered while processing: amdgpu-dkms E: Sub-process /usr/bin/dpkg returned an error code (1) ``` Then I used an older Ubuntu 22.04.3 LTS, but other errors occurred and the script stopped again: ``` <skip> Loading new amdgpu-6.2.4-1664922.22.04 DKMS files... Building for 6.8.0-45-generic Building for architecture x86_64 Building initial module for 6.8.0-45-generic configure: error: cannot detect CFLAGS... Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64) Consult /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log for more information. dpkg: error processing package amdgpu-dkms (--configure): installed amdgpu-dkms package post-installation script subprocess returned error exit status 10 Setting up g++ (4:11.2.0-1ubuntu1) ... update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode Setting up build-essential (12.9ubuntu3) ... Processing triggers for libc-bin (2.35-0ubuntu3.1) ... Processing triggers for man-db (2.10.2-1) ... Processing triggers for install-info (6.8-4build1) ... Errors were encountered while processing: amdgpu-dkms E: Sub-process /usr/bin/dpkg returned an error code (1) ``` Can you please advise me on how this unfortunate situation can be remedied?
Author
Owner

@T-Shilov commented on GitHub (Oct 11, 2024):

This topic was created on February 12. It's been 310 days already!
It contains a lot of questions and various tips on using AMD RX580.
For example, there was created a handy docker for using Ollama.

However, I studied this topic carefully, and realized that the main problem here is a failed installation of ROCm 5.7.1.
After that I tried using the most promising tips for installing ROCm 5.7.1, but unfortunately none of them work.

Well, can someone please take the trouble to summarize and create a how-to for a RELIABLE installation of ROCm 5.7.1? :-)

<!-- gh-comment-id:2407547842 --> @T-Shilov commented on GitHub (Oct 11, 2024): This topic was created on February 12. It's been 310 days already! It contains a lot of questions and various tips on using AMD RX580. For example, there was created a handy docker for using Ollama. However, I studied this topic carefully, and realized that the main problem here is a failed installation of ROCm 5.7.1. After that I tried using the most promising tips for installing ROCm 5.7.1, but unfortunately none of them work. Well, can someone please courage the trouble to summarize and create a how to for a RELIABLE installation of Rock 5.7.1? :-)
Author
Owner

@mnccouk commented on GitHub (Oct 11, 2024):

Regarding your error:

Check inside your build log -
/var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log

See if that gives you some extra information on why the build failed.

I noticed that you are using Ubuntu 24.04. The ROCm 5.7.1 drivers are
relatively old, so there's a chance that the two are not compatible.
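
A minimal sketch of that log check, assuming the failure shows up on a line containing "error", "cannot" or "fatal" (the keyword list and the helper name `show_dkms_errors` are illustrative, not part of DKMS):

```shell
#!/usr/bin/env bash
# Print up to 20 numbered lines of a DKMS build log that look like failures.
# The function name and the keyword list are hypothetical; adjust as needed.
show_dkms_errors() {
    local log="$1"
    if [ ! -r "$log" ]; then
        echo "log not readable: $log" >&2
        return 1
    fi
    grep -n -i -E 'error|cannot|fatal' "$log" | head -n 20
}

# Example call with the path from the error message above:
#   show_dkms_errors /var/lib/dkms/amdgpu/6.2.4-1664922.22.04/build/make.log
```

The first matching line is usually the most useful one, since later errors in a failed build tend to be consequences of it.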

Author
Owner

@T-Shilov commented on GitHub (Oct 11, 2024):

mnccouk, thank you for the quick response.
But since my attempts were unsuccessful, in the meantime I have already switched to the Ubuntu 20.04 option.
A little later I will tell you how it went.

Author
Owner

@T-Shilov commented on GitHub (Oct 12, 2024):

So, since I was not able to overcome this nasty error on Ubuntu 22.04:

Errors were encountered while processing:
 amdgpu-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)

I had to switch to Ubuntu 20.04.

I modified your script a bit using this documentation: https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html

It ended up as two parts:

# Install ROCm 5.7.1 to Ubuntu 20.04.xx
# https://github.com/ollama/ollama/issues/2453
# https://github.com/ollama/ollama/issues/2453#issuecomment-2366982275
# https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html
# Here change 5.7.0 to 5.7.1
#---------------------------

# Part 1

sudo apt install  curl dialog mc htop openssh-server
echo

# 1. Download and convert the package signing key
# -----------------------------------------------

sudo mkdir --parents --mode=0755 /etc/apt/keyrings
echo

wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
    gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo

# 2. Add the AMDGPU Repository and Install the Kernel-mode Driver
# ---------------------------------------------------------------

# version
ver=5.7.1

# amdgpu repository for focal
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu focal main" \
    | sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
echo

sudo apt install amdgpu-dkms
echo

sleep 24h	# Press Ctrl-C

sudo reboot

and

# Install ROCm 5.7.1 to Ubuntu 20.04.xx
# https://github.com/ollama/ollama/issues/2453
# https://github.com/ollama/ollama/issues/2453#issuecomment-2366982275
# https://rocm.docs.amd.com/en/docs-5.7.0/deploy/linux/os-native/install.html
# Here 5.7.0 is replaced with 5.7.1
#-----------------------------

# Part 2

# 3. Add the ROCm Repository
# --------------------------

# ROCm repositories for focal
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 5.7 5.7.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
    | sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
    | sudo tee /etc/apt/preferences.d/rocm-pin-600
echo

sudo apt update
echo

# 4. Install packages
# -------------------

sudo apt install rocm-hip-sdk   # if that didn't work try -> sudo apt install rocm-hip-sdk5.7.1
echo

# Post-install Actions
# --------------------
# Think these may already be installed through deps, but no harm in trying 
# sudo apt install rocm-utils #if that didn't work again try-> sudo apt install rocm-utils5.7.1

sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
echo

sudo ldconfig
echo

# You'll need to add this to your user profile; the export below only lasts for the current terminal session
export PATH=$PATH:/opt/rocm-5.7.1/bin:/opt/rocm-5.7.1/opencl/bin
echo

# Verifying Kernel-mode Driver Installation
dkms status
echo

# Warning!
# My login is "ai"; replace it with your own user name.
sudo usermod -a -G render  ai  # <--- My login
echo

# I still had permissions issue so brute forced access
sudo chmod 666 /dev/kfd
echo

# To verify the libraries/driver are/is working
/opt/rocm/bin/rocminfo
echo

echo "Completion of ROCm 5.7.1 building"
echo

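The results are in these attachments:

Install-ROCm-571-Ubuntu-20.04-Part-1.pdf (https://github.com/user-attachments/files/17350620/Install-ROCm-571-Ubuntu-20.04-Part-1.pdf)
Install-ROCm-571-Ubuntu-20.04-Part-2.pdf (https://github.com/user-attachments/files/17350621/Install-ROCm-571-Ubuntu-20.04-Part-2.pdf)

I marked the errors I found in red.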

mnccouk, please take a look at them. My concern is why the RX580 is not detected.
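
For anyone repeating this, a rough way to check detection from the shell. This is only a sketch: it assumes `rocminfo` lists a Polaris card as an agent on a line of the form `Name: gfx803`, and the helper name `lists_gfx803` is made up for illustration.

```shell
#!/usr/bin/env bash
# Succeeds if rocminfo-style text on stdin lists a gfx803 agent.
# Assumption: the agent appears on a line such as "  Name:    gfx803".
lists_gfx803() {
    grep -q -E 'Name:[[:space:]]+gfx803'
}

# Example (run on the host after installing ROCm):
#   /opt/rocm/bin/rocminfo | lists_gfx803 \
#       && echo "gfx803 agent found" \
#       || echo "no gfx803 agent listed"
```

If no gfx803 agent is listed, the problem is on the host driver side rather than in Ollama.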

Although when I run Ollama it reports:

>>> AMD GPU ready.

Start-Ollama.pdf (https://github.com/user-attachments/files/17350646/Start-Ollama.pdf)

Author
Owner

@mnccouk commented on GitHub (Oct 12, 2024):

I've built a docker image that you can try, see - https://github.com/ollama/ollama/issues/2453#issuecomment-2362217923

This was built to detect the older RX580 card. The prerequisite is that the ROCm driver module is already installed and working on the host, which is the step you've been working on.

Also, after a fresh reboot, execute the following command:

sudo dmesg | grep -e amdgpu -e drm

Look through the output for any errors. If the amdgpu module looks to have loaded OK, try Ollama from the docker image.
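
The full output is long, so a second filter can help narrow it down. A small sketch, assuming that problem lines contain one of the words "error", "fail", "skipped" or "rejects" (that keyword list and the name `amdgpu_problems` are my own guesses, not anything defined by the driver):

```shell
#!/usr/bin/env bash
# Keep only amdgpu/drm kernel messages that look like problems.
# Reads kernel log text on stdin; exits non-zero when nothing matches.
amdgpu_problems() {
    grep -E 'amdgpu|drm' | grep -i -E 'error|fail|skipped|rejects'
}

# Example:
#   sudo dmesg | amdgpu_problems || echo "no obvious amdgpu problems in dmesg"
```

An empty result does not prove the card is usable for compute; it only means none of the assumed keywords appeared.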

Author
Owner

@T-Shilov commented on GitHub (Oct 12, 2024):

Thanks for the advice, but I didn't understand the sequence of steps I should follow.
Do I need to first uninstall my futile attempts to install ROCm 5.7.1 on Ubuntu 20.04, and then install your docker on a clean Ubuntu?
Or do these actions have to be performed in some other sequence?
Could you please explain in more detail the right sequence of my actions, step by step?

Author
Owner

@mnccouk commented on GitHub (Oct 12, 2024):

The sequence is:-

  1. Install the ROCm drivers on your machine, which is what you have been
    working on. It has not yet been verified whether what you have done so
    far is working.

Seeing the output of the dmesg command will help verify this.
If the drivers are installed and working then move to the next step.

  2. Then fire up the docker container (make sure docker is installed first).
    Follow the link already provided for instructions on how to start the
    container.
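
The hand-off between the two steps can be sketched as a small pre-flight check. It only tests that the device nodes passed to the container (`/dev/kfd` and `/dev/dri`) exist; the function name and its optional root argument are hypothetical, and the nodes being present does not by itself prove the GPU is usable for compute.

```shell
#!/usr/bin/env bash
# Succeeds only if the device nodes the container needs exist under the given root.
# The optional root argument exists purely so the check can be tried against a
# scratch directory; with no argument it checks the real /dev.
gpu_nodes_present() {
    local root="${1:-}"
    [ -e "$root/dev/kfd" ] && [ -d "$root/dev/dri" ]
}

# Example:
#   gpu_nodes_present \
#       && echo "device nodes found, start the container" \
#       || echo "fix the host driver first"
```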

Author
Owner

@T-Shilov commented on GitHub (Oct 12, 2024):

Thanks for the tip. Here is the output of your command:

$ sudo dmesg | grep -e amdgpu -e drm
[sudo] password for ai: 
[    1.444548] [drm] amdgpu kernel modesetting enabled.
[    1.444551] [drm] amdgpu version: 6.2.4
[    1.444553] [drm] OS DRM version: 5.15.0
[    1.444656] amdgpu: CRAT table not found
[    1.444660] amdgpu: Virtual CRAT table created for CPU
[    1.444669] amdgpu: Topology: Add CPU node
[    1.477640] amdgpu: PeerDirect support was initialized successfully
[    1.477780] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE366 0xE7).
[    1.477794] [drm] register mmio base: 0xFE600000
[    1.477795] [drm] register mmio size: 262144
[    1.477827] [drm] add ip block number 0 <vi_common>
[    1.477828] [drm] add ip block number 1 <gmc_v8_0>
[    1.477829] [drm] add ip block number 2 <tonga_ih>
[    1.477830] [drm] add ip block number 3 <gfx_v8_0>
[    1.477831] [drm] add ip block number 4 <sdma_v3_0>
[    1.477831] [drm] add ip block number 5 <powerplay>
[    1.477832] [drm] add ip block number 6 <dm>
[    1.477833] [drm] add ip block number 7 <uvd_v6_0>
[    1.477834] [drm] add ip block number 8 <vce_v3_0>
[    1.478008] amdgpu 0000:01:00.0: No more image in the PCI ROM
[    1.478024] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    1.478026] amdgpu: ATOM BIOS: 113-1E3660U-O51
[    1.478038] [drm] UVD is enabled in VM mode
[    1.478039] [drm] UVD ENC is enabled in VM mode
[    1.478040] [drm] VCE enabled in VM mode
[    1.478137] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    1.478139] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    1.478141] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[    1.478165] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    1.478205] amdgpu 0000:01:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[    1.478209] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    1.478216] [drm] Detected VRAM RAM=8192M, BAR=256M
[    1.478217] [drm] RAM width 256bits GDDR5
[    1.478253] [drm] amdgpu: 8192M of VRAM memory ready
[    1.478254] [drm] amdgpu: 7979M of GTT memory ready.
[    1.478268] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    1.478650] [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
[    1.478739] [drm] Chained IB support enabled!
[    1.479263] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[    1.479346] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    1.479889] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[    1.542400] [drm] Display Core v3.2.241 initialized on DCE 11.2
[    1.610063] [drm] UVD and UVD ENC initialized successfully.
[    1.720002] [drm] VCE initialized successfully.
[    1.720582] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
[    1.720594] amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[    1.723664] amdgpu: legacy kernel without apple_gmux_detect()
[    1.723667] amdgpu 0000:01:00.0: amdgpu: Using BACO for runtime pm
[    1.723958] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 0
[    1.735126] fbcon: amdgpudrmfb (fb0) is primary device
[    1.823364] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    2.432046] systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
[    3.023163] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 9736.893119] amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset
[ 9737.231304] [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
[ 9737.923097] [drm] Fence fallback timer expired on ring sdma0
[ 9738.435083] [drm] Fence fallback timer expired on ring sdma0
[ 9738.947040] [drm] Fence fallback timer expired on ring sdma0
[ 9739.459030] [drm] Fence fallback timer expired on ring sdma0
[ 9739.970994] [drm] Fence fallback timer expired on ring sdma0
[ 9740.482986] [drm] Fence fallback timer expired on ring sdma0
[ 9740.530084] [drm] UVD and UVD ENC initialized successfully.
[ 9740.640068] [drm] VCE initialized successfully.

Is this normal or not? I'm very worried about it.

Author
Owner

@T-Shilov commented on GitHub (Oct 12, 2024):

Next, I followed these steps (they may be wrong):

ai@ai-server:~$ docker run -e HIP_PATH=/opt/rocm/lib/ -e LD_LIBRARY_PATH=/opt/rocm/lib --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama_gpu mnccouk/ollama-gpu-rx580:latest
Unable to find image 'mnccouk/ollama-gpu-rx580:latest' locally
latest: Pulling from mnccouk/ollama-gpu-rx580
6414378b6477: Pull complete 
fb1befc2d817: Pull complete 
49eeff3f7e9e: Pull complete 
4c41849d42d1: Pull complete 
93a019927674: Pull complete 
1b36b680bea4: Pull complete 
f8799f698dba: Pull complete 
4c87b47c2c7b: Pull complete 
Digest: sha256:510718801f14a3a69763a192af61882d9dfe5bdea6dab83461f6a8a886dab358
Status: Downloaded newer image for mnccouk/ollama-gpu-rx580:latest
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICronOwazaSIQqdU/Pblye5mYUCT5OVXdgZe33dG8c/1

2024/10/12 21:09:09 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-10-12T21:09:09.862Z level=INFO source=images.go:753 msg="total blobs: 0"
time=2024-10-12T21:09:09.862Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-10-12T21:09:09.862Z level=INFO source=routes.go:1200 msg="Listening on [::]:11434 (version 0.0.0)"
time=2024-10-12T21:09:09.863Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 rocm_v0 cpu]"
time=2024-10-12T21:09:09.863Z level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-12T21:09:09.864Z level=INFO source=amd_linux.go:361 msg="no compatible amdgpu devices detected"
time=2024-10-12T21:09:09.865Z level=INFO source=gpu.go:346 msg="no compatible GPUs were discovered"
time=2024-10-12T21:09:09.865Z level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx compute="" driver=0.0 name="" total="15.6 GiB" available="14.6 GiB"

ai@ai-server:~$ docker exec -it ollama_gpu ollama run llama3.1
pulling manifest 
pulling 8eeb52dfb3bb... 100% ▕████████████████████████████████████████████████████████████████████████▏ 4.7 GB                         
pulling 948af2743fc7... 100% ▕████████████████████████████████████████████████████████████████████████▏ 1.5 KB                         
pulling 0ba8f0e314b4... 100% ▕████████████████████████████████████████████████████████████████████████▏  12 KB                         
pulling 56bb8bd477a5... 100% ▕████████████████████████████████████████████████████████████████████████▏   96 B                         
pulling 1a4c3c319823... 100% ▕████████████████████████████████████████████████████████████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
success 
>>> /set verbose
Set 'verbose' mode.
>>> 
>>> Why is the sky blue?

total duration:       1m18.881610351s
load duration:        31.987745ms
prompt eval count:    16 token(s)
prompt eval duration: 2.473824s
prompt eval rate:     6.47 tokens/s
eval count:           306 token(s)
eval duration:        1m16.331472s
eval rate:            4.01 tokens/s
>>> Send a message (/? for help)

Author
Owner

@mnccouk commented on GitHub (Oct 12, 2024):

The docker container is just using the CPU there. Referring back to the
output of the dmesg command, it looks like there is an issue with the ROCm
driver and the PCI bus.

This message from your log:
kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
is probably why your device is not being detected.

There is some discussion regarding this already in this thread, but I'm
afraid I don't know of a software solution around this issue. It seems as
if the ROCm driver is very particular about the PCI spec required.
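
For anyone who wants to check their own kernel log for the same symptom, here is a minimal sketch. It assumes the message has the same form as the line quoted above; the function name and the regular expression are this sketch's own and are not part of ROCm or Ollama.

```python
import re

# Matches the KFD message quoted in this comment. The pattern is an assumption
# based on that single log line, not an official message specification.
SKIP_RE = re.compile(
    r"kfd kfd: amdgpu: skipped device (?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4}), "
    r"PCI rejects atomics"
)


def find_skipped_devices(dmesg_text):
    """Return (vendor, device) PCI ID pairs that KFD skipped because of PCIe atomics."""
    return [(m.group("vendor"), m.group("device")) for m in SKIP_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    # Sample line taken from the log discussed above.
    sample = "kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0"
    for vendor, device in find_skipped_devices(sample):
        print(f"KFD skipped {vendor}:{device}")
```

To use it on a real system, pass in the text output of dmesg instead of the sample line. An empty result only means this particular message was not found, not that the GPU is usable.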


<!-- gh-comment-id:2408707547 --> @mnccouk commented on GitHub (Oct 12, 2024): The docker container is just using the CPU there. Referring back to the output of the dmesg command, it looks like there is an issue with the ROCm driver and the PCI bus. This message from your log: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0 is probably why your device is not being detected. There is some discussion regarding this already in this thread, but I'm afraid I don't know of a software solution around this issue. It seems as if the ROCm driver is very particular about the PCI spec required.
Author
Owner

@T-Shilov commented on GitHub (Oct 13, 2024):

Yes, I've already realized that ROCm is a very demanding and capricious thing.
I don't understand why it objects to my S1200KP server motherboard, since I believe it has a PCIe 3.0 bus, and my CPU is a Xeon E3 1245 v2:

S1200KP

<!-- gh-comment-id:2408942801 --> @T-Shilov commented on GitHub (Oct 13, 2024): Yes, I've already realized that ROCm is a very demanding and capricious thing. I don't understand why it objects to my S1200KP server motherboard, since I believe it has a PCIe 3.0 bus, and my CPU is a Xeon E3 1245 v2: ![S1200KP](https://github.com/user-attachments/assets/e83f2126-ca63-4528-9703-68512d41f666)
Author
Owner

@T-Shilov commented on GitHub (Oct 13, 2024):

Moreover, CPU-Z confirms that on this motherboard the slot is working in the mode PCIe x16 3.0 @ x16 1.1:

Default Clock 1340 MHz

<!-- gh-comment-id:2409098423 --> @T-Shilov commented on GitHub (Oct 13, 2024): Moreover, CPU-Z confirms that on this motherboard the slot is working in the mode **PCIe x16 3.0 @ x16 1.1**: ![Default Clock 1340 MHz](https://github.com/user-attachments/assets/cf4936cf-1908-4247-9814-2cfc1790cfd8)
Author
Owner

@T-Shilov commented on GitHub (Oct 21, 2024):

mnccouk, you wrote this here:

To verify, use the following command. You should see details about your graphics card.
rocminfo

Unfortunately, "should see details" is very vague and unclear.
Please show the full output of the rocminfo command for the RX580.
It is very important.

<!-- gh-comment-id:2427817124 --> @T-Shilov commented on GitHub (Oct 21, 2024): mnccouk, you wrote this [here](https://hub.docker.com/r/mnccouk/ollama-gpu-rx580): > To verify, use the following command. You should see details about your graphics card. > rocminfo > Unfortunately, "should see details" is very vague and unclear. Please show the full output of the rocminfo command for the RX580. It is very important.
Author
Owner

@mnccouk commented on GitHub (Oct 21, 2024):

If the GPU has been correctly identified by the driver, you should see it
listed in the output of the command.

See
https://rocm.docs.amd.com/projects/rocminfo/en/latest/how-to/use-rocminfo.html
for an example. It's the GPU section that you're looking for, not the CPU one.
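
As a rough way to automate that check, the sketch below scans rocminfo text for agents whose "Device Type" is GPU. It assumes the "Agent N" / "Key: value" layout that rocminfo prints; the function name is hypothetical and the parsing is deliberately simple.

```python
def gpu_agents(rocminfo_text):
    """Return one dict of fields per agent whose 'Device Type' is GPU."""
    agents = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Agent "):
            # A new agent section begins, e.g. "Agent 2".
            current = {}
            agents.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            # Keep the first occurrence so nested sections (ISA Info, pools)
            # do not overwrite agent-level fields such as Name.
            current.setdefault(key.strip(), value.strip())
    return [agent for agent in agents if agent.get("Device Type") == "GPU"]
```

If the returned list is empty, the driver has most likely not exposed the card to ROCm. If it contains an entry with a name such as gfx803, the card has been detected.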


<!-- gh-comment-id:2427826646 --> @mnccouk commented on GitHub (Oct 21, 2024): If the GPU has been correctly identified by the driver, you should see it listed in the output of the command. See https://rocm.docs.amd.com/projects/rocminfo/en/latest/how-to/use-rocminfo.html for an example. It's the GPU section that you're looking for, not the CPU one.
Author
Owner

@T-Shilov commented on GitHub (Oct 21, 2024):

Thank you for the answer. I got this result, please tell me, is it correct or not?

...................
Agent 2                  
*******                  
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          Radeon RX 580 Series               
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26591(0x67df)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1340                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 730                                
  SDMA engine uCode::      58                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

<!-- gh-comment-id:2427853436 --> @T-Shilov commented on GitHub (Oct 21, 2024): Thank you for the answer. I got this result, please tell me, is it correct or not? ``` ................... Agent 2 ******* Name: gfx803 Uuid: GPU-XX Marketing Name: Radeon RX 580 Series Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 16(0x10) KB Chip ID: 26591(0x67df) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1340 BDFID: 256 Internal Node ID: 1 Compute Unit: 36 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 730 SDMA engine uCode:: 58 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 8388608(0x800000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 8388608(0x800000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx803 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max 
Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 *** Done *** ```
Author
Owner

@T-Shilov commented on GitHub (Oct 22, 2024):

Hello mnccouk and other friends,

I have read this thread carefully and, through a lot of trial and error, have worked out a reliable installation procedure for the capricious ROCm 5.7.1 on the RX580.
Note that I could not get a working result on Ubuntu 22, 24, or Alma 9.2, because this nasty error occurs on every attempt:

	Errors were encountered while processing:
	 amdgpu-dkms
	E: Sub-process /usr/bin/dpkg returned an error code (1)

In my tests, the desired result was possible only on Ubuntu 20.04.
So I created a two-part script that sets up a working ROCm 5.7.1 without much mental strain :-)

I'm giving this script to our community, for anyone who is having difficulty installing ROCm 5.7.1.
Before running the script, you need to create a new user named ai.

Install_ROCm_5.7.1.zip
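
Before running an installer like this, it may help to confirm which distribution you are on. The sketch below parses os-release style text and checks for Ubuntu 20.04, the only release reported as working in this comment. That is one user's finding rather than an official support matrix, and both function names are hypothetical.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_reported_working(info):
    # Assumption: only Ubuntu 20.04 is treated as known-good, per the comment above.
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "20.04"
```

On a real machine the input would be the contents of /etc/os-release, for example `parse_os_release(open("/etc/os-release").read())`.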

<!-- gh-comment-id:2429953268 --> @T-Shilov commented on GitHub (Oct 22, 2024): Hello mnccouk and other friends, I have read this thread carefully and, through a lot of trial and error, have worked out a reliable installation procedure for the capricious ROCm 5.7.1 on the RX580. Note that I could not get a working result on Ubuntu 22, 24, or Alma 9.2, because this nasty error occurs on every attempt: ``` Errors were encountered while processing: amdgpu-dkms E: Sub-process /usr/bin/dpkg returned an error code (1) ``` In my tests, the desired result was possible only on Ubuntu 20.04. So I created a two-part script that sets up a working ROCm 5.7.1 without much mental strain :-) I'm giving this script to our community, for anyone who is having difficulty installing ROCm 5.7.1. Before running the script, you need to create a new user named **ai**. [Install_ROCm_5.7.1.zip](https://github.com/user-attachments/files/17480559/Install_ROCm_5.7.1.zip)
Author
Owner

@T-Shilov commented on GitHub (Oct 23, 2024):

mnccouk,
unfortunately I don't really understand how to use your mnccouk/ollama-gpu-rx580 instructions correctly.
They are very brief, and therefore not very clear.
For example, I still can't figure out whether I need to use the sudo command or can do without it.
Also, there are often conflicts between volumes and other confusion,
so I frequently have to delete conflicting volumes.
There is also the problem of managing access for several users to one LLM.
That is why I really don't like Docker!

mnccouk, could you tell me how to use Ollama with the RX580 without Docker?
That would be much clearer than using an extra layer like Docker.

<!-- gh-comment-id:2433128011 --> @T-Shilov commented on GitHub (Oct 23, 2024): mnccouk, unfortunately I don't really understand how to use your [mnccouk/ollama-gpu-rx580](https://hub.docker.com/r/mnccouk/ollama-gpu-rx580) instructions correctly. They are very brief, and therefore not very clear. For example, I still can't figure out whether I need to use the sudo command or can do without it. Also, there are often conflicts between volumes and other confusion, so I frequently have to delete conflicting volumes. There is also the problem of managing access for several users to one LLM. That is why I really don't like Docker! mnccouk, could you tell me how to use Ollama with the RX580 without Docker? That would be much clearer than using an extra layer like Docker.
Author
Owner

@mnccouk commented on GitHub (Oct 23, 2024):

Sorry you're having difficulty getting this up and running, I feel this is
not the place to be discussing this as it's veering off topic. However, in
the interest of getting you up and running here are a few pointers:-

  1. Should you use sudo or not?
    This depends on how Docker has been set up. See
    https://docs.docker.com/engine/install/linux-postinstall/ for how to give
    your user (the one that runs Docker) permission, so that you don't have to
    use sudo. Using sudo to start your container is still fine if the
    post-install steps have not been carried out.

  2. I'm not sure why you have conflicts between volumes. With the option -v
    ollama:/root/.ollama, the source is a bare name rather than a path, so
    Docker mounts a named volume called "ollama" (not a local directory) into
    the container at /root/.ollama.

  3. The docker container starts with the ollama serve command, which
    means the API is exposed, in the case of the docker run command, on port
    11434. Several client applications (users) can then access the same LLM at
    ${docker_host_ip}:11434. See
    https://github.com/ollama/ollama/blob/main/docs/api.md
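
To illustrate point 3, here is a minimal sketch of a client calling the exposed API with only the Python standard library. It assumes the `/api/generate` endpoint and the `eval_count` / `eval_duration` response fields described in the API document linked above. The host address and the helper names are placeholders chosen for this example.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def eval_rate(response):
    """Tokens per second; eval_duration is reported in nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def ask(host, model, prompt):
    # Performs the actual network call and returns the decoded JSON reply.
    with urllib.request.urlopen(build_generate_request(host, model, prompt)) as resp:
        return json.loads(resp.read())
```

Several users can each run something like `ask("<docker_host_ip>", "llama3.1", "Why is the sky blue?")` against the same container. Calling `eval_rate` on the reply gives the same tokens/s figure that verbose mode prints.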

I've only focused on building a docker image, so I can't help at this stage
with the request to build without docker. My forked source is here if you
would like to try: https://github.com/mnccouk/ollama/tree/rx580_gpu. I believe
there is another build script in the codebase to achieve this,
./scripts/build_linux.sh, but I have not tried it myself.


<!-- gh-comment-id:2433534242 --> @mnccouk commented on GitHub (Oct 23, 2024): Sorry you're having difficulty getting this up and running, I feel this is not the place to be discussing this as it's veering off topic. However, in the interest of getting you up and running here are a few pointers:- 1) Should you use sudo or not? This depends on how Docker has been set up. See https://docs.docker.com/engine/install/linux-postinstall/ for how to give your user (the one that runs Docker) permission, so that you don't have to use sudo. Using sudo to start your container is still fine if the post-install steps have not been carried out. 2) I'm not sure why you have conflicts between volumes. With the option -v ollama:/root/.ollama, the source is a bare name rather than a path, so Docker mounts a named volume called "ollama" (not a local directory) into the container at /root/.ollama. 3) The docker container starts with the ollama serve command, which means the API is exposed, in the case of the docker run command, on port 11434. Several client applications (users) can then access the same LLM at ${docker_host_ip}:11434. See https://github.com/ollama/ollama/blob/main/docs/api.md I've only focused on building a docker image, so I can't help at this stage with the request to build without docker. My forked source is here if you would like to try: https://github.com/mnccouk/ollama/tree/rx580_gpu. I believe there is another build script in the codebase to achieve this, ./scripts/build_linux.sh, but I have not tried it myself.
Author
Owner

@T-Shilov commented on GitHub (Oct 23, 2024):

mnccouk,
thanks for the advice, I understand you.
I agree with you that this thread is not the place to discuss Docker; it would be better to discuss it on Discord.
But unfortunately, I can't register on Discord because I don't have a valid invitation, and Discord rejects my phone numbers for unknown reasons.

So, if mysterious problems arise with Docker, maybe you could package your project as an AppImage?
It consists of only one file, and it is much easier and more convenient to work with.

By the way, I think Debian is much better and more stable than Ubuntu.
That's why I reworked my scripts for Debian 11.
They are not perfect yet, but Ollama is already working with them.

Earlier, on Ubuntu, Llama 3.1 would first think after the question "Why is the sky blue?", with a pause of 5-10 seconds or more, whereas on Debian it answers this question instantly, without a pause.
But that was only the first time; after that, the answers to other questions came slower and slower. Why?
It gives the impression that Llama is getting tired of being asked questions.
The same "fatigue" is also observed on Ubuntu.
This sluggish Llama behavior is very annoying. Maybe Docker is to blame for it?

Here is the result in Debian 11:

Llama3 1_on_Debian-11

<!-- gh-comment-id:2433678635 --> @T-Shilov commented on GitHub (Oct 23, 2024): mnccouk, thanks for the advice, I understand you. I agree with you that this thread is not the place to discuss Docker; it would be better to discuss it on Discord. But unfortunately, I can't register on Discord because I don't have a valid invitation, and Discord rejects my phone numbers for unknown reasons. So, if mysterious problems arise with Docker, maybe you could package your project as an AppImage? It consists of only one file, and it is much easier and more convenient to work with. By the way, I think Debian is much better and more stable than Ubuntu. That's why I reworked my scripts for Debian 11. They are not perfect yet, but Ollama is already working with them. Earlier, on Ubuntu, Llama 3.1 would first think after the question "**Why is the sky blue?**", with a pause of 5-10 seconds or more, whereas on Debian it answers this question instantly, without a pause. But that was only the first time; after that, the answers to other questions came slower and slower. Why? It gives the impression that Llama is getting tired of being asked questions. The same "fatigue" is also observed on Ubuntu. This sluggish Llama behavior is very annoying. Maybe Docker is to blame for it? Here is the result in Debian 11: ![Llama3 1_on_Debian-11](https://github.com/user-attachments/assets/11d44432-0fe2-4bb7-86f2-622ba8fc7ff1)
Author
Owner

@T-Shilov commented on GitHub (Oct 24, 2024):

I will tell you more about the incomprehensible behavior of Llama 3.1 in your Docker.

On the 1st question "Why is the sky blue?" Llama answered without pause, instantly and showed a speed of 27 tokens/sec.
On the 2nd exactly the same question, Llama thought for a long time, and showed a speed of 12 tokens / sec.
On the 3rd exactly the same question, Llama thought for 4 minutes and showed a speed of 12 tokens/sec.

Why? I don't understand why this is happening, but it is not normal. An Ollama this unpredictable,
which thinks for so long about a repeated simple question, is impossible to use.

PS. When Ollama thinks, the RX580 consumes 105 watts
When Ollama responds, the RX580 consumes 155 watts.

Author
Owner

@mnccouk commented on GitHub (Oct 24, 2024):

Do you clear the conversation context between prompts when you repeat the same question?

If not, processing subsequent questions also processes the previous
conversational context, which includes the responses from previous
questions.

This might be why each prompt is taking longer.
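
To illustrate the point about accumulated context, here is a minimal sketch. It is a toy model, not Ollama's actual prompt handling: the helper names are hypothetical, whitespace-separated words stand in for real tokens, and any prompt caching the server might do is ignored.

```python
# Toy model: every new prompt carries all earlier questions and answers.
# Assumption: tokens are approximated by whitespace-separated words.

def build_prompt(history, question):
    """Join all earlier turns plus the new question into one prompt string."""
    turns = [f"{role}: {text}" for role, text in history]
    turns.append(f"user: {question}")
    return "\n".join(turns)


def rough_token_count(text):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def simulate(question, answer, rounds):
    """Return the approximate prompt size for each repeated question."""
    history = []
    sizes = []
    for _ in range(rounds):
        sizes.append(rough_token_count(build_prompt(history, question)))
        # Without clearing, both the question and the answer stay in the context.
        history.append(("user", question))
        history.append(("assistant", answer))
    return sizes


sizes = simulate("Why is the sky blue?", "word " * 400, rounds=3)
print(sizes)  # each entry is larger than the one before it
```

With an answer of roughly 400 words, the second and third prompts are far larger than the first even though the question itself is identical, which is consistent with each repetition taking longer to process.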


Author
Owner

@T-Shilov commented on GitHub (Oct 24, 2024):

No, I didn't clear it; I wasn't aware that it needed clearing.
After all, I asked only three identical short questions, and Ollama was already slowing down on the 2nd one.
Do I really need to clear the context even in a situation this simple?
I didn't know about this because ChatGPT has no such problem.

OK, how exactly is the context of the conversation cleared?


Upd. I found a command that clears the conversational context: Ctrl-R (probably).
Now Llama 3.1 responds faster, but not instantly as it did the 1st time.
The speed is now lower, 19-22 tokens/sec. Why?

Author
Owner

@T-Shilov commented on GitHub (Oct 29, 2024):

Hello,

I want to join the LLM community in Discord, but registration by phone is rejected.
Can someone share an invite for Discord?

Author
Owner

@kth8 commented on GitHub (Nov 22, 2024):

@mnccouk I managed to get my old RX 470 with i3-3220 CPU to work, not with ROCm but with Vulkan after following this guide: https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5

Using Llama-3.2-3B-Instruct-Q4_K_M my RX 470 managed to get about 20 token/s which is half the speed of the RX 6500 XT Jeff got in his benchmark. I also saw there is a PR open to add Vulkan support to Ollama that will make running on older hardware easier #5059

Author
Owner

@Tamila-2017 commented on GitHub (Dec 12, 2024):

Hi all,
I now use an NVIDIA graphics card, which is easy to configure and provides a good speed of 90 tokens per second.
But I still have the RX580, and I'm still interested in using it.
I remember that someone in this thread promised that full ROCm support for it would appear soon and that it would also be easy to use.
Please tell me, has this promise been fulfilled yet?

Author
Owner

@zoumath19 commented on GitHub (Dec 13, 2024):

> Hi all, I now use an NVIDIA graphics card, which is easy to configure and provides a good speed of 90 tokens per second. But I still have the RX580, and I'm still interested in using it. I remember that someone is here in this topic promised that its full support in ROCm will soon appear and it will also be easy to use. Tell me please, has this promise already been fulfilled?

Waiting on this too, will be immense

Author
Owner

@Tamila-2017 commented on GitHub (Dec 13, 2024):

Ok, can you be more specific - 1 month, or 1 year, or more?

Author
Owner

@mattiasghodsian commented on GitHub (Jan 28, 2025):

I would also appreciate obtaining support for the AMD RX 580 GPUs, as I have a couple available.

Author
Owner

@takitakitanana commented on GitHub (Jan 28, 2025):

AMD RX 580 GPUs support would be awesome

Author
Owner

@PhoenixIO commented on GitHub (Jan 30, 2025):

The RX580 is quite common, and it would be good to have official support for it.

Author
Owner

@PhoenixIO commented on GitHub (Jan 30, 2025):

If anyone is wondering, after some research, I found a step-by-step solution to make this work on older GPUs: https://github.com/likelovewant/ollama-for-amd

Author
Owner

@kth8 commented on GitHub (Feb 2, 2025):

> @mnccouk I managed to get my old RX 470 with i3-3220 CPU to work, not with ROCm but with Vulkan after following this guide: https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5
>
> Using Llama-3.2-3B-Instruct-Q4_K_M my RX 470 managed to get about 20 token/s which is half the speed of the RX 6500 XT Jeff got in his benchmark. I also saw there is a PR open to add Vulkan support to Ollama that will make running on older hardware easier #5059

I made a Docker image to easily run Llama on my RX 470 (gfx803) using llama.cpp's Vulkan https://github.com/kth8/llama-server-vulkan

<!-- gh-comment-id:2629270831 --> @kth8 commented on GitHub (Feb 2, 2025): > [@mnccouk](https://github.com/mnccouk) I managed to get my old RX 470 with i3-3220 CPU to work, not with ROCm but with Vulkan after following this guide: https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5 > > Using `Llama-3.2-3B-Instruct-Q4_K_M` my RX 470 managed to get about 20 token/s which is half the speed of the RX 6500 XT Jeff got in his benchmark. I also saw there is a PR open to add Vulkan support to Ollama that will make running on older hardware easier [#5059](https://github.com/ollama/ollama/pull/5059) I made a Docker image to easily run Llama on my RX 470 (gfx803) using llama.cpp's Vulkan https://github.com/kth8/llama-server-vulkan
Author
Owner

@ChunkyPanda03 commented on GitHub (Feb 3, 2025):

I really like that llama.cpp has Vulkan built in, and I wish Ollama supported Vulkan as well. I think less effort should be put into ROCm support, because Vulkan would let you target more GPUs, since AMD simply drops support for cards after 3 years. I have tried and failed to get Ollama working with the ROCm libraries. Unless someone forks and maintains an older branch for the Polaris 10 GPUs, the patches we are making will not work with newer versions of the Linux kernel. As it is, ROCm 5.1 does not install on Linux kernel 6.1.0-30 because the DKMS module does not compile.

Author
Owner

@Tamila-2017 commented on GitHub (Feb 3, 2025):

ChunkyPanda03

And what follows from this? Do you have a ready-made solution?

Author
Owner

@siavashmohammady66 commented on GitHub (Feb 3, 2025):

For targeting multiple platforms, supporting OpenCL is more important than supporting ROCm.

Author
Owner

@gl2007 commented on GitHub (Feb 4, 2025):

There is a fork of Ollama for Vulkan, ollama-vulkan (https://github.com/pufferffish/ollama-vulkan), but the problem is that you have to build it yourself. I am going to try it myself :)

<!-- gh-comment-id:2632612424 --> @gl2007 commented on GitHub (Feb 4, 2025): There is a fork of ollama for vulkan: [ollama-vulkan](https://github.com/pufferffish/ollama-vulkan) but the problem is that you have to build it yourself. I am going to try it myself :)
Author
Owner

@Tamila-2017 commented on GitHub (Feb 16, 2025):

Dear mnccouk,

> I've added the docker image I'm using with the rx580 with Ollama to docker hub, Hopefully it might prove to be useful to someone. https://hub.docker.com/r/mnccouk/ollama-gpu-rx580

Your Docker image for the RX580 has been working very well so far 👍
And now I have a question: is it possible to use the RX580 with ROCm 5.7.1 without your Docker image?

Author
Owner

@gl2007 commented on GitHub (Feb 16, 2025):

> Dear mnccouk,
>
> > I've added the docker image I'm using with the rx580 with Ollama to docker hub, Hopefully it might prove to be useful to someone. https://hub.docker.com/r/mnccouk/ollama-gpu-rx580
>
> Your docker for RX580 is working very fine so far 👍 And now I have a question: is it possible to use RX580 and Rock 5.7.1 without using your docker?

See my reply to your feature request post. I got it to work without Docker. There is also this other repo, which I haven't tested yet: https://github.com/likelovewant/ollama-for-amd/releases

<!-- gh-comment-id:2661578714 --> @gl2007 commented on GitHub (Feb 16, 2025): > Dear **mnccouk**, > > > I've added the docker image I'm using with the rx580 with Ollama to docker hub, Hopefully it might prove to be useful to someone. https://hub.docker.com/r/mnccouk/ollama-gpu-rx580 > > Your docker for RX580 is working very fine so far 👍 And now I have a question: is it possible to use RX580 and Rock 5.7.1 without using your docker? see my reply to your feature request post. Got it to work without docker. There is also this [other repo](https://github.com/likelovewant/ollama-for-amd/releases) which I didn't test yet:
Author
Owner

@Tamila-2017 commented on GitHub (Feb 16, 2025):

> see my reply to your feature request post.

Dear gl2007,
Thank you for your feedback. Could you point me to where that reply about working without Docker is located?

Author
Owner

@gl2007 commented on GitHub (Feb 17, 2025):

> > see my reply to your feature request post.
>
> Dear gl2007, Thank you for your feedback. Please specify where this answer of yours is located for working without a docker?

See this releases page: https://github.com/likelovewant/ollama-for-amd/releases

<!-- gh-comment-id:2661770845 --> @gl2007 commented on GitHub (Feb 17, 2025): > > see my reply to your feature request post. > > Dear **gl2007**, Thank you for your feedback. Please specify where this answer of yours is located for working without a docker? See this [releases](https://github.com/likelovewant/ollama-for-amd/releases) page.
Author
Owner

@Tamila-2017 commented on GitHub (Feb 17, 2025):

> See this releases page.

Thanks for the link. But I'm not using Windows, I'm using Debian. So do I need to compile the source code (tar.gz)?

<!-- gh-comment-id:2662139148 --> @Tamila-2017 commented on GitHub (Feb 17, 2025): > See this [releases](https://github.com/likelovewant/ollama-for-amd/releases) page. Thanks for the link. But I'm not using Windows, I'm using Debian. So I need to compile Source code (tar.gz) ?
Author
Owner

@gl2007 commented on GitHub (Feb 17, 2025):

> Thanks for the link. But I'm not using Windows, I'm using Debian. So I need to compile Source code (tar.gz) ?

Ahh, OK; it might not be as difficult as it sounds. Under the scripts folder, you can find a ...linux.sh script which can build it for you. Windows is typically more painful :)

Author
Owner

@Tamila-2017 commented on GitHub (Feb 17, 2025):

OK, I ran the script called install.sh.
It completed successfully:

ai@rx580:~/ollama-for-amd-0.5.9/scripts$ ./install.sh 
>>> Cleaning up old version at /usr/local/lib/ollama
[sudo] password for ai: 
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
################################################################################### 100,0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> Downloading Linux ROCm amd64 bundle
################################################################################### 100,0%
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
>>> AMD GPU ready.

Unfortunately, the RX580 is apparently not being used: the speed is only about 4 tokens/sec.
This suggests that only the CPU is running:


ai@rx580:~$ ollama run llama3.1
>>> /set verbose
Set 'verbose' mode.
>>> Why is the sky blue?
The sky appears blue to us because of a phenomenon called Rayleigh scattering, which 
is named after the British physicist Lord Rayleigh who first described it in the 
late 19th century.
Here's what happens:
..................................................................
So, to summarize: the sky appears blue because of the scattering of sunlight by gas 
molecules in the atmosphere, with shorter wavelengths like blue being scattered more 
than longer wavelengths like red.

total duration:       1m41.502082446s
load duration:        27.954397ms
prompt eval count:    16 token(s)
prompt eval duration: 3.096s
prompt eval rate:     5.17 tokens/s
eval count:           402 token(s)
eval duration:        1m38.376s
eval rate:            4.09 tokens/s
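
The rates in the verbose output can be checked by hand: each rate is simply the token count divided by the duration. A minimal sketch of that arithmetic follows; the duration parser is a hypothetical helper written for this example (it handles strings in the style shown above, such as `1m38.376s`), not part of Ollama.

```python
import re

# Seconds per unit for duration strings such as "1m38.376s" or "27.954397ms".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)")


def duration_seconds(text):
    """Convert a duration string like '1m38.376s' to seconds."""
    parts = _PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


def tokens_per_second(count, duration):
    """Token count divided by the duration in seconds."""
    return count / duration_seconds(duration)


# Values copied from the run above.
print(round(tokens_per_second(16, "3.096s"), 2))      # prompt eval rate
print(round(tokens_per_second(402, "1m38.376s"), 2))  # eval rate
```

For the run above this gives 16 / 3.096 ≈ 5.17 tokens/s for prompt evaluation and 402 / 98.376 ≈ 4.09 tokens/s for generation, matching the reported figures.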
Author
Owner

@Tamila-2017 commented on GitHub (Feb 17, 2025):

The previous experiment was on Ubuntu 20.04.
Now I've tried Debian 12.
The software was installed flawlessly again:

ai@rx580:~/ollama-for-amd-0.5.9/scripts$ sudo ./install.sh 
[sudo] password for ai: 
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
########################################################################################### 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> Downloading Linux ROCm amd64 bundle
########################################################################################### 100.0%
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
>>> AMD GPU ready.
ai@rx580:~/ollama-for-amd-0.5.9/scripts$ sudo reboot

But unfortunately, the RX580 doesn't work here either :-(

ai@rx580:~$ ollama run llama3.1
pulling manifest 
pulling 667b0c1932bc... 100% ▕█████████████████████ ▏ 4.9 GB/4.9 GB  7.9 MB/s  0s
pulling 948af2743fc7... 100% ▕█████████████████████▏ 1.5 KB                         
pulling 0ba8f0e314b4... 100% ▕█████████████████████▏  12 KB                         
pulling 56bb8bd477a5... 100% ▕█████████████████████▏   96 B                         
pulling 455f34728c9b... 100% ▕█████████████████████▏  487 B                         
verifying sha256 digest 
writing manifest 
success 
>>> 
>>> Send a message (/? for help)
>>> /set verbose
Set 'verbose' mode.
>>> 
>>>  Why is the sky blue?
The sky appears blue because of a phenomenon called Rayleigh scattering, which is named after the 
British physicist Lord Rayleigh who first described it in the late 19th century.

Here's what happens:
..........................................
And there you have it – the short answer to a long-standing question!

total duration:       1m25.276420709s
load duration:        29.056084ms
prompt eval count:    16 token(s)
prompt eval duration: 2.744s
prompt eval rate:     5.83 tokens/s
eval count:           353 token(s)
eval duration:        1m22.502s
eval rate:            4.28 tokens/s

Author
Owner

@robertrosenbusch commented on GitHub (Feb 22, 2025):

First of all, thanks @likelovewant for your great work.

@Tamila-2017: if you can manage on your own without further problems, take a look at my ollama/pytorch repo for gfx803 (https://github.com/robertrosenbusch/gfx803_rocm). It summarizes the following steps for Ollama on RX5X0 cards.

My steps to use Ollama v0.5.12 on gfx803/Linux were:

  1. use the official ROCm 6.3 version
  2. recompile rocBLAS for ROCm 6.3 for gfx803
  3. check out Ollama v0.5.12
  4. on line 74 of discover/gpu.go, replace "9" with "8"
  5. on line 59 of CMakePresets.json, add gfx803 to AMDGPU_TARGETS, like "AMDGPU_TARGETS": "gfx803";
  6. on line 99 of CMakeLists.txt, add 803 to the regex, like "^gfx(803|900|9[...]
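
To see why lowering the minimum from 9 to 8 in step 4 matters, here is a rough model of a major-version gate. It is only a sketch under assumptions: the function names are hypothetical, this is not Ollama's Go code, and it assumes the usual gfx naming convention in which the last two characters are the minor version and stepping and everything before them is the major version.

```python
def gfx_major(target):
    """Return the major version encoded in a name like 'gfx803' or 'gfx906:xnack-'."""
    name = target.split(":")[0]          # drop feature suffixes such as ':xnack-'
    if not name.startswith("gfx"):
        raise ValueError(f"not a gfx target: {target!r}")
    digits = name[3:]
    # Assumed convention: last two characters are minor + stepping (e.g. '03', '0a').
    return int(digits[:-2])


def is_allowed(target, minimum_major):
    """Hypothetical gate: accept a GPU only if its major version is high enough."""
    return gfx_major(target) >= minimum_major


cards = ["gfx803", "gfx900", "gfx1030"]
for minimum in (9, 8):
    accepted = [c for c in cards if is_allowed(c, minimum)]
    print(f"minimum major {minimum}: {accepted}")
```

Under this model, gfx803 has major version 8, so it is rejected while the minimum is 9 and accepted once the minimum is lowered to 8.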
<!-- gh-comment-id:2676417172 --> @robertrosenbusch commented on GitHub (Feb 22, 2025): First at all thnx @likelovewant for your great work. @Tamila-2017 : if you are able to handle without any further probs, take a look on my [ollama/pytorch](https://github.com/robertrosenbusch/gfx803_rocm) Repos for gx803. it will summarize the following steps for ollama on RX5X0 My Steps to use the Ollama on gf803/Linux [Ollama v0.5.12](https://github.com/ollama/ollama/releases/tag/v0.5.12) were: 1. use the offical [ROCm 6.3 Version](https://hub.docker.com/layers/rocm/pytorch/rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0/images/sha256-98ddf20333bd01ff749b8092b1190ee369a75d3b8c71c2fac80ffdcb1a98d529?context=explore) 2. recompile [rocblas for ROCm6.3](https://github.com/ROCm/rocBLAS/releases/tag/rocm-6.3.0) for gfx803 3. checkout the ollama [v0.5.12](https://github.com/ollama/ollama/releases/tag/v0.5.12) 4. replace Line 74 `discover/gpu.go` from "9 "to number *8* 5. add on Line 59 `CMakePresets.json` AMDGPU_TARGETS gfx803 like ` "AMDGPU_TARGETS": "gfx803"; 6. add on Line 99 `CMakeLists.txt.` like `"^gfx(803|900|9[...]`
Author
Owner

@siavashmohammady66 commented on GitHub (Feb 24, 2025):

Thanks a lot @robertrosenbusch
Could you explain each step in more detail?
Thank you

Author
Owner

@robertrosenbusch commented on GitHub (Feb 25, 2025):

> Thank you a lot @robertrosenbusch Could explain each step in more detail? Thank you

Of course :D The short version is to take a look at my documentation for the gfx803 Ollama Dockerfile (https://github.com/robertrosenbusch/gfx803_rocm/blob/main/Dockerfile_rocm63_ollama) and/or at my install instructions for Ollama (https://github.com/robertrosenbusch/gfx803_rocm) :D

Hint: it's not necessary to have any specific ROCm installation or libraries on your host system, because you pass the devices through from the kernel to the ROCm 6.3 Docker image with `--device=/dev/kfd --device=/dev/dri`. You only need a kernel version where both of these devices are available. And of course the Docker container is independent of the host Linux version used. You will also need a lot of time to download and compile Ollama for gfx803.
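
Before starting the container, it can help to confirm that the host actually exposes the two device nodes mentioned in the hint. The following is a small sketch of such a check; the function name is hypothetical, and the check only tests that the paths exist, not that the current user has permission to open them.

```python
import os


def missing_devices(paths=("/dev/kfd", "/dev/dri")):
    """Return the device paths from `paths` that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


missing = missing_devices()
if missing:
    print("Missing device nodes:", ", ".join(missing))
else:
    print("Both /dev/kfd and /dev/dri are present.")
```

If either path is reported missing, passing it to Docker with `--device` will fail, so the kernel driver side needs attention first.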

  1. I used the ROCm 6.3/PyTorch Docker image from AMD, because it already contains all the necessary libraries, tools and dependencies.
  2. Set the environment variables for gfx803 and create a symlink: `ln -s /opt/rocm-6.3.0 /opt/rocm`
  3. Check out and recompile rocBLAS for ROCm 6.3 for gfx803, because the copy in the official ROCm Docker container is not compiled for gfx803: `/install.sh -ida gfx803`
  4. Check out the latest Ollama version, v0.5.12.
  5. On line 74 of discover/gpu.go, change "9" to "8", because without this change the Ollama backend will never use gfx803 on Linux. Take a look at this discussion (https://github.com/likelovewant/ollama-for-amd/issues/51): `sed -i 's/var RocmComputeMajorMin = "9"/var RocmComputeMajorMin = "8"/' discover/gpu.go`
  6. Add gfx803 to AMDGPU_TARGETS in CMakePresets.json and CMakeLists.txt, using the two commands in steps 7 and 8.
  7. `sed -i 's/"gfx900;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-" /"gfx803;gfx900;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-" /g' CMakePresets.json`
  8. `sed -i 's/"list(FILTER AMDGPU_TARGETS INCLUDE REGEX "^gfx(900|94[012]|101[02]|1030|110[012])$")"/"list(FILTER AMDGPU_TARGETS INCLUDE REGEX "^gfx(803|900|94[012]|101[02]|1030|110[012])$")"/g' CMakeLists.txt`
  9. Compile the Ollama backend for gfx803: `cmake -B build -DAMDGPU_TARGETS=gfx803 && cmake --build build`
  10. Compile the Ollama frontend: `go generate ./... && go build .`
  11. Start the Ollama backend in the background via `./ollama serve&`. You should get output similar to this
  12. Start the Ollama frontend via e.g. `ollama run llama3.1:8b`. You should get output similar to this

But my Dockerfile_rocm63_ollama does all of these steps for you :D
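
The effect of the CMakeLists.txt change can be illustrated by applying the two filter patterns quoted in the steps to a few target names. This is a sketch in Python rather than CMake; it assumes the patterns behave the same in both regex dialects, which should hold for these simple expressions but is not verified against CMake here.

```python
import re

# Patterns copied from the sed command for CMakeLists.txt (before and after the patch).
ORIGINAL = r"^gfx(900|94[012]|101[02]|1030|110[012])$"
PATCHED = r"^gfx(803|900|94[012]|101[02]|1030|110[012])$"


def filter_targets(targets, pattern):
    """Keep only the targets that match the filter pattern."""
    rx = re.compile(pattern)
    return [t for t in targets if rx.search(t)]


targets = ["gfx803", "gfx900", "gfx1030", "gfx1100"]
print("original:", filter_targets(targets, ORIGINAL))
print("patched: ", filter_targets(targets, PATCHED))
```

With the original pattern gfx803 is filtered out of the build targets; with the patched pattern it is kept, which is why the build step can then be run with `-DAMDGPU_TARGETS=gfx803`.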

Benchmark i took last month:

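##### ROCm-6.3.0 Ollama v0.5.4 Benchmark on RX570 vs CPU Ryzen7 3700x

|CPU/GPU|deepseek-r1:8b|llama3.1:8b|llama2:7b|
|--------------|-----|------|-----|
|[GPU AMD RX570](https://github.com/robertrosenbusch/gfx803_rocm/tree/main/benchmark/gpu_rocm63_ollama_benchmark.png)|Total: 18.19 t/s|Total: 18.80 t/s|Total: 27.46 t/s|
|[CPU AMD Ryzen 7 3700x](https://github.com/robertrosenbusch/gfx803_rocm/tree/main/benchmark/cpu_rocm63_ollama_benchmark.png)|Total: 7.33 t/s|Total: 7.53 t/s|Total: 8.76 t/s|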
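To put the benchmark figures in proportion, the GPU-to-CPU ratio for each model can be worked out directly from the numbers above. The helper name `speedup` is made up for this illustration; the inputs are the t/s totals from the benchmark.

```shell
# speedup GPU_TPS CPU_TPS -> prints the GPU/CPU ratio with two decimals.
speedup() {
    awk -v gpu="$1" -v cpu="$2" 'BEGIN { printf "%.2f\n", gpu / cpu }'
}

speedup 18.19 7.33   # deepseek-r1:8b -> 2.48
speedup 18.80 7.53   # llama3.1:8b    -> 2.50
speedup 27.46 8.76   # llama2:7b      -> 3.13
```

On these figures the RX570 comes out roughly 2.5 to 3.1 times faster than the Ryzen 7 3700x.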


@Tamila-2017 commented on GitHub (Feb 25, 2025):

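**robertrosenbusch**, but you're using a very complicated method.
What advantages does it have compared to this [simple method](https://github.com/ollama/ollama/issues/2453#issuecomment-2429953268)?
And here the [obtained speed](https://github.com/ollama/ollama/issues/2453#issuecomment-2433678635) is demonstrated.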


@robertrosenbusch commented on GitHub (Feb 25, 2025):

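> But this is a very difficult way. What advantages does it have compared to this [simple method](https://github.com/ollama/ollama/issues/2453#issuecomment-2429953268)? And here the [obtained speed](https://github.com/ollama/ollama/issues/2453#issuecomment-2433678635) is demonstrated.

@Tamila-2017: it's so funny to talk with an AI robot ;P What's your conclusion on how to simplify the 5 steps in my Dockerfile for using gfx803 on a similar Ubuntu?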


@Tamila-2017 commented on GitHub (Feb 25, 2025):

Robertrosenbusch, I'm sorry, but unfortunately I don't understand the meaning of your question.
Could you ask your question in more detail?


@sanchez314c commented on GitHub (Feb 27, 2025):

@robertrosenbusch have you got this to work on bare metal or just in Docker?


@robertrosenbusch commented on GitHub (Feb 27, 2025):

> @robertrosenbusch have you got this to work on bare metal or just in Docker?

"Just" in Docker, because then you are **independent** of whatever ROCm version, Linux flavour or Linux version (Ubuntu, PopOS, CentOS, Arch, etc.) you use on your bare metal. It is also well documented, built on the "official" AMD ROCm Docker container with the latest available ROCm version (6.3) and the latest Ollama version (v0.5.12) :P And of course it is much easier to handle.

I have no clue why anyone would use it on bare metal. There is no performance impact, because Docker is a "simple" process isolation with well-known documentation and very good tooling/userland. You need [only six (!) steps](https://github.com/robertrosenbusch/gfx803_rocm) to run Ollama v0.5.12 with gfx803 on your Linux (insert_flavor/insert_flavor_version) :P From zero to a fully working Ollama on gfx803.

But hey, feel free to compile and install it on your bare metal. When it comes to installing the gfx803 ROCm software stack on a known-working bare-metal system, I am out.

**Beware:** Apart from the hassle of getting a fully functional gfx803 ROCm stack on your bare metal, you have to change these files in the official Ollama Git repository: `CMakeLists.txt`, `CMakePresets.json` and `discover/gpu.go`.

Or you can use this [Ollama fork](https://github.com/likelovewant/ollama-for-amd.git), because it already includes the small change on line 74 of gpu.go ^.^
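Since the Docker route only relies on the host kernel exposing `/dev/kfd` and `/dev/dri`, a quick pre-flight check before starting the container could be sketched like this. `have_gpu_nodes` is a hypothetical helper, and the image name in the usage comment is a placeholder, not a name taken from this thread.

```shell
# have_gpu_nodes [PATH...] -> succeeds only if every given device node exists.
# With no arguments it checks the two nodes passed through to the container.
have_gpu_nodes() {
    [ "$#" -gt 0 ] || set -- /dev/kfd /dev/dri
    missing=0
    for node in "$@"; do
        if [ ! -e "$node" ]; then
            echo "missing: $node" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (replace <your-gfx803-image> with the image you built):
# have_gpu_nodes && docker run -it --device=/dev/kfd --device=/dev/dri <your-gfx803-image>
```

If either node is missing, the host kernel or its amdgpu driver is the place to look, not the container.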


@sanchez314c commented on GitHub (Feb 28, 2025):

@robertrosenbusch why would anyone want to use it on bare metal instead of Docker? Why not? Anything run at a bare-metal level is going to be better.

I was asking because I have yet to be able to get a full version to compile. It keeps crashing and I keep arriving at CPU-related errors for MAXVINNI, and I don't even know where they're coming from because I'm not specifying them.


@Tamila-2017 commented on GitHub (Feb 28, 2025):

> Anything run at a bare-metal level is going to be better.

You are absolutely right! Which is why I don't like Docker. It's an unnecessary layer of complexity.


@robertrosenbusch commented on GitHub (Feb 28, 2025):

> @robertrosenbusch why would anyone want to use it on bare metal instead of Docker? Why not? Anything run at a bare-metal level is going to be better.
>
> I was asking because I have yet to be able to get a full version to compile. It keeps crashing and I keep arriving at CPU-related errors for MAXVINNI, and I don't even know where they're coming from because I'm not specifying them.

@sanchez314c: Sorry for being harsh. **Let's focus on the Ollama source code** and on why, for months now, you cannot use gfx803 without changes/patches to the Ollama source code, regardless of whether you run it on bare metal or in Docker.

  1. gfx803 cannot be used in Ollama because the minimum gfx version in `gpu.go` is gfx**9**xx
  2. gfx803 is not supported in `CMakePresets.json`
  3. gfx803 is not supported in `CMakeLists.txt`

If you have a fully working bare-metal/Docker ROCm environment for gfx803: check out the latest Ollama release via Git, change the three files, recompile Ollama and be happy :P
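The first point can be illustrated with a small sketch of how a minimum major version excludes gfx803. This mirrors the idea only and is not Ollama's actual Go code; `gfx_major` and `gfx_allowed` are hypothetical helpers. It relies on the naming convention that the last two characters of a gfx target are the minor version and stepping, and it does not handle suffixes such as `:xnack-`.

```shell
# gfx_major NAME -> prints the major version encoded in a gfx target name.
gfx_major() {
    digits="${1#gfx}"     # drop the "gfx" prefix, e.g. "803"
    echo "${digits%??}"   # drop minor and stepping, e.g. "8"
}

# gfx_allowed NAME MIN -> succeeds if the target's major version is >= MIN.
gfx_allowed() {
    [ "$(gfx_major "$1")" -ge "$2" ]
}

gfx_allowed gfx803 9 || echo "gfx803 rejected with minimum 9"
gfx_allowed gfx803 8 && echo "gfx803 accepted with minimum 8"
```

With the minimum at 9 every gfx8xx card is filtered out before the build targets matter, which is why the `gpu.go` change is needed in addition to the two CMake files.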


@sanchez314c commented on GitHub (Mar 6, 2025):

Can anyone confirm a successful compile on bare metal and say which version of ROCm they are using? I'm still hitting walls trying to get this going and would really appreciate any guidance/help. I know I'm not the only one with legacy hardware (RX580 and Tesla K80), and I'm sure there are a lot of people out there who would appreciate this. While trying to figure this out, my objective is to build a scripted install that does everything, including detecting system aspects. I have most of that working; I just can't get the compiles working. There are NO CLEAR instructions -- ANYWHERE -- including here, and the GitHub repos like Ollama-for-AMD don't include clear, easy instructions either. Everything is somewhat cryptic. For someone who doesn't know how to compile and is still learning, this is extremely frustrating and difficult. I don't understand why you guys don't just add these legacy support provisions to the source.


@dariosusman commented on GitHub (Mar 7, 2025):

I'm not entirely sure yet, but I seem to have been able to get this running on bare metal:

https://github.com/likelovewant/ollama-for-amd/issues/62#issuecomment-2705481206


@sanchez314c commented on GitHub (Mar 9, 2025):

@robertrosenbusch

⠏ time=2025-03-09T05:58:51.134Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/03/09 - 05:58:51 | 500 | 1.051664302s | 127.0.0.1 | POST "/api/generate"
Error: llama runner process has terminated: exit status 2
root@02574468dc8f:/ollama# time=2025-03-09T05:58:56.135Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.001018716 model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
time=2025-03-09T05:58:56.385Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.251154112 model=/root/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc
time=2

Cannot get it to work.
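An "exit status 2" from the runner says little on its own; the lines that explain the failure are usually further up in the server output. One way to narrow a long log down, sketched under the assumption that the server output was redirected to a file, is to keep only the `level=ERROR` and `level=WARN` lines. `log_problems` and the file name `serve.log` are hypothetical; `OLLAMA_DEBUG=1` is Ollama's switch for more verbose logging.

```shell
# log_problems FILE -> prints only the ERROR and WARN lines of a server log.
log_problems() {
    grep -E 'level=(ERROR|WARN)' "$1"
}

# Usage, assuming the server was started with its output captured:
# OLLAMA_DEBUG=1 ./ollama serve > serve.log 2>&1 &
# log_problems serve.log
```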


@sanchez314c commented on GitHub (Mar 12, 2025):

@robertrosenbusch

heathen-admin@LLMServer:~/ollama$ ./ollama run tinyllama
[GIN] 2025/03/11 - 21:02:00 | 200 | 47.412µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/11 - 21:02:00 | 200 | 5.869556ms | 127.0.0.1 | POST "/api/show"
time=2025-03-11T21:02:00.108-04:00 level=WARN source=ggml.go:132 msg="key not found" key=llama.attention.key_length default=64
time=2025-03-11T21:02:00.108-04:00 level=WARN source=ggml.go:132 msg="key not found" key=llama.attention.value_length default=64
time=2025-03-11T21:02:00.108-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/home/heathen-admin/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 gpu=0 parallel=4 available=7603056640 required="1.7 GiB"
time=2025-03-11T21:02:00.109-04:00 level=INFO source=server.go:97 msg="system memory" total="125.5 GiB" free="119.7 GiB" free_swap="8.0 GiB"
time=2025-03-11T21:02:00.109-04:00 level=WARN source=ggml.go:132 msg="key not found" key=llama.attention.key_length default=64
time=2025-03-11T21:02:00.109-04:00 level=WARN source=ggml.go:132 msg="key not found" key=llama.attention.value_length default=64
time=2025-03-11T21:02:00.109-04:00 level=INFO source=server.go:130 msg=offload library=rocm layers.requested=-1 layers.model=23 layers.offload=23 layers.split="" memory.available="[7.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.7 GiB" memory.required.partial="1.7 GiB" memory.required.kv="176.0 MiB" memory.required.allocations="[1.7 GiB]" memory.weights.total="696.1 MiB" memory.weights.repeating="644.8 MiB" memory.weights.nonrepeating="51.3 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="546.3 MiB"
time=2025-03-11T21:02:00.109-04:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/home/heathen-admin/ollama/ollama runner --model /home/heathen-admin/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 --ctx-size 8192 --batch-size 512 --n-gpu-layers 23 --threads 10 --parallel 4 --port 43799"
time=2025-03-11T21:02:00.109-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-11T21:02:00.110-04:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-11T21:02:00.110-04:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-11T21:02:00.121-04:00 level=INFO source=runner.go:932 msg="starting go runner"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: Tesla K80, compute capability 3.7, VMM: yes
Device 1: Tesla K80, compute capability 3.7, VMM: yes
load_backend: loaded CUDA backend from /home/heathen-admin/ollama/build/lib/ollama/libggml-cuda.so
⠇ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon RX 580 Series, compute capability 8.0, VMM: no
load_backend: loaded ROCm backend from /home/heathen-admin/ollama/build/lib/ollama/libggml-hip.so
load_backend: loaded CPU backend from /home/heathen-admin/ollama/build/lib/ollama/libggml-cpu-skylakex.so
time=2025-03-11T21:02:00.951-04:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | CUDA : ARCHS = 370 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=10
time=2025-03-11T21:02:00.951-04:00 level=INFO source=runner.go:993 msg="Server listening on 127.0.0.1:43799"
⠏ llama_load_model_from_file: using device CUDA0 (Tesla K80) - 11354 MiB free
⠏ time=2025-03-11T21:02:01.114-04:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
llama_load_model_from_file: using device CUDA1 (Tesla K80) - 11354 MiB free
llama_load_model_from_file: using device ROCm0 (Radeon RX 580 Series) - 8148 MiB free
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /home/heathen-admin/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 (version GGUF V3 (latest))
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /home/heathen-admin/.ollama/models/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = TinyLlama
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q4_0: 155 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 3
llm_load_vocab: token to piece cache size = 0.1684 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 606.53 MiB (4.63 BPW)
llm_load_print_meta: general.name = TinyLlama
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 '
'
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: PAD token = 2 ''
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOG token = 2 ''
llm_load_print_meta: max token length = 48
⠙ llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors: ROCm0 model buffer size = 169.48 MiB
llm_load_tensors: CUDA0 model buffer size = 212.77 MiB
llm_load_tensors: CUDA1 model buffer size = 189.12 MiB
llm_load_tensors: CPU_Mapped model buffer size = 35.16 MiB
SIGSEGV: segmentation violation
PC=0x729e387ac935 m=3 sigcode=1 addr=0x18
signal arrived during cgo execution

goroutine 50 gp=0xc000105340 m=3 mp=0xc00009ce08 [syscall]:
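
A quick sanity check on the numbers in this log, as a minimal sketch rather than anything Ollama itself runs. The four reported buffers add up to the reported model size, which suggests the weights were spread across the RX 580 and both Tesla K80s rather than placed on one GPU. The attention figures are also consistent with each other, assuming the usual definitions (`n_gqa` as query heads per KV head, `n_embd_k_gqa` as head size times KV heads). All values are copied from the log; the variable names are only illustrative.

```python
# Figures copied from the log above; sizes are in MiB.
buffers_mib = {
    "ROCm0": 169.48,
    "CUDA0": 212.77,
    "CUDA1": 189.12,
    "CPU_Mapped": 35.16,
}
reported_model_size_mib = 606.53

# Sum of every buffer the loader allocated, rounded like the log output.
total_mib = round(sum(buffers_mib.values()), 2)

# Attention shape reported in the same log.
n_head = 32
n_head_kv = 4
n_embd_head_k = 64

# Assumed definitions: query heads per KV head, and total K width across KV heads.
n_gqa = n_head // n_head_kv
n_embd_k_gqa = n_embd_head_k * n_head_kv

print(f"buffers total: {total_mib} MiB (reported: {reported_model_size_mib} MiB)")
print(f"n_gqa = {n_gqa}, n_embd_k_gqa = {n_embd_k_gqa}")
```

Both derived values match the `n_gqa = 8` and `n_embd_k_gqa = 256` lines in the log.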


@chboishabba commented on GitHub (Mar 15, 2025):

The segfault is caused by ROCm, which is why the Docker image is needed. I think some ops, e.g. fp16, may not be supported on this card; I am currently investigating:

https://github.com/lamikr/rocm_sdk_builder/issues/228
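
One way to test the "it's ROCm" hypothesis is to run with only one vendor's GPUs visible, since the crashing run loaded both the CUDA and the ROCm backend. The sketch below only builds the environment for such a run. `isolated_env` is a hypothetical helper, and it assumes that `CUDA_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES` are honored by this source build and that an invalid ID such as `-1` hides every device of that vendor.

```python
import os


def isolated_env(backend, base_env=None):
    """Return a copy of the environment exposing only one vendor's GPUs.

    Assumption: "-1" is treated as an invalid device ID and hides all
    devices of that vendor. Device indices follow the log (ROCm device 0
    is the RX 580, CUDA devices 0 and 1 are the Tesla K80s).
    """
    env = dict(os.environ if base_env is None else base_env)
    if backend == "rocm":
        env["CUDA_VISIBLE_DEVICES"] = "-1"  # hide both Tesla K80s
        env["ROCR_VISIBLE_DEVICES"] = "0"   # keep the RX 580
    elif backend == "cuda":
        env["ROCR_VISIBLE_DEVICES"] = "-1"  # hide the RX 580
        env["CUDA_VISIBLE_DEVICES"] = "0,1"  # keep both Tesla K80s
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    return env
```

For example, `subprocess.run(["./ollama", "serve"], env=isolated_env("rocm"))` would start the server with only the RX 580 visible. If the SIGSEGV still appears there, the mixed CUDA/ROCm setup can probably be ruled out as the trigger.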


@mon-jai commented on GitHub (Mar 17, 2025):

This issue might be resolved if #9650 is merged.

Ollama with Vulkan runs perfectly on my Radeon RX 5700, even though it isn't supported by ROCm. Just run the installer and you are good to go, no manual file swapping required.

For now, we can use the binaries compiled by @McBane87: https://github.com/whyvl/ollama-vulkan/issues/7#issue-2828064858


@chboishabba commented on GitHub (Mar 19, 2025):

@mon-jai rocm_sdk_builder enables ROCm support on these cards, so this should only require a working ROCm installation. I believe robertrosenbusch's Docker image should provide most of what is required, as it already provides vLLM for gfx803. Still testing on my end.

https://github.com/robertrosenbusch/gfx803_rocm/issues/6

https://github.com/lamikr/rocm_sdk_builder/issues/173


@robertrosenbusch commented on GitHub (Mar 21, 2025):

@chboishabba @mon-jai :

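> This issue might be resolved if [#9650](https://github.com/ollama/ollama/pull/9650) is merged.
>
> Ollama with Vulkan runs perfectly on my Radeon RX 5700, even though it isn't supported by ROCm. Just run the installer and you are good to go, no manual file swapping required.
>
> For now, we can use the binaries compiled by [@McBane87](https://github.com/McBane87): [whyvl#7 (comment)](https://github.com/whyvl/ollama-vulkan/issues/7#issue-2828064858)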

Hi guys and girls and everyone in between and beyond. Be careful with each other.

@chboishabba: you didn't answer my questions in my repo's issue [about WhisperX](https://github.com/robertrosenbusch/gfx803_rocm/issues/6#issuecomment-2730578664) on ROCm/gfx803.

@mon-jai: both work on ollama 0.62 with an RX 570 (Polaris/gfx803) and ROCm 6.3. Take a look at the [Dockerfile](https://github.com/robertrosenbusch/gfx803_rocm/blob/main/Dockerfile_rocm63_ollama) I published.
`./ollama run tinyllama`

./ollama run tinyllama
[GIN] 2025/03/21 - 23:08:55 | 200 | 32.127µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/21 - 23:08:55 | 200 | 12.315771ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/03/21 - 23:08:55 | 200 | 5.944312ms | 127.0.0.1 | POST "/api/generate"

>>> /set verbose
Set 'verbose' mode.
>>> whats the difference between absorption and adsorption.
Sure! Here is a brief explanation of the differences between absorption and adsortion in terms of their definitions:

  1. Absorption:
  • Absorption occurs when a substance is completely or partially absorbed into the bloodstream by an organism.
  • This happens when a compound (usually a nutrient) absorbed from the food we eat enters the body through the digestive system, and it gets taken up by a
    cell in our body where it's used to perform chemical reactions. The absorbed amount of this nutrient can be expressed as a percentage of the total amount
    consumed (usually in terms of weight).
  1. Adsortion:
  • Adsortion occurs when a substance is not fully absorbed into the bloodstream by an organism.
  • This happens when an organism cannot digest or absorb the compound in its current form, so it's excreted from the body through the urine or feces. If the
    compound is essential for the organism to function properly, then adsortion may occur. However, most nutrients that are not absorbed into the bloodstream
    but are still used by the body can be metabolized and utilized as energy or stored in tissues or organs for later use.

In summary, absorption occurs when a compound is completely absorbed from the food we eat, while adsortion occurs when a substance is not fully absorbed
into the bloodstream by an organism. Both processes are important for nutrition and can affect our physical and mental health in various ways.
[GIN] 2025/03/21 - 23:14:41 | 200 | 5.40056913s | 127.0.0.1 | POST "/api/chat"

total duration: 5.400487346s
load duration: 9.143939ms
prompt eval count: 681 token(s)
prompt eval duration: 8.594151ms
prompt eval rate: 79239.94 tokens/s
eval count: 347 token(s)
eval duration: 5.37478095s
eval rate: 64.56 tokens/s
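
The rates in the verbose output above are just token counts divided by durations in seconds (8.594151ms is 0.008594151s). A minimal sketch for recomputing them, assuming awk is available; `tokens_per_second` is a hypothetical helper name and not part of Ollama:

```bash
# Hypothetical helper: recompute a tokens/s rate from the verbose stats.
# $1 = token count, $2 = duration in seconds
tokens_per_second() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

# eval count / eval duration from the run above
tokens_per_second 347 5.37478095    # prints 64.56
# prompt eval count / prompt eval duration (converted to seconds)
tokens_per_second 681 0.008594151   # prints 79239.94
```

This makes it easy to compare runs on different cards from the raw counts and durations alone.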

<!-- gh-comment-id:2744665007 --> @robertrosenbusch commented on GitHub (Mar 21, 2025): @chboishabba @mon-jai : > This issue might be resolved if [#9650](https://github.com/ollama/ollama/pull/9650) is merged. > > Ollama with Vulkan runs perfectly on my Radeon RX 5700, even though it isn't supported by ROCm. Just run the installer and you are good to go, no manual file swapping required. > > For now, we can use the binaries compiled by [@McBane87](https://github.com/McBane87): [whyvl#7 (comment)](https://github.com/whyvl/ollama-vulkan/issues/7#issue-2828064858) Hi guys and girls and everything between and between outside there. be carefull which each other. @chboishabba: you didnt answer my questions on my git-repos issue [about the whisperX](https://github.com/robertrosenbusch/gfx803_rocm/issues/6#issuecomment-2730578664) on ROCm/gfx803. @mon-jai : both works on ollama 0.62 with rx570 (polaris/gfx803) and ROCm 6.3. take a look on my [Dockerfile](https://github.com/robertrosenbusch/gfx803_rocm/blob/main/Dockerfile_rocm63_ollama) i published. `./ollama run tinyllama` `/ollama run tinyllama [GIN] 2025/03/21 - 23:08:55 | 200 | 32.127µs | 127.0.0.1 | HEAD "/" [GIN] 2025/03/21 - 23:08:55 | 200 | 12.315771ms | 127.0.0.1 | POST "/api/show" [GIN] 2025/03/21 - 23:08:55 | 200 | 5.944312ms | 127.0.0.1 | POST "/api/generate" >>> /set verbose Set 'verbose' mode. >>> whats the difference between absorption and adsorption. Sure! Here is a brief explanation of the differences between absorption and adsortion in terms of their definitions: 1. Absorption: - Absorption occurs when a substance is completely or partially absorbed into the bloodstream by an organism. - This happens when a compound (usually a nutrient) absorbed from the food we eat enters the body through the digestive system, and it gets taken up by a cell in our body where it's used to perform chemical reactions. 
The absorbed amount of this nutrient can be expressed as a percentage of the total amount consumed (usually in terms of weight). 2. Adsortion: - Adsortion occurs when a substance is not fully absorbed into the bloodstream by an organism. - This happens when an organism cannot digest or absorb the compound in its current form, so it's excreted from the body through the urine or feces. If the compound is essential for the organism to function properly, then adsortion may occur. However, most nutrients that are not absorbed into the bloodstream but are still used by the body can be metabolized and utilized as energy or stored in tissues or organs for later use. In summary, absorption occurs when a compound is completely absorbed from the food we eat, while adsortion occurs when a substance is not fully absorbed into the bloodstream by an organism. Both processes are important for nutrition and can affect our physical and mental health in various ways.[GIN] 2025/03/21 - 23:14:41 | 200 | 5.40056913s | 127.0.0.1 | POST "/api/chat" total duration: 5.400487346s load duration: 9.143939ms prompt eval count: 681 token(s) prompt eval duration: 8.594151ms prompt eval rate: 79239.94 tokens/s eval count: 347 token(s) eval duration: 5.37478095s eval rate: 64.56 tokens/s
Author
Owner

@chboishabba commented on GitHub (Mar 24, 2025):

Yes, sorry, I haven't had a moment.


<!-- gh-comment-id:2746603075 --> @chboishabba commented on GitHub (Mar 24, 2025): Yes sorry haven't had moment
Author
Owner

@lsunay commented on GitHub (Apr 8, 2025):

Setting Up Ollama with AMD Radeon RX 580 GPU Support Using Docker Containers

Hey everyone! I'm super excited to share with you how I set up Ollama with AMD Radeon RX 580 GPU support using Docker containers. A big shoutout to everyone whose posts and guides I've learned from along the way - your help has been invaluable!

In this post, I'll guide you through the process step by step. We'll use the following images:

  1. docker.io/rocm/dev-ubuntu-22.04:5.7.1-complete - This image contains the ROCm 5.7.1 libraries, which support the RX 580 GPU.
  2. mnccouk/ollama-gpu-rx580:latest - This image contains an Ollama build that is compatible with ROCm 5.7.1.

Step 1: Stop the Existing Ollama Container

If you have an existing Ollama container running on your CPU, stop it to avoid conflicts:

docker stop ollama

Step 2: Start the ROCm Host Container

First, we'll start the ROCm host container, which will provide the necessary ROCm 5.7.1 libraries:

docker run -it --name rocm_host --device=/dev/kfd --device=/dev/dri docker.io/rocm/dev-ubuntu-22.04:5.7.1-complete

Inside the rocm_host container, verify that the GPU is recognized by running:

rocminfo
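
To check specifically for a Polaris card instead of reading the whole listing, the output can be filtered for the gfx target name. A minimal sketch, assuming `rocminfo` prints the agent's target name (for example `gfx803`) somewhere in its output; `has_gfx_target` is a hypothetical helper:

```bash
# Hypothetical helper: succeed if the text on stdin mentions the given
# gfx target as a whole word (so gfx803 does not match gfx8030).
has_gfx_target() {
  grep -qw -- "$1"
}

# Usage inside the rocm_host container:
#   rocminfo | has_gfx_target gfx803 && echo "gfx803 agent found"
```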

Step 3: Start the Ollama GPU Container

Next, start the Ollama GPU container. It reuses the ollama volume from the existing Ollama container (if you want to keep the existing models) and mounts the volumes of the rocm_host container:

docker run -d --name ollama_gpu --volumes-from rocm_host -v ollama:/root/.ollama -e HIP_PATH=/opt/rocm/lib/ -e LD_LIBRARY_PATH=/opt/rocm/lib --device /dev/kfd --device /dev/dri -p 11434:11434 mnccouk/ollama-gpu-rx580:latest

Step 4: Start a Chat Session with Ollama

Once the container is running, you can start a chat session using the llama3.1 model:

docker exec -it ollama_gpu ollama run llama3.1

Step 5: Managing the Containers

To stop the running Ollama GPU container:

docker container stop ollama_gpu

To start the Ollama GPU container again:

docker container start ollama_gpu

Step 6: Monitoring GPU Usage

To monitor GPU usage on the host machine, you can use the rocm-smi command:

rocm-smi

To monitor GPU usage inside the rocm_host container, first enter the container:

docker exec -it rocm_host bash

Then, run the rocm-smi command inside the container:

rocm-smi

Additional Notes:

  • If you encounter error code 127, ensure that the rocm-smi command is correctly installed and accessible in the rocm_host container.
  • If you encounter error code 137, it might be due to the container reaching its memory limit or being forcefully stopped by the system. You can adjust the container's memory and CPU limits when starting it.
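
The two codes above follow the usual shell convention: 127 means the command was not found, and a code above 128 means the process was killed by a signal (the code minus 128), so 137 is signal 9 (SIGKILL), which is what an out-of-memory kill looks like. A minimal sketch; `explain_exit_code` is a hypothetical helper, while `docker inspect` and the `--memory` / `--cpus` options of `docker run` are standard Docker CLI features:

```bash
# Hypothetical helper: turn a container exit code into a short explanation.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 127 ]; then
    echo "command not found"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exit status $code"
  fi
}

# Usage on the Docker host (limit values are placeholders, adjust to your system):
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' ollama_gpu)"
#   docker inspect --format '{{.State.OOMKilled}}' ollama_gpu
#   docker run --memory 12g --cpus 4 ...
```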

System Information:

  • Operating System: Debian 12
  • CPU: AMD Ryzen 5500

By following these steps, you should be able to set up and run Ollama with AMD Radeon RX 580 GPU support using Docker containers. Happy computing, and thank you again to everyone who helped me along the way!


<!-- gh-comment-id:2787657077 --> @lsunay commented on GitHub (Apr 8, 2025): **Setting Up Ollama with AMD Radeon RX 580 GPU Support Using Docker Containers** Hey everyone! I'm super excited to share with you how I set up Ollama with AMD Radeon RX 580 GPU support using Docker containers. A big shoutout to everyone whose posts and guides I've learned from along the way - your help has been invaluable! In this post, I'll guide you through the process step by step. We'll use the following images: 1. `docker.io/rocm/dev-ubuntu-22.04:5.7.1-complete` - This image contains the ROCM 5.7.1 libraries, which support the RX 580 GPU. 2. `mnccouk/ollama-gpu-rx580:latest` - This image contains Ollama, which is compatible with ROCM 5.7.1. **Step 1: Stop the Existing Ollama Container** If you have an existing Ollama container running on your CPU, stop it to avoid conflicts: ```bash docker stop ollama ``` **Step 2: Start the ROCM Host Container** First, we'll start the ROCM host container, which will provide the necessary ROCM 5.7.1 libraries: ```bash docker run -it --name rocm_host --device=/dev/kfd --device=/dev/dri docker.io/rocm/dev-ubuntu-22.04:5.7.1-complete ``` Inside the `rocm_host` container, verify that the GPU is recognized by running: ```bash rocminfo ``` **Step 3: Start the Ollama GPU Container** Next, start the Ollama GPU container using the volume from the existing Ollama container (if you want to use the existing models) and the ROCM host container: ```bash docker run -d --name ollama_gpu --volumes-from rocm_host -v ollama:/root/.ollama -e HIP_PATH=/opt/rocm/lib/ -e LD_LIBRARY_PATH=/opt/rocm/lib --device /dev/kfd --device /dev/dri -p 11434:11434 mnccouk/ollama-gpu-rx580:latest ``` **Step 4: Start a Chat Session with Ollama** Once the container is running, you can start a chat session using the `llama3.1` model: ```bash docker exec -it ollama_gpu ollama run llama3.1 ``` **Step 5: Managing the Containers** To stop the running Ollama GPU container: ```bash docker 
container stop ollama_gpu ``` To start the Ollama GPU container again: ```bash docker container start ollama_gpu ``` **Step 6: Monitoring GPU Usage** To monitor GPU usage on the host machine, you can use the `rocm-smi` command: ```bash rocm-smi ``` To monitor GPU usage inside the `rocm_host` container, first enter the container: ```bash docker exec -it rocm_host bash ``` Then, run the `rocm-smi` command inside the container: ```bash rocm-smi ``` **Additional Notes:** - If you encounter error code 127, ensure that the `rocm-smi` command is correctly installed and accessible in the `rocm_host` container. - If you encounter error code 137, it might be due to the container reaching its memory limit or being forcefully stopped by the system. You can adjust the container's memory and CPU limits when starting it. **System Information:** - **Operating System:** Debian 12 - **CPU:** AMD Ryzen 5500 By following these steps, you should be able to set up and run Ollama with AMD Radeon RX 580 GPU support using Docker containers. Happy computing, and thank you again to everyone who helped me along the way! ---
Author
Owner

@robertrosenbusch commented on GitHub (Apr 9, 2025):

@lsunay: That's cool!

Just some hints for using gfx803 with Ollama in a Docker container. It could save you some lifetime :P

Note

ROCm hardware requirements

  1. Make sure that both kernel devices, /dev/dri and /dev/kfd, are available.
  2. Make sure your mainboard supports PCIe atomics for your connected gfx803 GPU(s): sudo grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties
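
The first check can be scripted before starting a container. A minimal sketch; `missing_devices` is a hypothetical helper, and the device paths are the two named above:

```bash
# Hypothetical helper: print every given path that does not exist.
missing_devices() {
  for dev in "$@"; do
    [ -e "$dev" ] || echo "$dev"
  done
}

# Usage on the host (no output means both devices are present):
#   missing_devices /dev/kfd /dev/dri
```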

Note

If you want to use more than one gfx803 GPU with Ollama because you want to run LLMs bigger than 8b (VRAM limit), take a look at this: https://github.com/robertrosenbusch/gfx803_rocm/issues/12#issuecomment-2763259884

Caution

Prevent ROCm SegFaults on your Linux Distro

This is based on feedback and research from users of the Docker container from my Git repo with Ollama and PyTorch/ComfyUI, where the devices /dev/dri and /dev/kfd crashed with segfaults. Please check which Linux kernel version you are running and switch up or down to a known-working kernel version. Fedora 41, Arch and Debian 13 ship (as of April 2025) suspicious Linux kernel versions by default.

| Kernel Version | 5.19 | 6.2 | 6.8 | 6.9 | 6.10 | 6.11 | 6.12 | 6.13 | 6.14 |
|---|---|---|---|---|---|---|---|---|---|
| working on ROCm 6.3 for Ollama/PyTorch | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🟥 | 🟥 | ✅ |
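
The kernel check can be automated by comparing the major.minor part of `uname -r` with the series reported in this thread (6.12 and 6.13 as broken; 5.19, 6.2, 6.8 to 6.11 and 6.14 as working). A minimal sketch; `kernel_status` is a hypothetical helper, and the lists only reflect what was reported here for ROCm 6.3:

```bash
# Hypothetical helper: classify a kernel release string as printed by `uname -r`.
kernel_status() {
  ver=$1
  major=${ver%%.*}          # text before the first dot, e.g. 6
  rest=${ver#*.}            # text after the first dot, e.g. 12.5-arch1
  minor=${rest%%[!0-9]*}    # leading digits only, e.g. 12
  case "$major.$minor" in
    6.12|6.13) echo "reported-broken" ;;
    5.19|6.2|6.8|6.9|6.10|6.11|6.14) echo "reported-working" ;;
    *) echo "not-listed" ;;
  esac
}

# Usage:
#   kernel_status "$(uname -r)"
```

A result of "not-listed" only means nobody in this thread reported on that kernel series, not that it fails.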
<!-- gh-comment-id:2788657491 --> @robertrosenbusch commented on GitHub (Apr 9, 2025): @lsunay: That's cool! Just some hints for using gfx803 with Ollama in a Docker container. It could save you some lifetime :P > [!NOTE] > #### ROCm hardware requirements > 1. Make sure that both kernel devices, `/dev/dri` and `/dev/kfd`, are available > 2. Make sure your mainboard supports [PCIe atomics](https://github.com/ROCm/ROCm/issues/2224#issuecomment-2299689450) for your connected gfx803 GPU(s): `sudo grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties` > [!NOTE] > If you want to use more than one gfx803 GPU with Ollama because you want to run LLMs bigger than 8b (VRAM limit), [take a look at this](https://github.com/robertrosenbusch/gfx803_rocm/issues/12#issuecomment-2763259884) > [!CAUTION] > #### Prevent ROCm SegFaults on your Linux Distro > This is based on feedback and research from users of the Docker container from my Git repo with [Ollama](https://github.com/robertrosenbusch/gfx803_rocm/issues/8#issue-2919996555) and [PyTorch/ComfyUI](https://github.com/robertrosenbusch/gfx803_rocm/issues/13#issuecomment-2754796999), where the devices `/dev/dri` and `/dev/kfd` crashed with segfaults. Please check which Linux kernel version you are running and switch up or down to a known-working kernel version. Fedora 41, Arch and Debian 13 ship (as of April 2025) suspicious Linux kernel versions by default. > |Kernel Version|5.19|6.2|6.8|6.9|6.10|6.11|6.12|6.13|6.14| > |--------------|-----|-----|------|-----|------|-----|-----|-----|-----| > |working on ROCm 6.3 for Ollama/PyTorch|✅|✅|✅|✅|✅|✅|🟥|🟥|✅|
Author
Owner

@chboishabba commented on GitHub (Apr 10, 2025):

Propose possibly using pre built updated binaries from https://github.com/lamikr/rocm_sdk_builder

<!-- gh-comment-id:2791459799 --> @chboishabba commented on GitHub (Apr 10, 2025): Propose possibly using pre built updated binaries from https://github.com/lamikr/rocm_sdk_builder
Author
Owner

@MicahBird commented on GitHub (Apr 10, 2025):

@lsunay Thank you so much for the brief guide! I can confirm that it also works on a Radeon RX 570 with 4GB of VRAM. Testing llama3.2:1b I was able to get 72.94 tokens/s!

<!-- gh-comment-id:2794068677 --> @MicahBird commented on GitHub (Apr 10, 2025): @lsunay Thank you so much for the brief guide! I can confirm that it also works on a Radeon RX 570 with 4GB of VRAM. Testing llama3.2:1b I was able to get 72.94 tokens/s!
Author
Owner

@mon-jai commented on GitHub (Apr 10, 2025):

@chboishabba There isn't any Windows build yet. 🥲

<!-- gh-comment-id:2794435897 --> @mon-jai commented on GitHub (Apr 10, 2025): @chboishabba There isn't any Windows build yet. 🥲
Author
Owner

@chboishabba commented on GitHub (Apr 11, 2025):

> @chboishabba There isn't any Windows build yet. 🥲

@dhiltgen
https://github.com/lamikr/rocm_sdk_builder/issues/231 ?

<!-- gh-comment-id:2795698115 --> @chboishabba commented on GitHub (Apr 11, 2025): > @chboishabba There isn't any Windows build yet. 🥲 @dhiltgen https://github.com/lamikr/rocm_sdk_builder/issues/231 ?
Author
Owner

@lsunay commented on GitHub (Apr 11, 2025):

> @lsunay: That's cool!
>
> Just some hints for using gfx803 with Ollama in a Docker container. It could save you some lifetime :P
>
> Note
>
> ROCm hardware requirements
>
> 1. Make sure that both kernel devices, `/dev/dri` and `/dev/kfd`, are available
> 2. Make sure your mainboard supports [PCIe atomics](https://github.com/ROCm/ROCm/issues/2224#issuecomment-2299689450) for your connected gfx803 GPU(s): `sudo grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties`
>
> Note
>
> If you want to use more than one gfx803 GPU with Ollama because you want to run LLMs bigger than 8b (VRAM limit), take a look at this: https://github.com/robertrosenbusch/gfx803_rocm/issues/12#issuecomment-2763259884
>
> Caution
>
> Prevent ROCm SegFaults on your Linux Distro
>
> This is based on feedback and research from users of the Docker container from my Git repo with Ollama and PyTorch/ComfyUI, where the devices /dev/dri and /dev/kfd crashed with segfaults. Please check which Linux kernel version you are running and switch up or down to a known-working kernel version. Fedora 41, Arch and Debian 13 ship (as of April 2025) suspicious Linux kernel versions by default.
>
> | Kernel Version | 5.19 | 6.2 | 6.8 | 6.9 | 6.10 | 6.11 | 6.12 | 6.13 | 6.14 |
> |---|---|---|---|---|---|---|---|---|---|
> | working on ROCm 6.3 for Ollama/PyTorch | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🟥 | 🟥 | ✅ |

Feedback on Linux Kernel Version and ROCm SegFaults

Hello,

Here is the Linux kernel version I am using:

uname -r
6.1.0-32-amd64

I checked if my kernel version is among the working versions listed for ROCm 6.3 with Ollama and PyTorch. Since the provided list is for ROCm 6.3, and I am using ROCm 5.7, I just wanted to share my kernel version for reference. No further action is needed from my side.

If you have any recommendations or information regarding this version, please share. Thank you!


@robertrosenbusch commented on GitHub (Apr 11, 2025):

@lsunay: Hi and welcome back!

> Here is the Linux kernel version I am using:
>
> uname -r 6.1.0-32-amd64
>
> I checked if my kernel version is among the working versions listed for ROCm 6.3 with Ollama and PyTorch. Since the provided list is for ROCm 6.3, and I am using ROCm 5.7, I just wanted to share my kernel version for reference. No further action is needed from my side.
>
> If you have any recommendations or information regarding this version, please share. Thank you!

The problem is not the ROCm version used in the Docker container (5.7.1 or 6.3), nor Ollama, nor GFX803, nor your distro. The problem is in kernel versions **6.12 and 6.13**, which fail to handle the two kernel devices `/dev/dri` and `/dev/kfd` on the host system; this is fixed in 6.14 *grmpf*

A freshly installed Debian 13, Arch, or Fedora 41 (as of April 2025) combined with your Docker HowTo will always end in the same SegFault. That is what I expected, and I am sorry :P

But it's the weekend, so I will test your HowTo (which I am really thankful for) on current Arch Linux with the shipped kernels 6.13 and 6.12 to verify. I am just curious whether it will work -·^

So please hold the line :D I will look into it and give some feedback.


Robert Rosenbusch


@lsunay commented on GitHub (Apr 11, 2025):

> @lsunay: Hi and welcome back!
>
> > Here is the Linux kernel version I am using:
> >
> > uname -r 6.1.0-32-amd64
> >
> > I checked if my kernel version is among the working versions listed for ROCm 6.3 with Ollama and PyTorch. Since the provided list is for ROCm 6.3, and I am using ROCm 5.7, I just wanted to share my kernel version for reference. No further action is needed from my side.
> >
> > If you have any recommendations or information regarding this version, please share. Thank you!
>
> The Prob is not used ROCm 5.7.1 or ROCm 6.3 or Ollama. The Prob is inside the Kernel-Version 6.12 and 6.13 which is fixed on 6.14 -·^
>
> A "fresh new" installed Distro on Debian 13/Arch/Fedora 41 and your HowTo will ended into a same SegFault. That what i expected on :P
>
> But its weekend time and i will proofe your HowTo (where i am really thankfull about it) on current ARCH-Linux with delivered Kernel 6.13 and Kernel 6.12. I am just curious if it will be works fine -·^
>
> In this way, please hold the line :D I will research it and gave some feedback.
>
> Robert Rosenbusch


Subject: Feedback on amdgpu Error with Kernel 6.1.0-32-amd64

Hey Robert,
Thanks for the detailed info and updates on the kernel issues with ROCm! I’ve been following your posts and wanted to share an issue I’m facing on my setup, hoping to get some feedback if you’ve come across something similar. 😊

I’m running into an error in my logs: `amdgpu: init_user_pages: Failed to get user pages: -1`. It shows up multiple times, especially during GPU-intensive tasks like model execution. I know you mentioned that the main problem with SegFaults is tied to Kernel 6.12 and 6.13 (fixed in 6.14), but I’m on a different version and still seeing issues, so I thought I’d share my setup for reference.

Here’s what I’ve got:

  • Kernel Version: 6.1.0-32-amd64 (checked with `uname -r`)
  • GPU: AMD Radeon RX 580 2048SP (from `lspci | grep -i vga`: `01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 20 XL [Radeon RX 580 2048SP] (rev ef)`)
  • Driver Modules: `amdgpu` and related modules are loaded (confirmed with `lsmod | grep amdgpu`)
  • Error Log (from `journalctl -p 3 -xb`): Repeated entries of `amdgpu: init_user_pages: Failed to get user pages: -1` during workload.

I’ve checked dmesg and other logs, and there’s no direct SegFault reported yet, but I’m wondering if this amdgpu error could lead to crashes or if it’s a sign of a deeper compatibility issue, even on Kernel 6.1. From what I understand, this error points to the driver failing to access user space memory pages, maybe due to memory management or kernel-driver mismatch.
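
To quantify how often this message appears, the captured log text can be scanned programmatically. The following is only a minimal sketch, assuming the output of `journalctl -p 3 -xb` has been saved as text. The function name and the sample lines (timestamps, host name) are made up for illustration; only the error message itself is taken from the logs described above.

```python
import re
from collections import Counter

# Error text as reported above; the trailing number is the error code.
ERROR_PATTERN = re.compile(
    r"amdgpu: init_user_pages: Failed to get user pages: (-?\d+)"
)


def count_init_user_pages_errors(log_text):
    """Return a Counter mapping error code -> number of occurrences."""
    return Counter(match.group(1) for match in ERROR_PATTERN.finditer(log_text))


# Illustrative sample only: timestamps and host name are invented.
sample_log = """\
Apr 11 10:00:01 host kernel: amdgpu: init_user_pages: Failed to get user pages: -1
Apr 11 10:00:02 host kernel: some unrelated message
Apr 11 10:00:03 host kernel: amdgpu: init_user_pages: Failed to get user pages: -1
"""

print(count_init_user_pages_errors(sample_log))
```

Comparing the count before and after loading a model would show whether the message is tied to model loading or to ongoing inference.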

I really appreciate your research on kernel versions and the heads-up on 6.14 fixing the SegFaults for 6.12/6.13. Since I’m on 6.1, I’m curious if this could still be related or if it’s something else (like IOMMU settings or ROCm version—I’m on 5.7, by the way). I’m planning to dig deeper into logs and maybe test a kernel update if needed.

If you’ve seen this amdgpu error before or have any tips for Kernel 6.1 with an RX 580, I’d be super grateful for your input. No rush, just wanted to share this and hold the line as you mentioned! :D Looking forward to any updates from your tests on Arch with 6.12/6.13. Thanks again for the great info and support!

Cheers,



@robertrosenbusch commented on GitHub (Apr 12, 2025):

@lsunay: Hi and welcome back. Thanks for the additional information.

My short feedback:

  1. Your Ollama/ROCm 5.7.1 HowTo crashed with a SegFault on kernel versions 6.12 and 6.13, as expected, and it works fine on 6.8 and 6.14 ·^ In this sense, all current Linux distros except Ubuntu are affected and produce a SegFault on the gfx803 GPU when using `/dev/dri` and `/dev/kfd`, independent of Ollama. It's just a bug in kernel versions 6.12 and 6.13.
  2. I can confirm your `amdgpu: init_user_pages: Failed to get user pages: -1` error, independent of the kernel version you are using.
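
The kernel finding above can be turned into a quick pre-flight check. This is a minimal sketch, assuming that only the 6.12 and 6.13 series are affected, as reported in this thread; the function names are hypothetical and the check says nothing about other causes of crashes.

```python
import platform
import re

# Kernel series reported in this thread as producing SegFaults on gfx803.
AFFECTED_SERIES = {(6, 12), (6, 13)}


def kernel_series(release):
    """Extract (major, minor) from a release string such as '6.1.0-32-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(release):
    """True if the release belongs to a series reported as broken above."""
    return kernel_series(release) in AFFECTED_SERIES


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string `uname -r` prints
    try:
        status = "affected" if is_affected(release) else "not in the affected list"
    except ValueError:
        status = "could not be parsed"
    print(f"kernel {release}: {status}")
```

With this check, `6.1.0-32-amd64` is reported as not in the affected list, which matches the observation that the `init_user_pages` message on that kernel is a separate issue.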

I am not able to look deeply into what `mnccouk/ollama-gpu-rx580` is doing in its Docker image, because it is not well documented and over 7 months old... like a lot of things about GFX803 *grmpf* it's a little frustrating.

These error logs are independent of which kernel, ROCm, or Ollama version you use.

However, on ROCm 6.3 and Ollama 0.6.5 there is the same problem, but it appears only once, when you load a new model/LLM in Ollama.
That is the state of my own research so far.


Cheers, Robert Rosenbusch


@robertrosenbusch commented on GitHub (Apr 13, 2025):

@lsunay: Hi ·^ It seems [AMD published ROCm v6.4.0](https://github.com/ROCm/ROCm/releases/tag/rocm-6.4.0#release-highlights) last week. Over the next few weeks I will watch for AMD publishing their ROCm-PyTorch v6.4 Docker container. As far as I can tell from a first look at the release notes, it shouldn't be a big problem to recompile ROCm v6.4 and Ollama for gfx803.


Cheers, Robert Rosenbusch


@lsunay commented on GitHub (Apr 13, 2025):

> @lsunay : Hi ·^ It seems to be AMD published ROCm v6.4.0 last week. I will take a look on when AMD published their ROCm-PyTorch v6.4 Dockercontainer into the next few weeks. As far is i know to read the Release Notes at my first look... It shouldnt be a big Prob to recompile ROCm v6.4 and Ollama on gfx803.
>
> Cheers, Robert Rosenbusch

Hi Robert,
Thanks for the update! 😊 It's great to hear that AMD has released ROCm v6.4.0 last week. I'm really looking forward to seeing how it performs with the ROCm-PyTorch v6.4 Docker container once it's available. From your initial look at the Release Notes, it sounds promising that recompiling ROCm v6.4 and Ollama on gfx803 shouldn't be a major issue.

I'd love to be more proactive and try compiling your Docker files if this turns out to be successful. However, at the moment, I'm a bit hesitant to make changes to my current setup as it's working fine, and I don't want to risk breaking anything. Additionally, my SSD drives with the models are pretty much full, so I don't have much space to experiment right now.

On a related note, I've noticed that the Ollama version in the mnccouk/ollama-gpu-rx580:latest image is outdated, and because of this, it can't run newer models like gemma3:4b. This issue actually makes me even more eager to try out newly created Docker containers that could support the latest models and updates.

I'll definitely keep an eye on your progress and updates over the next few weeks. Please do share any findings or new containers when you get a chance to work on them. I'm eager to test things out once I free up some space and feel confident about making changes. Thanks again for your efforts and for keeping us in the loop!

Note: This response was refined with the assistance of grok-3-beta, though the core message remains unchanged. Adding: You should have noticed before, in fact, I'm not much of a talker 😊

Cheers,
Levent Sunay


@robertrosenbusch commented on GitHub (Apr 14, 2025):

@lsunay

> On a related note, I've noticed that the Ollama version in the mnccouk/ollama-gpu-rx580:latest image is outdated, and because of this, it can't run newer models like gemma3:4b. This issue actually makes me even more eager to try out newly created Docker containers that could support the latest models and updates.

Told you so, it's outdated :P Here are my benchmarks of different LLMs on the ROCm 6.3/Ollama 0.6.5 Docker container:

[benchmark_ollama0.6.5_rocm63.txt](https://github.com/user-attachments/files/19732020/benchmark_ollama0.6.5_rocm63.txt)

UPDATE: I was too curious to wait... [ROCm 6.4 works fine on GFX803/Ollama 0.6.5](https://github.com/robertrosenbusch/gfx803_rocm/issues/16#issue-2993061609)


@vpereira commented on GitHub (May 6, 2025):

With llama.cpp + Vulkan it is working well:

root@gpu:~/llama.cpp/build/bin# ./llama-bench -m ../../models/gemma-3-1b-it-UD-IQ1_S.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 550 / 550 Series (RADV POLARIS12) (radv) | uma: 0 | fp16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma3 1B IQ1_S - 1.5625 bpw   | 665.17 MiB |   999.89 M | Vulkan     |  99 |           pp512 |        429.56 ± 1.03 |
| gemma3 1B IQ1_S - 1.5625 bpw   | 665.17 MiB |   999.89 M | Vulkan     |  99 |           tg128 |         37.69 ± 0.12 |

build: 2f54e348 (5292)

@mon-jai commented on GitHub (May 6, 2025):

@vpereira Sounds promising! Any idea how to use it with Open WebUI?


@vpereira commented on GitHub (May 6, 2025):

> @vpereira Sounds promising! Any idea how to use it with Open WebUI?

Sure, I followed https://docs.openwebui.com/getting-started/quick-start/starting-with-llama-cpp/#step-3-serve-the-model-with-llamacpp

I just added a systemd service file and added `--host 0.0.0.0` to the example, and it is working flawlessly. Now I just spend my time watching `nvtop` 🤖
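
Before pointing Open WebUI at the server, it can help to confirm that it answers over the network. The sketch below is only an illustration: it assumes the llama.cpp server is reachable over its OpenAI-compatible `/v1/chat/completions` endpoint, and the host, port, model label, and function names are placeholders rather than values from this thread.

```python
import json
import urllib.request


def build_chat_request(base_url, prompt, model="local"):
    """Build a POST request for an OpenAI-compatible chat completion endpoint.

    `model` is a placeholder label; adjust it to whatever your server expects.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, prompt, timeout=120):
    """Send the prompt and return the text of the first choice."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (needs a running server; address and port are placeholders):
# print(ask("http://192.168.1.50:10000", "Why is the sky blue?"))
```

If this returns text from another machine on the network, the `--host 0.0.0.0` binding is working and Open WebUI can use the same base URL.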


@robertrosenbusch commented on GitHub (May 19, 2025):

@vpereira: Benchmark with ROCm 6.4.0 on an RX 570 with llama.cpp:

llama.cpp# /llama.cpp/build/bin/llama-bench -m gemma-3-1b-it-UD-IQ1_S.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon RX 570 Series, gfx803 (0x803), VMM: no, Wave Size: 64

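| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma3 1B IQ1_S - 1.5625 bpw | 524.98 MiB | 999.89 M | ROCm | 99 | pp512 | 669.73 ± 1.60 |
| gemma3 1B IQ1_S - 1.5625 bpw | 524.98 MiB | 999.89 M | ROCm | 99 | tg128 | 53.55 ± 0.09 |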

Cheers, Robert Rosenbusch


@siavashmohammady66 commented on GitHub (May 19, 2025):

> @vpereira: Benchmark with ROCm 6.4.0 on RX570 with llama.cpp
>
> llama.cpp# /llama.cpp/build/bin/llama-bench -m gemma-3-1b-it-UD-IQ1_S.gguf ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon RX 570 Series, gfx803 (0x803), VMM: no, Wave Size: 64
>
> | model | size | params | backend | ngl | test | t/s |
> | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
> | gemma3 1B IQ1_S - 1.5625 bpw | 524.98 MiB | 999.89 M | ROCm | 99 | pp512 | 669.73 ± 1.60 |
> | gemma3 1B IQ1_S - 1.5625 bpw | 524.98 MiB | 999.89 M | ROCm | 99 | tg128 | 53.55 ± 0.09 |
>
> Cheers, Robert Rosenbusch

Fantastic! Could you explain how to do it?


@robertrosenbusch commented on GitHub (May 19, 2025):

Hi @siavashmohammady66 and welcome back!

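> Fantastic Could you explain how to do it?

What should I explain?! :P There is a "little" [Git repo for all this AI/GFX803 stuff](https://github.com/robertrosenbusch/gfx803_rocm). But be warned: you should be able to use Docker, and you need a lot of storage space and time to recompile ^.^

However, if you only want to use llama.cpp, [it's not a big deal](https://github.com/robertrosenbusch/gfx803_rocm/issues/2#issuecomment-2692353185).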


Cheers, Robert Rosenbusch


@dakshcs commented on GitHub (Jun 27, 2025):

hello smoothbrain non-ai webdev here, is Polaris 10/20/30 support unofficial atp or will it be integrated into ollama soon?

I have a Polaris 30XT GPU and would love to use it!


@chboishabba commented on GitHub (Jun 28, 2025):

> Polaris 10/20/30

Generally, Polaris cards are no longer officially supported for GPGPU under ROCm; working around that is the purpose of Robert's repo.


@robertrosenbusch commented on GitHub (Jul 10, 2025):

@chboishabba @phoenix277yt: I did it to make it easier :P ROCm 6.4.1 and Ollama 0.9.5 on gfx803 (we are in 2025)

`docker pull robertrosenbusch/rocm6_gfx803_ollama:6.4.1_0.9.5`

`docker run -it --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -p 8080:8080 -p 11434:11434 --name rocm64_ollama_095 robertrosenbusch/rocm6_gfx803_ollama:6.4.1_0.9.5 bash`

*badüm* Depending on your ISP and hardware, the download will take some time.

Want to download a specific LLM? Let's go!

Enter the container with `docker exec -ti rocm64_ollama_095 bash`
and then download the model you need, for example `./ollama run llama3.2:1b`

At the end of the day, you can reach it at `http://YOURLOCALIP:8080`
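
Besides the web interface on port 8080, the `docker run` command above also publishes port 11434, which is Ollama's default API port. The following is a minimal sketch of a request against Ollama's `/api/generate` endpoint, assuming the Ollama server inside the container is actually listening on that port; the host name and function names are placeholders.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    url = base_url.rstrip("/") + "/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=300):
    """Send the prompt and return the decoded JSON reply."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (needs the container running and the model already pulled):
# reply = generate("http://YOURLOCALIP:11434", "llama3.2:1b", "Why is the sky blue?")
# print(reply["response"])
```

The model name `llama3.2:1b` is the one pulled in the step above; any other model has to be downloaded first.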

@phoenix277yt: Feel free to ask questions and share hints. There is a small but beautiful [gfx803 community there](https://github.com/robertrosenbusch/gfx803_rocm) ^.^

Cheers, Robert Rosenbusch


@chboishabba commented on GitHub (Jul 11, 2025):

> @chboishabba @phoenix277yt : I do it to fix it do made it more easy :P ROCm 6.4.1 and Ollama 0.9.5 on gfx803 (we are in 2025)
>
> `docker pull robertrosenbusch/rocm6_gfx803_ollama:6.4.1_0.9.5`
>
> `docker run -it --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -p 8080:8080 -p 11434:11434 --name rocm64_ollama_095 robertrosenbusch/rocm6_gfx803_ollama:6.4.1_0.9.5 bash`
>
> _badüm_ depends on your ISP/Hardware you need some time to download.
>
> you wanna download a specific LLVm? Lets go!
>
> Try the Container `docker exec -ti rocm64_ollama_095 bash` And then Download a you need for like this one `./ollama run llama3.2:1b`
>
> At the End of the Day, you got it on `http://YOURLOCALIP:8080`
>
> @phoenix277yt : Feel free to ask and make some hints. there is a smale but beautifull gfx803 community there ^.^
>
> Cheers, Robert Rosenbusch

hooray! once I can fix my kernel install I will absolutely try! I can try right now on 6.14.0-rt3-arch1-1-rt but GPT helped me bork /boot/ every time I update and I need to fix that lol

I'm in Australia.. I know what you mean about badüm but I've got to presume you're German if you feel the need for the umlaut on onomatopoeia lol

Thanks very much brother, I'll let you know results. Best wishes.


@tristan-k commented on GitHub (Jul 12, 2025):

Here is a benchmark comparing the two Ollama versions (0.9.0 and 0.9.5) of the ROCm Docker image on an AMD Radeon Pro WX 5100.

System

sudo inxi -S
System:
  Host: fedora Kernel: 6.14.0-63.fc42.x86_64 arch: x86_64 bits: 64
  Console: pty pts/7 Distro: Fedora Linux 42 (Workstation Edition)
sudo inxi -G
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon Pro WX 5100]
    driver: amdgpu v: kernel
  Display: unspecified server: X.Org v: 24.1.8 with: Xwayland v: 24.1.8
    driver: dri: radeonsi gpu: amdgpu resolution: 1920x1080~144Hz
  API: OpenGL v: 4.6 vendor: amd mesa v: 25.1.4 renderer: AMD Radeon Pro WX
    5100 Graphics (radeonsi polaris10 ACO DRM 3.61 6.14.0-63.fc42.x86_64)
  API: Vulkan v: 1.4.313 drivers: radv,llvmpipe surfaces: N/A
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
  Info: Tools: api: glxinfo,vulkaninfo gpu: amdgpu_top x11: xdriinfo,
    xdpyinfo, xprop, xrandr

Ollama 0.9.0

sudo docker run -it -d --restart unless-stopped --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -p 8080:8080 -p 11434:11434  --name rocm64_ollama_090 rocm64_gfx803_ollama:0.9.0 bash
sudo docker exec -ti rocm64_ollama_090 bash
python3 /llm-benchmark/benchmark.py
Benchmarking: gemma3:4b
Prompts: Why is the sky blue?

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 45.12 t/s
        	Response: 11.28 t/s
        	Total: 11.47 t/s

        Stats:
        	Prompt tokens: 15
        	Response tokens: 641
        	Model load time: 0.04s
        	Prompt eval time: 0.33s
        	Response time: 56.85s
        	Total time: 57.22s
----------------------------------------------------
Benchmarking: gemma3:4b
Prompt: Write a report on the financials of Microsoft

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 75.05 t/s
        	Response: 10.44 t/s
        	Total: 10.57 t/s

        Stats:
        	Prompt tokens: 17
        	Response tokens: 1206
        	Model load time: 0.04s
        	Prompt eval time: 0.23s
        	Response time: 115.49s
        	Total time: 115.76s
----------------------------------------------------
Average stats:

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 57.25 t/s
        	Response: 10.72 t/s
        	Total: 10.87 t/s

        Stats:
        	Prompt tokens: 32
        	Response tokens: 1847
        	Model load time: 0.08s
        	Prompt eval time: 0.56s
        	Response time: 172.34s
        	Total time: 172.98s
----------------------------------------------------

Ollama 0.9.5

sudo docker run -it -d --restart unless-stopped --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -p 8080:8080 -p 11434:11434  --name rocm64_ollama_095 robertrosenbusch/rocm6_gfx803_ollama:6.4.1_0.9.5 bash
sudo docker exec -ti rocm64_ollama_095 bash
python3 /llm-benchmark/benchmark.py
Benchmarking: gemma3:4b
Prompts: Why is the sky blue?

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 55.60 t/s
        	Response: 11.59 t/s
        	Total: 11.83 t/s

        Stats:
        	Prompt tokens: 15
        	Response tokens: 573
        	Model load time: 4.86s
        	Prompt eval time: 0.27s
        	Response time: 49.45s
        	Total time: 54.58s
----------------------------------------------------
Benchmarking: gemma3:4b
Prompt: Write a report on the financials of Microsoft

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 80.73 t/s
        	Response: 10.68 t/s
        	Total: 10.82 t/s

        Stats:
        	Prompt tokens: 17
        	Response tokens: 1140
        	Model load time: 0.11s
        	Prompt eval time: 0.21s
        	Response time: 106.70s
        	Total time: 107.02s
----------------------------------------------------
Average stats:

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 66.62 t/s
        	Response: 10.97 t/s
        	Total: 11.14 t/s

        Stats:
        	Prompt tokens: 32
        	Response tokens: 1713
        	Model load time: 4.97s
        	Prompt eval time: 0.48s
        	Response time: 156.15s
        	Total time: 161.60s
----------------------------------------------------
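
The t/s figures in these reports can be sanity-checked against the token counts and timings listed under "Stats". Below is a minimal Python sketch, assuming each rate is simply tokens divided by seconds. The helper name `tokens_per_second` is made up for illustration and is not taken from benchmark.py.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Return throughput in tokens per second; rejects non-positive durations."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures copied from the first gemma3:4b run on Ollama 0.9.0 above.
prompt_tokens, response_tokens = 15, 641
response_time_s, total_time_s = 56.85, 57.22

response_rate = tokens_per_second(response_tokens, response_time_s)
total_rate = tokens_per_second(prompt_tokens + response_tokens, total_time_s)

print(f"Response: {response_rate:.2f} t/s")
print(f"Total:    {total_rate:.2f} t/s")
```

Because the report prints its timings rounded to two decimals, a recomputed rate can differ from the reported one in the last digit.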
<!-- gh-comment-id:3065486006 -->

@tristan-k commented on GitHub (Jul 12, 2025):

Here is another quick [benchmark](https://github.com/geerlingguy/ollama-benchmark?tab=readme-ov-file), this time for [ollama-vulkan](https://github.com/whyvl/ollama-vulkan/issues/7#issuecomment-2660836871). Vulkan seems to give about 5 t/s more than ROCm on the AMD Radeon Pro WX 5100.

./obench.sh --markdown -m gemma3:4b -c 3
Running benchmark 3 times using model: gemma3:4b

| Run | Eval Rate (Tokens/Second) |
|-----|-----------------------------|
| 1 | 16.63 tokens/s |
| 2 | 16.38 tokens/s |
| 3 | 16.16 tokens/s |
|**Average Eval Rate**| 16.39 tokens/second |
<!-- gh-comment-id:3065925159 -->
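
The average in the last table row can be reproduced from the three per-run rows. Here is a small Python sketch, assuming the row layout shown in the obench output above. `parse_eval_rates` is a hypothetical helper written for this example, not part of obench.

```python
from statistics import mean


def parse_eval_rates(markdown: str) -> list[float]:
    """Extract per-run eval rates from rows like '| 1 | 16.63 tokens/s |'."""
    rates = []
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Keep only rows whose first cell is a run number; skips header and separator.
        if len(cells) == 2 and cells[0].isdigit():
            rates.append(float(cells[1].split()[0]))
    return rates


table = """\
| Run | Eval Rate (Tokens/Second) |
|-----|-----------------------------|
| 1 | 16.63 tokens/s |
| 2 | 16.38 tokens/s |
| 3 | 16.16 tokens/s |
"""

rates = parse_eval_rates(table)
print(f"Average eval rate: {mean(rates):.2f} tokens/s")
```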

@robertrosenbusch commented on GitHub (Jul 12, 2025):

@tristan-k: Hi and welcome back! Thanks a lot for your benchmarks!

Does the Radeon Pro WX 5100 really support 16 GB of VRAM? I'm asking because your original post benchmarked a 12b model ^.^

I ran some benchmarks across Ollama [0.6 to 0.9 there](https://github.com/robertrosenbusch/gfx803_rocm/wiki/ROCm-6.4.0-Ollama-Benchmarks): same hardware, same ROCm stack, different Ollama versions. My main point was: take a look at "Total time".

And sorry, but to my mind "./obench" is nice, yet not a very careful way to benchmark anything for real-world use.

Cheers, Robert Rosenbusch
<!-- gh-comment-id:3065968405 -->

@tristan-k commented on GitHub (Jul 12, 2025):

No, it doesn't. I initially benchmarked with the wrong model, gemma3:12b, and redid the benchmark with gemma3:4b, which fits into the VRAM. My bad.

It's not exactly an apples-to-apples comparison. I will try to run benchmark.py in the ollama-vulkan Docker image later and post the results here.
<!-- gh-comment-id:3065989410 -->
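
For a like-for-like comparison between backends, the same figures can be read straight from the Ollama HTTP API, whichever image is serving it. The Python sketch below is one way to do that and is not how benchmark.py is implemented. It assumes the server listens on port 11434, as published by the `docker run` commands in this thread, and relies on the `/api/generate` reply reporting its durations in nanoseconds. The names `generate` and `summarize` are hypothetical.

```python
import json
import urllib.request

# Assumption: default Ollama port, reachable from where this script runs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generate request and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def _rate(count: int, duration_ns: int) -> float:
    """Tokens per second from a token count and a nanosecond duration."""
    return count / (duration_ns / 1e9) if duration_ns else 0.0


def summarize(reply: dict) -> dict:
    """Convert the reply's nanosecond durations into seconds and tokens per second."""
    return {
        "prompt_eval_tps": _rate(reply.get("prompt_eval_count", 0),
                                 reply.get("prompt_eval_duration", 0)),
        "response_tps": _rate(reply.get("eval_count", 0),
                              reply.get("eval_duration", 0)),
        "load_s": reply.get("load_duration", 0) / 1e9,
        "total_s": reply.get("total_duration", 0) / 1e9,
    }
```

With a server running, `summarize(generate("gemma3:4b", "Why is the sky blue?"))` returns the same kind of numbers as the reports above. Running the same prompts against each container keeps model, prompt, and measurement method constant.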

@chboishabba commented on GitHub (Jul 13, 2025):

> [@tristan-k](https://github.com/tristan-k): Hi and welcome back! Thanks a lot for your benchmarks!
>
> Does the Radeon Pro WX 5100 really support 16 GB of VRAM? I'm asking because your original post benchmarked a 12b model ^.^
>
> I ran some benchmarks across Ollama [0.6 to 0.9 there](https://github.com/robertrosenbusch/gfx803_rocm/wiki/ROCm-6.4.0-Ollama-Benchmarks): same hardware, same ROCm stack, different Ollama versions. My main point was: take a look at "Total time".
>
> And sorry, but to my mind "./obench" is nice, yet not a very careful way to benchmark anything for real-world use.
>
> Cheers, Robert Rosenbusch

Awesome benchmarks, cheers. I'm pretty blown away by how much faster the deepseek eval is on the newer Ollama versions. Having another look, it seems the llama2:7b eval actually went backwards over approximately the same period... maybe sampling bias, as the larger model stayed roughly the same.
<!-- gh-comment-id:3066328989 -->

@tristan-k commented on GitHub (Jul 13, 2025):

As promised, here are the [benchmark](https://github.com/willybcode/llm-benchmark) results.

Ollama (0.9.3) Vulkan

sudo inxi -S
System:
  Host: fedora Kernel: 6.14.0-63.fc42.x86_64 arch: x86_64 bits: 64
  Console: pty pts/1 Distro: Fedora Linux 42 (Workstation Edition)
sudo inxi -m
Memory:
  System RAM: total: 32 GiB available: 31.24 GiB used: 6.48 GiB (20.7%)
  Array-1: capacity: 128 GiB slots: 4 modules: 4 EC: None
  Device-1: DIMM_A1 type: DDR4 size: 8 GiB speed: 2666 MT/s
  Device-2: DIMM_A2 type: DDR4 size: 8 GiB speed: 2666 MT/s
  Device-3: DIMM_B1 type: DDR4 size: 8 GiB speed: 2666 MT/s
  Device-4: DIMM_B2 type: DDR4 size: 8 GiB speed: 2666 MT/s
sudo inxi -G
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon Pro WX 5100]
    driver: amdgpu v: kernel
  Display: unspecified server: X.Org v: 24.1.8 with: Xwayland v: 24.1.8
    driver: dri: radeonsi gpu: amdgpu resolution: 1920x1080~144Hz
  API: EGL v: 1.5 drivers: radeonsi,swrast
    platforms: gbm,x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 25.1.4 renderer: AMD
    Radeon Pro WX 5100 Graphics (radeonsi polaris10 ACO DRM 3.61
    6.14.0-63.fc42.x86_64)
  API: Vulkan v: 1.4.313 drivers: radv,llvmpipe surfaces: N/A
  Info: Tools: api: eglinfo, glxinfo, vulkaninfo gpu: amdgpu_top,radeontop
    x11: xdriinfo, xdpyinfo, xprop, xrandr
sudo docker exec -it ollama-vulkan-whyvl bash
ollama --version
ollama version is 0.9.3 
Benchmarking: gemma3:4b
Prompt: Why is the sky blue?

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 38.63 t/s
        	Response: 15.96 t/s
        	Total: 16.20 t/s

        Stats:
        	Prompt tokens: 15
        	Response tokens: 572
        	Model load time: 10.33s
        	Prompt eval time: 0.39s
        	Response time: 35.84s
        	Total time: 46.57s
----------------------------------------------------
Benchmarking: gemma3:4b
Prompt: Write a report on the financials of Microsoft

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 53.29 t/s
        	Response: 15.15 t/s
        	Total: 15.30 t/s

        Stats:
        	Prompt tokens: 17
        	Response tokens: 1228
        	Model load time: 0.18s
        	Prompt eval time: 0.32s
        	Response time: 81.03s
        	Total time: 81.54s
----------------------------------------------------
Average stats:

----------------------------------------------------
        gemma3:4b
        	Prompt eval: 45.24 t/s
        	Response: 15.40 t/s
        	Total: 15.58 t/s

        Stats:
        	Prompt tokens: 32
        	Response tokens: 1800
        	Model load time: 10.52s
        	Prompt eval time: 0.71s
        	Response time: 116.88s
        	Total time: 128.11s
----------------------------------------------------

Radeon Pro WX 5100 Vulkan to ROCm Comparison

| Metric | Vulkan (1.4.313) | ROCm (6.4.1) | Difference (Vulkan - ROCm) |
|----------------------|--------|--------|--------|
| Prompt Eval (t/s) | 45.24 | 66.62 | -21.38 |
| Response (t/s) | 15.40 | 10.97 | +4.43 |
| Total (t/s) | 15.58 | 11.14 | +4.44 |
| Prompt Tokens | 32 | 32 | 0 |
| Response Tokens | 1800 | 1713 | +87 |
| Model Load Time (s) | 10.52 | 4.97 | +5.55 |
| Prompt Eval Time (s) | 0.71 | 0.48 | +0.23 |
| Response Time (s) | 116.88 | 156.15 | -39.27 |
| Total Time (s) | 128.11 | 161.60 | -33.49 |
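
The Difference column is the Vulkan value minus the ROCm value. A relative change is often easier to read than an absolute one, so here is a short Python sketch that derives both from the average stats posted above. `compare` is a name chosen for this example only.

```python
# Average stats copied from the two runs above, as (Vulkan, ROCm) pairs.
results = {
    "Prompt Eval (t/s)": (45.24, 66.62),
    "Response (t/s)": (15.40, 10.97),
    "Total (t/s)": (15.58, 11.14),
    "Model Load Time (s)": (10.52, 4.97),
    "Response Time (s)": (116.88, 156.15),
    "Total Time (s)": (128.11, 161.60),
}


def compare(vulkan: float, rocm: float) -> tuple[float, float]:
    """Return (absolute difference, percent change) of Vulkan relative to ROCm."""
    diff = vulkan - rocm
    return diff, 100.0 * diff / rocm


for metric, (vulkan, rocm) in results.items():
    diff, pct = compare(vulkan, rocm)
    print(f"{metric:<22} {diff:+8.2f} ({pct:+.1f}%)")
```

On these numbers, response throughput is roughly 40% higher with Vulkan, while prompt evaluation is about 32% slower. The two runs also used different Ollama versions (0.9.3 for Vulkan, 0.9.5 for ROCm), so the backend is not the only variable.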
<!-- gh-comment-id:3067118955 -->

@robertrosenbusch commented on GitHub (Jul 13, 2025):

@tristan-k: First of all, welcome back, and please grab some popcorn and a cold soda.

It's really suspicious... I guess the Radeon Pro WX 5100 is more similar to the RX580 than to my RX570, which I used for all my benchmarks on ROCm 6.4.

By the way, you should be really, really happy to be running the right kernel version, 6.14.0-63.fc42.x86_64, for all the ROCm stuff. Don't change it until AMD, maybe with ROCm 6.5 or 7.0, is able to work with a kernel version higher than 6.11. I guess @chboishabba [knows what I mean](https://github.com/robertrosenbusch/gfx803_rocm/issues/35#issuecomment-3006071189) :P And as far as I know, @chboishabba, the answer from the [AMD ROCm dev team was: use a kernel version we support](https://github.com/ROCm/ROCm/issues/4965#issuecomment-3005124072).

@tristan-k: feel free to jump in and share some hints ^.^

At the end of the day, gfx803 still works with Ollama (0.6/0.7/0.8/0.9), right? With ROCm 6.4.x and with Vulkan 1.4.313. I guess the differences between Ollama versions are bigger than the differences between the ROCm and Vulkan versions on gfx803. But my Git repo doesn't only support Ollama :D ComfyUI is my main focus.

My benchmark sample, on Ollama v0.9.0:

Average stats:

    gemma3:4b
        Prompt eval: 90.30 t/s
        Response: 25.57 t/s
        Total: 25.88 t/s

    Stats:
        Prompt tokens: 32
        Response tokens: 1862
        Model load time: 22.81s
        Prompt eval time: 0.35s
        Response time: 72.83s
        Total time: 95.99s

Cheers, Robert Rosenbusch
<!-- gh-comment-id:3067295373 -->

@robertrosenbusch commented on GitHub (Jul 13, 2025):

@tristan-k @chboishabba: AMD just published the official ROCm 7.0. *badüüm*, or *I told you so*. Maybe I'm too late to this gfx803 [ROCm 7.0 party](https://www.amd.com/de/products/software/rocm.html) :P *laughing*

Cheers, Robert Rosenbusch
<!-- gh-comment-id:3067300218 -->

@chboishabba commented on GitHub (Jul 14, 2025):

Can confirm, don't go updating until compatibility is confirmed unless you
are interested in bisecting kernel commits.

<!-- gh-comment-id:3067466478 -->

@robertrosenbusch commented on GitHub (Aug 3, 2025):

Ollama 0.9.0 with deepseek-r1:32b: a performance benchmark of [20 Xeon E5 cores vs. 4x RX470/8G (gfx803)](https://github.com/robertrosenbusch/gfx803_rocm/issues/38#issuecomment-3146546427), recorded as a video.

(Top: 20 Xeon E5 cores. Bottom: 4x RX470/8G.)

Cheers, Robert Rosenbusch
<!-- gh-comment-id:3148787564 -->

@ericcurtin commented on GitHub (Oct 13, 2025):

We added Vulkan support to Docker Model Runner, so we cover this feature:

https://www.docker.com/blog/docker-model-runner-vulkan-gpu-support/

We've also put effort into gathering all our code in one central place to make it easier for people to contribute. Please star, fork and contribute.

https://github.com/docker/model-runner

You can pull models from Docker Hub, Hugging Face or any other OCI registry. You can also push models to Docker Hub or any other OCI registry.
<!-- gh-comment-id:3399439656 -->

@exbanny58-alt commented on GitHub (Nov 19, 2025):

Ollama on AMD RX 580: Working Solution

Problem: Ollama doesn't detect the RX 580 and uses the CPU only.

Solution:
Kill all running ollama processes.

In PowerShell, first window:

$env:OLLAMA_VULKAN="1"
$env:OLLAMA_GPU_LAYERS="99"
ollama serve

Second window:

ollama run model-name

Permanent fix:

Create a system environment variable OLLAMA_VULKAN=1

Or use a .bat file with these settings

Result:

✅ GPU at 100% load instead of the CPU
✅ Instant responses
✅ Tested with models up to 7B parameters

Verification: in the logs, look for "Radeon RX 580 Series" and library=Vulkan
<!-- gh-comment-id:3554524701 -->
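
The same steps can be scripted instead of typed into two windows. The Python sketch below builds the environment from the comment and checks captured log text for the two strings the comment says to look for. Both variable names are copied from the comment as-is, and whether a given Ollama build honors `OLLAMA_GPU_LAYERS` is not confirmed here. The function names are hypothetical.

```python
import os
import subprocess


def vulkan_env(base=None) -> dict:
    """Return a copy of the environment with the variables from the comment set."""
    env = dict(os.environ if base is None else base)
    env["OLLAMA_VULKAN"] = "1"
    # Copied from the comment; not verified against Ollama's documentation.
    env["OLLAMA_GPU_LAYERS"] = "99"
    return env


def start_server() -> subprocess.Popen:
    """Launch `ollama serve` with the Vulkan variables; the caller must stop it."""
    return subprocess.Popen(["ollama", "serve"], env=vulkan_env())


def log_shows_vulkan(lines, gpu_name: str = "Radeon RX 580 Series") -> bool:
    """True if the log text mentions both the GPU name and library=Vulkan."""
    text = "\n".join(lines)
    return gpu_name in text and "library=Vulkan" in text
```

`start_server()` needs `ollama` on the PATH and no other Ollama process holding the port, which is why the comment starts by killing existing processes. `log_shows_vulkan` can be fed the lines of a saved server log. The default GPU name is the string reported for the RX 580 in the comment, so other cards need a different value.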

@aptac01 commented on GitHub (Dec 12, 2025):

Using exbanny58-alt's method, I sped up inference on an ASUS RX570 8GB. For convenience, I made a batch file (file.bat) with the following contents:

@echo off
powershell -Command "$env:OLLAMA_VULKAN='1'; $env:OLLAMA_GPU_LAYERS='99'; ollama serve"

When you run it, Ollama starts in console mode. To get a GUI, you can install a browser extension such as "ollama ui".
<!-- gh-comment-id:3646270775 -->

@Lendangame commented on GitHub (Mar 7, 2026):

> Using the exbanny58-alt method, I sped up inference on an ASUS RX 570 8GB. For convenience, I made a batch file (file.bat) with the following contents:
>
> @echo off
> powershell -Command "$env:OLLAMA_VULKAN='1'; $env:OLLAMA_GPU_LAYERS='99'; ollama serve"
>
> When the batch file is run, ollama starts in console mode. To get a GUI, you can install a browser extension such as "ollama ui".

Doesn't work.

<!-- gh-comment-id:4027955756 --> @chboishabba commented on GitHub (Mar 10, 2026): https://github.com/advanced-lvl-up/Rx470-Vega10-Rx580-gfx803-gfx900-fix-AMD-GPU/issues/10
Reference: github-starred/ollama#1434