[GH-ISSUE #6857] Issues getting rocm support to compile on Gentoo #30087

Closed
opened 2026-04-22 09:32:49 -05:00 by GiteaMirror · 19 comments

Originally created by @kiaraly on GitHub (Sep 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6857

What is the issue?

I'm trying to get the project to compile on Gentoo but am running into some issues as Gentoo uses different paths.

On Gentoo, ROCm libraries get installed into /usr/lib64, hip-clang lives somewhere else, and I'm sure there are some other differences as well.

As suggested in the wiki, I set the following environment variables to point the build script at the right locations: ROCM_PATH=/usr/lib64 CLBlast_DIR=/usr/lib64/cmake/CLBlast. This got me a bit further, but compilation still failed because the compiler paths were wrong.
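For reference, the full build invocation looked roughly like this (a sketch; go generate is what drives the gen_linux.sh script mentioned below):

# environment variables prepended to the standard source build
ROCM_PATH=/usr/lib64 CLBlast_DIR=/usr/lib64/cmake/CLBlast go generate ./...
go build .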

I edited gen_linux.sh and changed the CMake definition for ROCm

CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DGGML_HIPBLAS=on 
-DGGML_CUDA_NO_PEER_COPY=on -DCMAKE_C_COMPILER=$ROCM_PATH/llvm/bin/clang 
-DCMAKE_CXX_COMPILER=$ROCM_PATH/llvm/bin/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"

to

CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DGGML_HIPBLAS=on 
-DGGML_CUDA_NO_PEER_COPY=on -DCMAKE_C_COMPILER=$(hipconfig -l)/clang 
-DCMAKE_CXX_COMPILER=$(hipconfig -l)/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"

(this seems to be how llama.cpp sets their HIPCXX path, per https://github.com/ggerganov/llama.cpp/blob/8962422b1c6f9b8b15f5aeaea42600bcc2d44177/docs/build.md#hipblas, and it points to the correct path for me). This got me one step further, but this time it complained about not finding some CMake files. Looking at the llama.cpp documentation again, it sets HIP_PATH for compilation as well (though wrong), so I modified the build function to export

export HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -p)"

before compilation.
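For anyone following along, this is what the two hipconfig queries resolve to (illustrative output only; the exact LLVM slot depends on the install):

hipconfig -l   # hip-clang binary directory, e.g. /usr/lib/llvm/18/bin
hipconfig -p   # HIP installation root, e.g. /usr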

After that, the project compiles correctly, but trying to load any model crashes ollama. The ollama serve process reports

rocBLAS error: Tensile solution found, but exception thrown for { a_type: "f16_r", b_type: "f16_r", c_type: "f16_r", d_type: "f16_r", compute_type: "f16_r", transA: 'T', transB: 'N', M: 32, N: 2, K: 256, alpha: 1, row_stride_a: 1, col_stride_a: 1024, row_stride_b: 1, col_stride_b: 2048, row_stride_c: 1, col_stride_c: 32, row_stride_d: 1, col_stride_d: 32, beta: 0, batch_count: 8, strided_batch: false, stride_a: 32768, stride_b: 4096, stride_c: 64, stride_d: 64, atomics_mode: atomics_allowed }
Alpha value -0.0281982 doesn't match that set in problem: 1
This message will be only be displayed once, unless the ROCBLAS_VERBOSE_TENSILE_ERROR environment variable is set.
CUDA error: CUBLAS_STATUS_INTERNAL_ERROR
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1890
  hipblasGemmBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), HIPBLAS_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), HIPBLAS_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, HIPBLAS_GEMM_DEFAULT)
/home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error
time=2024-09-18T14:50:34.883+02:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server not responding"
time=2024-09-18T14:50:36.936+02:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR\n  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1890\n  hipblasGemmBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), HIPBLAS_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), HIPBLAS_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, HIPBLAS_GEMM_DEFAULT)\n/home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error"

the ollama run process crashes with

Error: llama runner process has terminated: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1890
  hipblasGemmBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), HIPBLAS_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), HIPBLAS_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, HIPBLAS_GEMM_DEFAULT)
/home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error

I can't make any sense of these errors and don't know what else to try.

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

git head

GiteaMirror added the bug label 2026-04-22 09:32:49 -05:00

@dhiltgen commented on GitHub (Sep 21, 2024):

We're working on some changes that should make it a bit easier to adapt paths on other OSes going forward. #5034

I'm not sure what this failure is, but it may be rpath related, or possibly missing pieces ROCm is expecting in relative or absolute paths. You could try setting AMD_LOG_LEVEL=3, which will produce a lot of verbose logging from the various ROCm libraries and might help narrow it down.
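Something like this, for example (a sketch; redirecting stderr keeps the flood out of the terminal):

# the HIP runtime logs every call at this level
AMD_LOG_LEVEL=3 ./ollama serve 2> amd-debug.log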


@kiaraly commented on GitHub (Sep 21, 2024):

The log output didn't look interesting—just about 5000 lines of hipSuccess returns. On a second look, I saw that my ROCM_PATH was wrong; it should have been pointing to /usr instead. Fixing that still didn't make it work, so I went one step further and just copied the lib directory from the releases to my /usr/lib64, and it did work! I'm gonna spend some time figuring out which library causes the issue and then come back.
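Concretely, I did something like the following (a sketch; the tarball name and its layout are assumptions based on the release downloads and may differ per version):

# overwrite the system ROCm libraries with the ones bundled in the release
tar -xzf ollama-linux-amd64-rocm.tgz
cp -r lib/* /usr/lib64/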


@kiaraly commented on GitHub (Sep 23, 2024):

The issue seems to be with my system install of rocBLAS (librocblas). Could this be something as simple as an incompatible version?

I've had a quick look at the ebuild (https://github.com/gentoo/gentoo/blob/a1a9b484d807d3af24aacf2cd6318bc28b8187b5/sci-libs/rocBLAS/rocBLAS-6.1.1.ebuild), but neither the patches nor the configure options stood out to me. Could this still be an issue with ollama, or should I report it to the Gentoo package maintainer?
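For comparison, a quick way to line the two up (a sketch; the /tmp path is wherever ollama extracts its runners, as the logs later in this thread show):

ls -l /usr/lib64/librocblas.so*      # system rocBLAS from the ebuild
ls -l /tmp/ollama*/runners/rocm*/    # libraries bundled with the release build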


@waltercool commented on GitHub (Sep 25, 2024):

Did you find the error?

I've been getting the same issue, on Gentoo as well.

I'm sure this is something distro-based. I made a bug report a few days ago, after I found out ROCm was being compiled with LLVM 19 even though the ebuild says LLVM 18 for ROCm 6.1.


@kiaraly commented on GitHub (Sep 27, 2024):

I have semi-broken my PC and haven't been able to do any further testing. I looked at the Gentoo bugs for rocBLAS and only found https://bugs.gentoo.org/940231. I'll manually try updating the package(s) over the next couple of days and see if that changes anything, as the bug report suggests.


@rohitnanda1443 commented on GitHub (Oct 2, 2024):

I just compiled Ollama on Gentoo (after getting frustrated with vllm). I have a Ryzen 8700G / 780M with 64 GB RAM.

Steps:

  1. Followed the Gentoo ROCm guide: https://wiki.gentoo.org/wiki/ROCm
  2. git clone https://github.com/ollama/ollama.git
  3. cd ollama
  4. export AMDGPU_TARGETS="gfx1100;gfx1102"
  5. go generate ./...
  6. go build .
  7. ./ollama serve & (as the ollama executable is in the ollama directory)
  8. echo "export HSA_OVERRIDE_GFX_VERSION=11.0.0" >> .profile
  9. echo "export HSA_ENABLE_SDMA=0" >> .profile
  10. export HSA_OVERRIDE_GFX_VERSION=11.0.0
  11. export HSA_ENABLE_SDMA=0
  12. ./ollama run mistral:instruct

Output of ./ollama ps:

NAME              ID            SIZE    PROCESSOR  UNTIL
mistral:instruct  f974a74358d6  6.3 GB  100% GPU   4 minutes from now

Hope this helps.


@kiaraly commented on GitHub (Oct 2, 2024):

If you run ./ollama serve, there's a line in the log like this:

time=2024-10-02T13:53:08.292+02:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 rocm rocm_v60102]"

Is rocm included in the runners? When I compiled ollama the first time it wasn't, and ./ollama ps still reported that the model was loaded in VRAM, but the actual computation was done on the CPU without the changes mentioned in the original comment.
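A quick way to check (sketch):

# rocm entries in runners=[...] mean the ROCm runner was actually built and extracted
./ollama serve 2>&1 | grep -m1 "Dynamic LLM libraries"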

I finally got my new GPU working, and the error I'm getting has changed. Instead of the old one I now get the following, and using the bundled rocBLAS.so no longer fixes the issue.

ggml_cuda_compute_forward: SCALE failed
CUDA error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2326
  err
/home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error

When I compiled llama.cpp manually, the resulting binary had working ROCm support. I'm gonna look at how ollama compiles it and see if I can make any progress from there.


@rohitnanda1443 commented on GitHub (Oct 2, 2024):

Output of my ./ollama serve

2024/10/02 18:34:53 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-10-02T18:34:53.904+05:30 level=INFO source=images.go:753 msg="total blobs: 11"
time=2024-10-02T18:34:53.904+05:30 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-10-02T18:34:53.905+05:30 level=INFO source=routes.go:1205 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-10-02T18:34:53.905+05:30 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama1610614431/runners
time=2024-10-02T18:34:53.919+05:30 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2]"
time=2024-10-02T18:34:53.919+05:30 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-02T18:34:53.922+05:30 level=WARN source=amd_linux.go:60 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-10-02T18:34:53.923+05:30 level=INFO source=amd_linux.go:349 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2024-10-02T18:34:53.923+05:30 level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="16.0 GiB" available="15.4 GiB"


@kiaraly commented on GitHub (Oct 2, 2024):

> time=2024-10-02T18:34:53.919+05:30 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2]"

I think it's not using your GPU. I noticed this when I loaded a bigger model (maybe try llama3.1?) and saw a massive speed difference between my compiled version and the binary from the releases. Maybe you could try the same?
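The timing summary makes the difference easy to spot (a sketch; --verbose prints eval rates after each reply):

./ollama run llama3.1 --verbose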


@rohitnanda1443 commented on GitHub (Oct 2, 2024):

Yes, you are correct; I tried llama-3.1. Similar issues have been reported by others on Nvidia as well: https://github.com/ollama/ollama/issues/4486 ("Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support")

Interestingly, I am not getting this issue while running Mistral-7B-v0.3:

NAME              ID            SIZE    PROCESSOR  UNTIL
mistral:instruct  f974a74358d6  6.3 GB  100% GPU   4 minutes from now

[GIN] 2024/10/02 - 19:10:51 | 200 | 12.294952ms | 127.0.0.1 | POST "/api/show"
time=2024-10-02T19:10:51.823+05:30 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe gpu=0 parallel=4 available=16494211072 required="6.2 GiB"
time=2024-10-02T19:10:51.823+05:30 level=INFO source=server.go:103 msg="system memory" total="46.6 GiB" free="42.5 GiB" free_swap="63.6 GiB"
time=2024-10-02T19:10:51.823+05:30 level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[15.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.2 GiB" memory.required.partial="6.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[6.2 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-10-02T19:10:51.825+05:30 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama1610614431/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --parallel 4 --port 35323"
time=2024-10-02T19:10:51.825+05:30 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-10-02T19:10:51.825+05:30 level=INFO source=server.go:587 msg="waiting for llama runner to start responding"
time=2024-10-02T19:10:51.825+05:30 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
WARN [server_params_parse] Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support | n_gpu_layers=-1 tid="140523311131776" timestamp=1727876451
INFO [main] build info | build=3670 commit="bf6c2c83" tid="140523311131776" timestamp=1727876451
INFO [main] system info | n_threads=8 n_threads_batch=8 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140523311131776" timestamp=1727876451 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="35323" tid="140523311131776" timestamp=1727876451
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = llama
llama_model_loader: - kv   1: general.type str = model
llama_model_loader: - kv   2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv   3: general.finetune str = Instruct
llama_model_loader: - kv   4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv   5: general.size_label str = 8B
llama_model_loader: - kv   6: general.license str = llama3.1
llama_model_loader: - kv   7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   9: llama.block_count u32 = 32
llama_model_loader: - kv  10: llama.context_length u32 = 131072
llama_model_loader: - kv  11: llama.embedding_length u32 = 4096
llama_model_loader: - kv  12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv  13: llama.attention.head_count u32 = 32
llama_model_loader: - kv  14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv  15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv  16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv  17: general.file_type u32 = 2
llama_model_loader: - kv  18: llama.vocab_size u32 = 128256
llama_model_loader: - kv  19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv  20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv  22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv  26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv  27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  28: general.quantization_version u32 = 2
llama_model_loader: - type  f32: 66 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
time=2024-10-02T19:10:52.268+05:30 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: CPU output buffer size = 2.02 MiB
llama_new_context_with_model: CPU compute buffer size = 560.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
INFO [main] model loaded | tid="140523311131776" timestamp=1727876452
time=2024-10-02T19:10:52.770+05:30 level=INFO source=server.go:626 msg="llama runner started in 0.95 seconds"


@kiaraly commented on GitHub (Oct 2, 2024):

If you want to try it as well, you can apply this patch and compile ollama with ROCM_PATH=/usr CLBlast_DIR=/usr/lib64/cmake/CLBlast AMDGPU_TARGETS="gfx1100" go generate './...' (replace the GPU target with your card's version).

diff --git a/llm/generate/gen_common.sh b/llm/generate/gen_common.sh
index 3825c155..513ac9d2 100644
--- a/llm/generate/gen_common.sh
+++ b/llm/generate/gen_common.sh
@@ -76,6 +76,7 @@ apply_patches() {
 }
 
 build() {
+	export HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -p)"
     cmake -S ${LLAMACPP_DIR} -B ${BUILD_DIR} ${CMAKE_DEFS}
     cmake --build ${BUILD_DIR} ${CMAKE_TARGETS} -j8
     # remove unnecessary build artifacts
diff --git a/llm/generate/gen_linux.sh b/llm/generate/gen_linux.sh
index 48d08fd0..0eebeab4 100755
--- a/llm/generate/gen_linux.sh
+++ b/llm/generate/gen_linux.sh
@@ -260,11 +260,11 @@ fi
 
 if [ -z "${OLLAMA_SKIP_ROCM_GENERATE}" -a -d "${ROCM_PATH}" ]; then
     echo "ROCm libraries detected - building dynamic ROCm library"
-    if [ -f ${ROCM_PATH}/lib/librocblas.so.*.*.????? ]; then
-        ROCM_VARIANT=_v$(ls ${ROCM_PATH}/lib/librocblas.so.*.*.????? | cut -f5 -d. || true)
+    if [ -f ${ROCM_PATH}/lib64/librocblas.so.*.*.????? ]; then
+        ROCM_VARIANT=_v$(ls ${ROCM_PATH}/lib64/librocblas.so.*.*.????? | cut -f5 -d. || true)
     fi
     init_vars
-    CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DGGML_HIPBLAS=on -DGGML_CUDA_NO_PEER_COPY=on -DCMAKE_C_COMPILER=$ROCM_PATH/llvm/bin/clang -DCMAKE_CXX_COMPILER=$ROCM_PATH/llvm/bin/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"
+	CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DGGML_HIPBLAS=on -DGGML_CUDA_NO_PEER_COPY=on -DCMAKE_C_COMPILER=$(hipconfig -l)/clang -DCMAKE_CXX_COMPILER=$(hipconfig -l)/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"
     # Users building from source can tune the exact flags we pass to cmake for configuring llama.cpp
     if [ -n "${OLLAMA_CUSTOM_ROCM_DEFS}" ]; then
         echo "OLLAMA_CUSTOM_ROCM_DEFS=\"${OLLAMA_CUSTOM_ROCM_DEFS}\""
@@ -277,7 +277,7 @@ if [ -z "${OLLAMA_SKIP_ROCM_GENERATE}" -a -d "${ROCM_PATH}" ]; then
     ROCM_DIST_DIR="${DIST_BASE}/../linux-${GOARCH}-rocm/lib/ollama"
     # TODO figure out how to disable runpath (rpath)
     # export CMAKE_HIP_FLAGS="-fno-rtlib-add-rpath" # doesn't work
-    export LLAMA_SERVER_LDFLAGS="-L${ROCM_PATH}/lib -L/opt/amdgpu/lib/x86_64-linux-gnu/ -lhipblas -lrocblas -lamdhip64 -lrocsolver -lamd_comgr -lhsa-runtime64 -lrocsparse -ldrm -ldrm_amdgpu"
+    export LLAMA_SERVER_LDFLAGS="-L${ROCM_PATH}/lib -L${ROCM_PATH}/lib64 -L/opt/amdgpu/lib/x86_64-linux-gnu/ -lhipblas -lrocblas -lamdhip64 -lrocsolver -lamd_comgr -lhsa-runtime64 -lrocsparse -ldrm -ldrm_amdgpu"
     build
 
     # copy the ROCM dependencies

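To apply and build (a sketch; gentoo-rocm.patch is just whatever filename you save the diff above under):

git apply gentoo-rocm.patch
ROCM_PATH=/usr CLBlast_DIR=/usr/lib64/cmake/CLBlast AMDGPU_TARGETS="gfx1100" go generate ./...
go build .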
I thought I had made some progress on the error, but that seems to have been wrong. Maybe I'll just have to wait until the changes mentioned in https://github.com/ollama/ollama/issues/6857#issuecomment-2364782947 have landed and try again afterwards.


@waltercool commented on GitHub (Oct 2, 2024):

Using hipconfig is the correct way


@ProjectMoon commented on GitHub (Oct 3, 2024):

It seems the GURU ebuild on Gentoo doesn't properly compile in GPU support, even when the nvidia or amd USE flags are enabled. This is my experience using the ebuild (which I just tested briefly). I was able to run it fine; it just offloaded to CPU instead of GPU. Interestingly, the compiled version from the ebuild finds the ROCm device on startup and considers it an inference resource, but when running the model, the llama.cpp subprocess says it wasn't compiled with GPU support.


@kiaraly commented on GitHub (Oct 17, 2024):

I don't know if it's progress, but I'm getting a different error now. Given that everything seems to work with a different librocblas.so, I decided to hack a bit on the ebuild. I commented out this part of the ebuild (https://github.com/gentoo/gentoo/blob/f9c02033d5657758c53ff90e982b63cf0578b9fa/sci-libs/rocBLAS/rocBLAS-6.1.1.ebuild#L55C1-L58C2) and now get the following error:

ggml_cuda_compute_forward: SCALE failed
CUDA error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2326
  err
/home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error

Compiling llama.cpp manually doesn't produce this error; instead it just produces gibberish output.


@dhiltgen commented on GitHub (Oct 24, 2024):

It may need some adjusting, but please give the new Go server build a try. It no longer relies on cmake.

https://github.com/ollama/ollama/blob/main/docs/development.md#transition-to-go-runner


@kiaraly commented on GitHub (Oct 24, 2024):

With the new Go server, building the project was much smoother than before. It compiled correctly and tries to use my GPU but sadly fails. When trying to run gemma2:27b or gemma2:2b, I get the error I mentioned in my previous comment. I also tried running llama2-uncensored and got a somewhat different error.

ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2326
  err
/home/roger/Git/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error

I searched online for RMS_NORM failed but sadly didn't find anything helpful.

Do you think this is even an ollama error? Or should I open an issue in llama.cpp about this?


@stalkerg commented on GitHub (Oct 29, 2024):

> Using hipconfig is the correct way

Do we have CMake integration with hipconfig?
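For context, as used earlier in this thread, the hipconfig output is fed into CMake from the shell side rather than CMake querying it itself; a sketch built from the flags above:

cmake -S llm/llama.cpp -B build \
  -DGGML_HIPBLAS=on \
  -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" \
  -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++"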


@lubosz commented on GitHub (Nov 10, 2024):

@Roger-Roger-debug If you see the CUBLAS_STATUS_INTERNAL_ERROR in hipblasGemmBatchedEx on llama.cpp as well, you can hop onto my issue:
https://github.com/ggerganov/llama.cpp/issues/10234

I have the same error with llama.cpp and ollama on Arch Linux.


@kiaraly commented on GitHub (Nov 30, 2024):

I've reported the underlying rocBLAS issue to the Gentoo bug tracker (https://bugs.gentoo.org/944820).
With the patch, everything works on #7499, so as far as Ollama is concerned this issue can be closed once the PR gets merged.

Reference: github-starred/ollama#30087