[GH-ISSUE #2396] llama.cpp now supports Vulkan #1394

Closed
opened 2026-04-12 11:13:28 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @ddpasa on GitHub (Feb 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2396

Originally assigned to: @dhiltgen on GitHub.

As of 10 days ago: https://github.com/ggerganov/llama.cpp/commit/2307523d322af762ae06648b29ec5a9eb1c73032

This is great news for people who have non-CUDA cards.

What's necessary to support this with Ollama? I'm happy to help if you give me some pointers.

GiteaMirror added the gpu label 2026-04-12 11:13:28 -05:00
Author
Owner

@ddpasa commented on GitHub (Feb 8, 2024):

I managed to compile ollama by adding the following snippet to gen_linux.sh, and it builds a Vulkan version:

```
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_VULKAN=1 -DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_AVX512=on -DLLAMA_FMA=on -DLLAMA_AVX512_VBMI=on -DLLAMA_AVX512_VNNI=on -DLLAMA_F16C=on -DCMAKE_BUILD_TYPE=Release -DLLAMA_SERVER_VERBOSE=on" go generate ./...
go build .
```

I'm now getting a very cryptic segfault. Debugging...

Edit: segfault fixed; I had forgotten to load libvulkan. Now it runs, but produces empty output. Continuing to debug...

Edit2: Phi-2 is running on Vulkan, but the outputs from the CPU version and the Vulkan version are different. Nice speedup though...
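
For anyone hitting the same missing-libvulkan problem, a quick sanity check before launching ollama is to confirm the Vulkan loader and at least one Vulkan-capable device are visible. A minimal sketch (package names and output formatting vary by distro; `vulkaninfo` ships with vulkan-tools):

```
# is the Vulkan loader (libvulkan) installed and registered?
ldconfig -p | grep libvulkan

# does the loader see at least one device? the "deviceName" lines list the GPUs
vulkaninfo --summary 2>/dev/null | grep -i devicename
```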

Author
Owner

@ddpasa commented on GitHub (Feb 9, 2024):

I was able to get llama.cpp compiled with the following, and confirm that it works. However, when I try to hack [gen_common.sh](https://github.com/ollama/ollama/blob/main/llm/generate/gen_common.sh#L85), I always get empty or garbled output. I'm not very familiar with how ollama builds llama.cpp, so I'm probably messing something up. Tagging @dhiltgen because he was kind enough to help me in the [AVX thread](https://github.com/ollama/ollama/issues/2205).

working llama.cpp config:

```
mkdir build
cd build
cmake .. -DLLAMA_VULKAN=1
cmake --build . --config Release
# now test:
./build/bin/main -m ggml-model-q4_0.gguf -p "Hi you how are you" -n 50 -e -ngl 0 -t 4
```
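
Note that `-ngl 0` offloads zero layers, so the test above mostly verifies that the Vulkan build initializes; to push real work onto the GPU you would raise the offload count. A hypothetical follow-up run (the right layer count depends on the model; llama.cpp clamps it to the model's actual number of layers):

```
# offload up to 33 layers to the Vulkan device (enough to cover a typical 7B model)
./build/bin/main -m ggml-model-q4_0.gguf -p "Hi, how are you?" -n 50 -e -ngl 33 -t 4
```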

ollama gen_common.sh variant that compiles fine but produces garbled output:

```
cmake -S ${LLAMACPP_DIR} -B ${BUILD_DIR} -DLLAMA_VULKAN=1 -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=on -DLLAMA_SERVER_VERBOSE=on -DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_FMA=on
cmake --build ${BUILD_DIR} ${CMAKE_TARGETS} -j8
mkdir -p ${BUILD_DIR}/lib/
g++ -fPIC -g -shared -o ${BUILD_DIR}/lib/libext_server.${LIB_EXT} \
        ${GCC_ARCH} \
        ${WHOLE_ARCHIVE} ${BUILD_DIR}/examples/server/libext_server.a ${NO_WHOLE_ARCHIVE} \
        ${BUILD_DIR}/common/libcommon.a \
        ${BUILD_DIR}/libllama.a \
        -Wl,-rpath,\$ORIGIN \
        -lpthread -ldl -lm -lvulkan \
        ${EXTRA_LIBS}
```
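
One way to narrow down why the standalone build works while the ollama-generated one produces garbled output is to compare the options CMake actually recorded for each build tree. A sketch, assuming the standalone tree lives in `llama.cpp/build` and the ollama tree in `${BUILD_DIR}`:

```
# dump the LLAMA_* cache entries of both builds and diff them
grep '^LLAMA_' llama.cpp/build/CMakeCache.txt | sort > standalone-flags.txt
grep '^LLAMA_' "${BUILD_DIR}/CMakeCache.txt" | sort > ollama-flags.txt
diff standalone-flags.txt ollama-flags.txt
```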
Author
Owner

@dhiltgen commented on GitHub (Mar 21, 2024):

We have multiple issues tracking Vulkan support - let's use #2033

Reference: github-starred/ollama#1394