[GH-ISSUE #8038] undefined reference to `ggml_backend_cuda_reg' #51651

Closed
opened 2026-04-28 20:42:10 -05:00 by GiteaMirror · 4 comments

Originally created by @regularRandom on GitHub (Dec 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8038

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

0.5.2-rc0 build fails with the following error message:

/usr/bin/ld: /tmp/go-link-3808825391/000013.o: in function `ggml_backend_registry::ggml_backend_registry()':
/_/github.com/ollama/ollama/llama/ggml-backend-reg.cpp:164: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status

make[1]: *** [make/gpu.make:63: llama/build/linux-amd64/runners/cuda_v12/ollama_llama_server] Error 1
make: *** [Makefile:50: cuda_v12] Error 2

Same for 0.5.1.

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.2-rc0-0-g527cc97
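The linker is reporting that the `ggml_backend_registry` constructor calls `ggml_backend_cuda_reg`, but nothing on the final link line defines that symbol. One way to narrow this down is to check whether any of the CUDA objects compiled earlier in the build actually define it. The sketch below assumes GNU binutils `nm` and the object layout visible in the build log (`llama/build/linux-amd64/llama/ggml-cuda/*.cuda_v12.o`); the helper name `find_symbol_defs` is hypothetical.

```shell
# Hypothetical helper: print every file (from the arguments) that *defines*
# the given symbol. Assumes GNU binutils `nm`.
find_symbol_defs() {
    sym="$1"
    shift
    for f in "$@"; do
        # --defined-only hides "U" (undefined) entries, so only definitions match.
        # grep -w keeps e.g. "ggml_backend_cuda_reg_foo" from counting as a hit.
        if nm --defined-only "$f" 2>/dev/null | grep -qw "$sym"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage, assuming the paths from the log: `find_symbol_defs ggml_backend_cuda_reg llama/build/linux-amd64/llama/ggml-cuda/*.cuda_v12.o`. If nothing is printed, the definition was probably never compiled; if a file is printed, the object exists but was likely not passed to the final `ollama_llama_server` link.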

GiteaMirror added the build, bug, linux labels 2026-04-28 20:42:11 -05:00

@dhiltgen commented on GitHub (Dec 11, 2024):

main currently builds on my linux environments, so this might be distro or gcc/clang compiler version specific. Can you share those details as well?
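The details requested here can be collected in one pass. This is a minimal sketch assuming a POSIX shell with `uname` available; the function names `tool_version` and `report_env` are hypothetical, and tools that are not installed are reported as "not found" rather than aborting.

```shell
# Hypothetical helper: first meaningful version line for a given tool.
tool_version() {
    case "$1" in
        # nvcc's first line is just a banner; the release is on a later line,
        # e.g. "Cuda compilation tools, release 12.6, V12.6.85".
        nvcc) "$1" --version 2>&1 | grep -m 1 'release' ;;
        *)    "$1" --version 2>&1 | head -n 1 ;;
    esac
}

# Hypothetical helper: one "name: value" line per item, ready to paste into the issue.
report_env() {
    # Source /etc/os-release in a subshell so its variables do not leak.
    if [ -r /etc/os-release ]; then
        ( . /etc/os-release; printf 'OS: %s\n' "${PRETTY_NAME:-unknown}" )
    else
        printf 'OS: unknown\n'
    fi
    printf 'Kernel: %s\n' "$(uname -r)"
    for tool in gcc g++ clang nvcc ld; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$tool" "$(tool_version "$tool")"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}
```

Running `report_env` prints one line each for the OS, kernel, and each compiler/linker, which covers the distro and gcc/clang versions asked for above.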


@regularRandom commented on GitHub (Dec 12, 2024):

CentOS Stream release 9
6.12.4-1.el9.elrepo.x86_64

gcc --version
gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2)

clang --version
clang version 19.1.3 (CentOS 19.1.3-1.el9)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang/x86_64-redhat-linux-gnu-clang.cfg

Here is my script to build from main:

#!/bin/bash

OLLAMA_HOME=/opt/ollama
OLLAMA_SRC=/usr/src/ollama
OLLAMA_DIST=${OLLAMA_SRC}/dist/linux-amd64/lib/ollama

export CUDA_ARCHITECTURES=70
export CUSTOM_CPU_FLAGS=avx2

# build

echo "Building..."

make clean

git pull origin main
rm -f ${OLLAMA_SRC}/llm/build/linux/x86_64/cuda_v12/CMakeCache.txt
rm -rf ${OLLAMA_SRC}/llama/build/linux-amd64

export VERSION=${VERSION:-$(git describe --tags --first-parent --abbrev=7 --long --dirty --always | sed -e "s/^v//g")}
export GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=$VERSION\" \"-X=github.com/ollama/ollama/server.mode=release\"'"

make -j -S
go generate ./...
go build .

Here is the output:

Building...
rm -rf ./llama/build/linux-amd64 ./dist/linux-amd64/lib/ollama ./ollama ./dist/linux-amd64/bin/ollama
go clean -cache
From github.com:ollama/ollama

  * branch main -> FETCH_HEAD
    Updating b1fd7fef..18f6a98b
    Fast-forward
    llama/grammar_test.go | 6 +-----
    llama/json-schema-to-grammar.cpp | 2 +-
    llama/patches/0012-Maintain-ordering-for-rules-for-grammar.patch | 22 ++++++++++++++++++++++
    3 files changed, 24 insertions(+), 6 deletions(-)
    create mode 100644 llama/patches/0012-Maintain-ordering-for-rules-for-grammar.patch
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/acc.cuda_v12.o llama/ggml-cuda/acc.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/arange.cuda_v12.o llama/ggml-cuda/arange.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/argmax.cuda_v12.o llama/ggml-cuda/argmax.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/argsort.cuda_v12.o llama/ggml-cuda/argsort.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/binbcast.cuda_v12.o llama/ggml-cuda/binbcast.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/clamp.cuda_v12.o llama/ggml-cuda/clamp.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/concat.cuda_v12.o llama/ggml-cuda/concat.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/convert.cuda_v12.o llama/ggml-cuda/convert.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/conv-transpose-1d.cuda_v12.o llama/ggml-cuda/conv-transpose-1d.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/count-equal.cuda_v12.o llama/ggml-cuda/count-equal.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/cpy.cuda_v12.o llama/ggml-cuda/cpy.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/cross-entropy-loss.cuda_v12.o llama/ggml-cuda/cross-entropy-loss.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/diagmask.cuda_v12.o llama/ggml-cuda/diagmask.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/getrows.cuda_v12.o llama/ggml-cuda/getrows.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/ggml-cuda.cuda_v12.o llama/ggml-cuda/ggml-cuda.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/im2col.cuda_v12.o llama/ggml-cuda/im2col.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/mmq.cuda_v12.o llama/ggml-cuda/mmq.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/mmv.cuda_v12.o llama/ggml-cuda/mmv.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/mmvq.cuda_v12.o llama/ggml-cuda/mmvq.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/norm.cuda_v12.o llama/ggml-cuda/norm.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/opt-step-adamw.cuda_v12.o llama/ggml-cuda/opt-step-adamw.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/out-prod.cuda_v12.o llama/ggml-cuda/out-prod.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/pad.cuda_v12.o llama/ggml-cuda/pad.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/pool2d.cuda_v12.o llama/ggml-cuda/pool2d.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/quantize.cuda_v12.o llama/ggml-cuda/quantize.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/rope.cuda_v12.o llama/ggml-cuda/rope.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/scale.cuda_v12.o llama/ggml-cuda/scale.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/softmax.cuda_v12.o llama/ggml-cuda/softmax.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/sum.cuda_v12.o llama/ggml-cuda/sum.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/sumrows.cuda_v12.o llama/ggml-cuda/sumrows.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/tsembd.cuda_v12.o llama/ggml-cuda/tsembd.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/unary.cuda_v12.o llama/ggml-cuda/unary.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/upscale.cuda_v12.o llama/ggml-cuda/upscale.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/wkv6.cuda_v12.o llama/ggml-cuda/wkv6.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq1_s.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq1_s.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_s.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq2_s.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq2_xs.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_s.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq3_s.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_nl.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq4_nl.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_xs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq4_xs.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q2_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q2_k.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q3_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q3_k.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_0.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q4_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_1.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q4_1.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q4_k.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_0.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q5_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_1.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q5_1.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q5_k.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q6_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q6_k.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q8_0.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q8_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml.cuda_v12.o llama/ggml.c
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-backend.cuda_v12.o llama/ggml-backend.cpp
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-alloc.cuda_v12.o llama/ggml-alloc.c
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-quants.cuda_v12.o llama/ggml-quants.c
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/sgemm.cuda_v12.o llama/sgemm.cpp
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-aarch64.cuda_v12.o llama/ggml-aarch64.c
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-threading.cuda_v12.o llama/ggml-threading.cpp
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/fattn.cuda_v12.o llama/ggml-cuda/fattn.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f16.cuda_v12.o llama/ggml-cuda/fattn-tile-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f32.cuda_v12.o llama/ggml-cuda/fattn-tile-f32.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cu
    /usr/bin/ccache /usr/local/cuda-12/bin/nvcc --shared -L/usr/local/cuda-12/lib64 -lcuda -L./dist/linux-amd64/lib/ollama -lcublas -lcudart -lcublasLt ./llama/build/linux-amd64/llama/ggml-cuda/acc.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/arange.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/argmax.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/argsort.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/binbcast.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/clamp.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/concat.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/convert.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/conv-transpose-1d.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/count-equal.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/cpy.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/cross-entropy-loss.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/diagmask.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/getrows.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/ggml-cuda.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/im2col.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/mmq.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/mmv.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/mmvq.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/norm.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/opt-step-adamw.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/out-prod.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/pad.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/pool2d.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/quantize.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/rope.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/scale.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/softmax.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/sum.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/sumrows.cuda_v12.o 
./llama/build/linux-amd64/llama/ggml-cuda/tsembd.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/unary.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/upscale.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/wkv6.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq1_s.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_s.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_s.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_nl.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_xs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q2_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q3_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_1.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_1.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q6_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q8_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-backend.cuda_v12.o 
./llama/build/linux-amd64/llama/ggml-alloc.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-quants.cuda_v12.o ./llama/build/linux-amd64/llama/sgemm.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-aarch64.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-threading.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/fattn.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f32.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cuda_v12.o 
./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cuda_v12.o -o llama/build/linux-amd64/runners/cuda_v12/libggml_cuda_v12.so
    GOARCH=amd64 CGO_LDFLAGS="-L"/usr/local/cuda-12/lib64" -L"/usr/local/cuda-12/lib64/stubs" -L"./llama/build/linux-amd64/runners/cuda_v12/"" go build -buildmode=pie "-ldflags=-w -s "-X=github.com/ollama/ollama/version.Version=0.5.2-rc3-4-g18f6a98" " -trimpath -tags avx2,cuda,cuda_v12 -o llama/build/linux-amd64/runners/cuda_v12/ollama_llama_server ./cmd/runner
    GOARCH=amd64 go build -buildmode=pie "-ldflags=-w -s "-X=github.com/ollama/ollama/version.Version=0.5.2-rc3-4-g18f6a98" " -trimpath -tags avx2 -o ollama .

github.com/ollama/ollama/cmd/runner

/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.4.linux-amd64/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/lib64/ccache/g++ -m64 -s -Wl,-z,relro -pie -o $WORK/b001/exe/a.out -Wl,--export-dynamic-symbol=_cgo_panic -Wl,--export-dynamic-symbol=_cgo_topofstack -Wl,--export-dynamic-symbol=crosscall2 -Wl,--export-dynamic-symbol=llamaLog -Wl,--export-dynamic-symbol=llamaProgressCallback -Wl,--compress-debug-sections=zlib /tmp/go-link-401602149/go.o /tmp/go-link-401602149/000000.o /tmp/go-link-401602149/000001.o /tmp/go-link-401602149/000002.o /tmp/go-link-401602149/000003.o /tmp/go-link-401602149/000004.o /tmp/go-link-401602149/000005.o /tmp/go-link-401602149/000006.o /tmp/go-link-401602149/000007.o /tmp/go-link-401602149/000008.o /tmp/go-link-401602149/000009.o /tmp/go-link-401602149/000010.o /tmp/go-link-401602149/000011.o /tmp/go-link-401602149/000012.o /tmp/go-link-401602149/000013.o /tmp/go-link-401602149/000014.o /tmp/go-link-401602149/000015.o /tmp/go-link-401602149/000016.o /tmp/go-link-401602149/000017.o /tmp/go-link-401602149/000018.o /tmp/go-link-401602149/000019.o /tmp/go-link-401602149/000020.o /tmp/go-link-401602149/000021.o /tmp/go-link-401602149/000022.o /tmp/go-link-401602149/000023.o /tmp/go-link-401602149/000024.o /tmp/go-link-401602149/000025.o /tmp/go-link-401602149/000026.o /tmp/go-link-401602149/000027.o /tmp/go-link-401602149/000028.o /tmp/go-link-401602149/000029.o /tmp/go-link-401602149/000030.o /tmp/go-link-401602149/000031.o /tmp/go-link-401602149/000032.o /tmp/go-link-401602149/000033.o /tmp/go-link-401602149/000034.o /tmp/go-link-401602149/000035.o /tmp/go-link-401602149/000036.o /tmp/go-link-401602149/000037.o /tmp/go-link-401602149/000038.o /tmp/go-link-401602149/000039.o /tmp/go-link-401602149/000040.o /tmp/go-link-401602149/000041.o /tmp/go-link-401602149/000042.o /tmp/go-link-401602149/000043.o /tmp/go-link-401602149/000044.o /tmp/go-link-401602149/000045.o /tmp/go-link-401602149/000046.o /tmp/go-link-401602149/000047.o /tmp/go-link-401602149/000048.o /tmp/go-link-401602149/000049.o /tmp/go-link-401602149/000050.o 
/tmp/go-link-401602149/000051.o -L/usr/local/cuda-12/lib64 -L/usr/local/cuda-12/lib64/stubs -L./llama/build/linux-amd64/runners/cuda_v12/ -lggml_cuda_v12 -ldl -L/usr/src/ollama/llama/build/linux-amd64 -lcuda -lcudart -lcublas -lcublasLt -lpthread -lrt -lresolv -L/usr/local/cuda-12/lib64 -L/usr/local/cuda-12/lib64/stubs -L./llama/build/linux-amd64/runners/cuda_v12/ -lresolv -L/usr/local/cuda-12/lib64 -L/usr/local/cuda-12/lib64/stubs -L./llama/build/linux-amd64/runners/cuda_v12/ -lpthread
/usr/bin/ld: /tmp/go-link-401602149/000013.o: in function `ggml_backend_registry::ggml_backend_registry()': /_/github.com/ollama/ollama/llama/ggml-backend-reg.cpp:164: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status

make[1]: *** [make/gpu.make:63: llama/build/linux-amd64/runners/cuda_v12/ollama_llama_server] Error 1
make: *** [Makefile:50: cuda_v12] Error 2
make: *** Waiting for unfinished jobs....
make: *** No targets specified and no makefile found. Stop.
llama/llama.go:3: running "make": exit status 2

github.com/ollama/ollama/llama

ggml-cpu.c: In function ‘ggml_vec_mad_f16’:
ggml-cpu.c:1667:45: warning: passing argument 1 of ‘__sse_f16x4_load’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
1667 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^
ggml-cpu.c:1082:50: note: in definition of macro ‘GGML_F32Cx4_LOAD’
1082 | #define GGML_F32Cx4_LOAD(x) __sse_f16x4_load(x)
| ^
ggml-cpu.c:1667:21: note: in expansion of macro ‘GGML_F16_VEC_LOAD’
1667 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
ggml-cpu.c:1057:52: note: expected ‘ggml_fp16_t *’ {aka ‘short unsigned int *’} but argument is of type ‘const ggml_fp16_t *’ {aka ‘const short unsigned int *’}
1057 | static inline __m128 __sse_f16x4_load(ggml_fp16_t *x) {
| ~~~~~~~~~~~~~^
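The `-Wdiscarded-qualifiers` message above is only a warning, so it does not explain the link failure by itself. It arises because `ggml_vec_mad_f16` passes a `const ggml_fp16_t *` into `__sse_f16x4_load`, and that helper declares its parameter as non-const. When a helper only reads through a pointer, the usual fix is to declare the parameter as a pointer-to-const.

The following is a minimal, portable sketch of that pattern, not ggml's code. The names `fp16_bits`, `load4_sum` and `sum_blocks` are hypothetical, and plain integer sums stand in for the SSE load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ggml_fp16_t, which the warning reports as
 * `short unsigned int`. */
typedef uint16_t fp16_bits;

/* Read-only helper. Because the parameter is `const fp16_bits *`, callers
 * holding const data no longer trigger -Wdiscarded-qualifiers. Summing four
 * raw values is a placeholder for "load four halves into a vector". */
static inline uint32_t load4_sum(const fp16_bits *x) {
    assert(x != NULL);
    uint32_t s = 0;
    for (int i = 0; i < 4; i++) {
        s += x[i];
    }
    return s;
}

/* Caller that, like ggml_vec_mad_f16, only has a const pointer.
 * Processes full blocks of four elements and ignores any remainder. */
uint32_t sum_blocks(const fp16_bits *x, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        acc += load4_sum(x + i);
    }
    return acc;
}
```

Changing only the parameter type is safe here because the helper never writes through `x`. If it did, the caller's data would genuinely need to be non-const.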

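The linker error points at line 164 of `ggml-backend-reg.cpp`, inside the `ggml_backend_registry` constructor. As far as we understand ggml's design at this point, that constructor registers each compiled-in backend behind a preprocessor guard. For CUDA the guard is `GGML_USE_CUDA`, which the nvcc commands in this thread also pass as `-DGGML_USE_CUDA=1`.

Once the guarded call is compiled in, the final link must find a definition of `ggml_backend_cuda_reg` in some object or library on the link line. Here that is presumably `libggml_cuda_v12`, pulled in via `-lggml_cuda_v12`. If no definition is found, `ld` reports exactly this kind of undefined reference.

The following C sketch models only that mechanism and is not ggml's code. Every name in it is hypothetical: `backend_reg`, `register_backend`, `registry_init`, `registry_has`, `DEMO_USE_CUDA` and `demo_backend_cuda_reg`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified backend registration record. */
typedef struct {
    const char *name;
} backend_reg;

#define MAX_BACKENDS 4

static const backend_reg *registry[MAX_BACKENDS];
static size_t registry_count = 0;

static void register_backend(const backend_reg *reg) {
    assert(reg != NULL);
    if (registry_count < MAX_BACKENDS) {
        registry[registry_count++] = reg;
    }
}

#ifdef DEMO_USE_CUDA
/* Only a declaration is visible here. With DEMO_USE_CUDA defined, the call
 * in registry_init() compiles, but the final link fails with an undefined
 * reference unless some object or library on the link line defines it. */
const backend_reg *demo_backend_cuda_reg(void);
#endif

static const backend_reg cpu_reg = {"CPU"};

/* Compile-time gated registration, in the spirit of a registry constructor. */
void registry_init(void) {
    registry_count = 0;
#ifdef DEMO_USE_CUDA
    register_backend(demo_backend_cuda_reg());
#endif
    register_backend(&cpu_reg);
}

/* Returns 1 if a backend with the given name was registered, else 0. */
int registry_has(const char *name) {
    for (size_t i = 0; i < registry_count; i++) {
        if (strcmp(registry[i]->name, name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Compiled as-is, only the CPU entry is registered. Compiling the same file with `-DDEMO_USE_CUDA`, without any other object that defines `demo_backend_cuda_reg`, should fail at link time with an undefined reference to `demo_backend_cuda_reg`. That mirrors the failure reported in this issue.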
<!-- gh-comment-id:2539199668 --> @regularRandom commented on GitHub (Dec 12, 2024):

CentOS Stream release 9
6.12.4-1.el9.elrepo.x86_64

gcc --version
gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2)

clang --version
clang version 19.1.3 (CentOS 19.1.3-1.el9)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang/x86_64-redhat-linux-gnu-clang.cfg

Here is my script to build from main:

```
#!/bin/bash

OLLAMA_HOME=/opt/ollama
OLLAMA_SRC=/usr/src/ollama
OLLAMA_DIST=${OLLAMA_SRC}/dist/linux-amd64/lib/ollama

export CUDA_ARCHITECTURES=70
export CUSTOM_CPU_FLAGS=avx2

# build
echo "Building..."
make clean
git pull origin main
rm -f ${OLLAMA_SRC}/llm/build/linux/x86_64/cuda_v12/CMakeCache.txt
rm -rf ${OLLAMA_SRC}/llama/build/linux-amd64
export VERSION=${VERSION:-$(git describe --tags --first-parent --abbrev=7 --long --dirty --always | sed -e "s/^v//g")}
export GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=$VERSION\" \"-X=github.com/ollama/ollama/server.mode=release\"'"
make -j -S
go generate ./...
go build .
```

Here is the output:

> Building...
> rm -rf ./llama/build/linux-amd64 ./dist/linux-amd64/lib/ollama ./ollama ./dist/linux-amd64/bin/ollama
> go clean -cache
> From github.com:ollama/ollama
> * branch main -> FETCH_HEAD
> Updating b1fd7fef..18f6a98b
> Fast-forward
> llama/grammar_test.go | 6 +-----
> llama/json-schema-to-grammar.cpp | 2 +-
> llama/patches/0012-Maintain-ordering-for-rules-for-grammar.patch | 22 ++++++++++++++++++++++
> 3 files changed, 24 insertions(+), 6 deletions(-)
> create mode 100644 llama/patches/0012-Maintain-ordering-for-rules-for-grammar.patch
> /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/acc.cuda_v12.o llama/ggml-cuda/acc.cu
> /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/arange.cuda_v12.o llama/ggml-cuda/arange.cu
> /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler
"-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/argmax.cuda_v12.o llama/ggml-cuda/argmax.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/argsort.cuda_v12.o llama/ggml-cuda/argsort.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/binbcast.cuda_v12.o llama/ggml-cuda/binbcast.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 
-Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/clamp.cuda_v12.o llama/ggml-cuda/clamp.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/concat.cuda_v12.o llama/ggml-cuda/concat.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/convert.cuda_v12.o llama/ggml-cuda/convert.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function 
-std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/conv-transpose-1d.cuda_v12.o llama/ggml-cuda/conv-transpose-1d.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/count-equal.cuda_v12.o llama/ggml-cuda/count-equal.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/cpy.cuda_v12.o llama/ggml-cuda/cpy.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC 
-D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/cross-entropy-loss.cuda_v12.o llama/ggml-cuda/cross-entropy-loss.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/diagmask.cuda_v12.o llama/ggml-cuda/diagmask.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/getrows.cuda_v12.o llama/ggml-cuda/getrows.cu > /usr/bin/ccache 
/usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/ggml-cuda.cuda_v12.o llama/ggml-cuda/ggml-cuda.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/im2col.cuda_v12.o llama/ggml-cuda/im2col.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/mmq.cuda_v12.o llama/ggml-cuda/mmq.cu > 
/usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/mmv.cuda_v12.o llama/ggml-cuda/mmv.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/mmvq.cuda_v12.o llama/ggml-cuda/mmvq.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/norm.cuda_v12.o llama/ggml-cuda/norm.cu > 
/usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/opt-step-adamw.cuda_v12.o llama/ggml-cuda/opt-step-adamw.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/out-prod.cuda_v12.o llama/ggml-cuda/out-prod.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/pad.cuda_v12.o 
llama/ggml-cuda/pad.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/pool2d.cuda_v12.o llama/ggml-cuda/pool2d.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/quantize.cuda_v12.o llama/ggml-cuda/quantize.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o 
llama/build/linux-amd64/llama/ggml-cuda/rope.cuda_v12.o llama/ggml-cuda/rope.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/scale.cuda_v12.o llama/ggml-cuda/scale.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/softmax.cuda_v12.o llama/ggml-cuda/softmax.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o 
llama/build/linux-amd64/llama/ggml-cuda/sum.cuda_v12.o llama/ggml-cuda/sum.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/sumrows.cuda_v12.o llama/ggml-cuda/sumrows.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/tsembd.cuda_v12.o llama/ggml-cuda/tsembd.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o 
llama/build/linux-amd64/llama/ggml-cuda/unary.cuda_v12.o llama/ggml-cuda/unary.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/upscale.cuda_v12.o llama/ggml-cuda/upscale.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/wkv6.cuda_v12.o llama/ggml-cuda/wkv6.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o 
llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq1_s.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq1_s.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_s.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq2_s.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq2_xs.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE 
-D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_s.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq3_s.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 
-DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_nl.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq4_nl.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_xs.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-iq4_xs.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q2_k.cuda_v12.o 
llama/ggml-cuda/template-instances/mmq-instance-q2_k.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q3_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q3_k.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_0.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q4_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math 
-I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_1.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q4_1.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q4_k.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_0.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q5_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 
-DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_1.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q5_1.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q5_k.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q6_k.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q6_k.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function 
-std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q8_0.cuda_v12.o llama/ggml-cuda/template-instances/mmq-instance-q8_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml.cuda_v12.o llama/ggml.c > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-backend.cuda_v12.o llama/ggml-backend.cpp > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-alloc.cuda_v12.o llama/ggml-alloc.c > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-quants.cuda_v12.o llama/ggml-quants.c > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/sgemm.cuda_v12.o llama/sgemm.cpp > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-aarch64.cuda_v12.o llama/ggml-aarch64.c > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -o llama/build/linux-amd64/llama/ggml-threading.cuda_v12.o llama/ggml-threading.cpp > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 
-DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/fattn.cuda_v12.o llama/ggml-cuda/fattn.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f16.cuda_v12.o llama/ggml-cuda/fattn-tile-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f32.cuda_v12.o llama/ggml-cuda/fattn-tile-f32.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 
-DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cuda_v12.o 
llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cuda_v12.o llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE 
-D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC 
-D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 
--generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 
-DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc -c -Xcompiler -fPIC -D_GNU_SOURCE -fPIC -Wno-unused-function -std=c++11 -Xcompiler "-mavx2" -t2 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUDA=1 -DGGML_SHARED=1 -DGGML_BACKEND_SHARED=1 -DGGML_BUILD=1 -DGGML_BACKEND_BUILD=1 -DGGML_USE_LLAMAFILE -DK_QUANTS_PER_ITERATION=2 -DNDEBUG -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Wno-deprecated-gpu-targets --forward-unknown-to-host-compiler -use_fast_math -I./llama/ -O3 --generate-code=arch=compute_70,code=[compute_70,sm_70] -DGGML_CUDA_USE_GRAPHS=1 -o llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cuda_v12.o llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cu > /usr/bin/ccache /usr/local/cuda-12/bin/nvcc --shared -L/usr/local/cuda-12/lib64 -lcuda -L./dist/linux-amd64/lib/ollama -lcublas -lcudart -lcublasLt ./llama/build/linux-amd64/llama/ggml-cuda/acc.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/arange.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/argmax.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/argsort.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/binbcast.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/clamp.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/concat.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/convert.cuda_v12.o 
./llama/build/linux-amd64/llama/ggml-cuda/conv-transpose-1d.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/count-equal.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/cpy.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/cross-entropy-loss.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/diagmask.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/getrows.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/ggml-cuda.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/im2col.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/mmq.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/mmv.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/mmvq.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/norm.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/opt-step-adamw.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/out-prod.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/pad.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/pool2d.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/quantize.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/rope.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/scale.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/softmax.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/sum.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/sumrows.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/tsembd.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/unary.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/upscale.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/wkv6.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq1_s.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_s.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cuda_v12.o 
./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_s.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_nl.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-iq4_xs.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q2_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q3_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_1.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q4_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_1.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q5_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q6_k.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/mmq-instance-q8_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-backend.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-alloc.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-quants.cuda_v12.o ./llama/build/linux-amd64/llama/sgemm.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-aarch64.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-threading.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/fattn.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/fattn-tile-f32.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cuda_v12.o 
./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cuda_v12.o ./llama/build/linux-amd64/llama/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cuda_v12.o -o llama/build/linux-amd64/runners/cuda_v12/libggml_cuda_v12.so > GOARCH=amd64 CGO_LDFLAGS="-L"/usr/local/cuda-12/lib64" -L"/usr/local/cuda-12/lib64/stubs" -L"./llama/build/linux-amd64/runners/cuda_v12/"" go build -buildmode=pie "-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=0.5.2-rc3-4-g18f6a98\" " -trimpath -tags avx2,cuda,cuda_v12 -o llama/build/linux-amd64/runners/cuda_v12/ollama_llama_server ./cmd/runner > GOARCH=amd64 go build -buildmode=pie 
"-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=0.5.2-rc3-4-g18f6a98\" " -trimpath -tags avx2 -o ollama . > # github.com/ollama/ollama/cmd/runner > /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.4.linux-amd64/pkg/tool/linux_amd64/link: running g++ failed: exit status 1 > /usr/lib64/ccache/g++ -m64 -s -Wl,-z,relro -pie -o $WORK/b001/exe/a.out -Wl,--export-dynamic-symbol=_cgo_panic -Wl,--export-dynamic-symbol=_cgo_topofstack -Wl,--export-dynamic-symbol=crosscall2 -Wl,--export-dynamic-symbol=llamaLog -Wl,--export-dynamic-symbol=llamaProgressCallback -Wl,--compress-debug-sections=zlib /tmp/go-link-401602149/go.o /tmp/go-link-401602149/000000.o /tmp/go-link-401602149/000001.o /tmp/go-link-401602149/000002.o /tmp/go-link-401602149/000003.o /tmp/go-link-401602149/000004.o /tmp/go-link-401602149/000005.o /tmp/go-link-401602149/000006.o /tmp/go-link-401602149/000007.o /tmp/go-link-401602149/000008.o /tmp/go-link-401602149/000009.o /tmp/go-link-401602149/000010.o /tmp/go-link-401602149/000011.o /tmp/go-link-401602149/000012.o /tmp/go-link-401602149/000013.o /tmp/go-link-401602149/000014.o /tmp/go-link-401602149/000015.o /tmp/go-link-401602149/000016.o /tmp/go-link-401602149/000017.o /tmp/go-link-401602149/000018.o /tmp/go-link-401602149/000019.o /tmp/go-link-401602149/000020.o /tmp/go-link-401602149/000021.o /tmp/go-link-401602149/000022.o /tmp/go-link-401602149/000023.o /tmp/go-link-401602149/000024.o /tmp/go-link-401602149/000025.o /tmp/go-link-401602149/000026.o /tmp/go-link-401602149/000027.o /tmp/go-link-401602149/000028.o /tmp/go-link-401602149/000029.o /tmp/go-link-401602149/000030.o /tmp/go-link-401602149/000031.o /tmp/go-link-401602149/000032.o /tmp/go-link-401602149/000033.o /tmp/go-link-401602149/000034.o /tmp/go-link-401602149/000035.o /tmp/go-link-401602149/000036.o /tmp/go-link-401602149/000037.o /tmp/go-link-401602149/000038.o /tmp/go-link-401602149/000039.o /tmp/go-link-401602149/000040.o /tmp/go-link-401602149/000041.o 
/tmp/go-link-401602149/000042.o /tmp/go-link-401602149/000043.o /tmp/go-link-401602149/000044.o /tmp/go-link-401602149/000045.o /tmp/go-link-401602149/000046.o /tmp/go-link-401602149/000047.o /tmp/go-link-401602149/000048.o /tmp/go-link-401602149/000049.o /tmp/go-link-401602149/000050.o /tmp/go-link-401602149/000051.o -L/usr/local/cuda-12/lib64 -L/usr/local/cuda-12/lib64/stubs -L./llama/build/linux-amd64/runners/cuda_v12/ -lggml_cuda_v12 -ldl -L/usr/src/ollama/llama/build/linux-amd64 -lcuda -lcudart -lcublas -lcublasLt -lpthread -lrt -lresolv -L/usr/local/cuda-12/lib64 -L/usr/local/cuda-12/lib64/stubs -L./llama/build/linux-amd64/runners/cuda_v12/ -lresolv -L/usr/local/cuda-12/lib64 -L/usr/local/cuda-12/lib64/stubs -L./llama/build/linux-amd64/runners/cuda_v12/ -lpthread > /usr/bin/ld: /tmp/go-link-401602149/000013.o: in function `ggml_backend_registry::ggml_backend_registry()': > /_/github.com/ollama/ollama/llama/ggml-backend-reg.cpp:164: undefined reference to `ggml_backend_cuda_reg' > collect2: error: ld returned 1 exit status > > make[1]: *** [make/gpu.make:63: llama/build/linux-amd64/runners/cuda_v12/ollama_llama_server] Error 1 > make: *** [Makefile:50: cuda_v12] Error 2 > make: *** Waiting for unfinished jobs.... > make: *** No targets specified and no makefile found. Stop. 
> llama/llama.go:3: running "make": exit status 2 > # github.com/ollama/ollama/llama > ggml-cpu.c: In function ‘ggml_vec_mad_f16’: > ggml-cpu.c:1667:45: warning: passing argument 1 of ‘__sse_f16x4_load’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] > 1667 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j); > | ^ > ggml-cpu.c:1082:50: note: in definition of macro ‘GGML_F32Cx4_LOAD’ > 1082 | #define GGML_F32Cx4_LOAD(x) __sse_f16x4_load(x) > | ^ > ggml-cpu.c:1667:21: note: in expansion of macro ‘GGML_F16_VEC_LOAD’ > 1667 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j); > | ^~~~~~~~~~~~~~~~~ > ggml-cpu.c:1057:52: note: expected ‘ggml_fp16_t *’ {aka ‘short unsigned int *’} but argument is of type ‘const ggml_fp16_t *’ {aka ‘const short unsigned int *’} > 1057 | static inline __m128 __sse_f16x4_load(ggml_fp16_t *x) { > | ~~~~~~~~~~~~~^ >
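The linker reports that `ggml_backend_registry::ggml_backend_registry()` in `ggml-backend-reg.cpp` references `ggml_backend_cuda_reg`, but none of the linked inputs define it, even though the link line passes `-lggml_cuda_v12`. One way to narrow this down is to check whether the freshly built `libggml_cuda_v12.so` actually exports that symbol. The sketch below is a hypothetical helper (`exported_symbols` and `library_exports` are illustrative names, not part of the Ollama build); it assumes GNU binutils `nm` is on `PATH` and parses the output of `nm -D --defined-only`.

```python
import subprocess


# Hypothetical helper, not part of the Ollama build.
def exported_symbols(nm_output):
    """Return the set of symbol names listed in `nm -D --defined-only` output.

    Defined symbols print as "<address> <type> <name>". Undefined symbols
    (type "U") have no address column, so lines with fewer than three
    fields are skipped. A symbol-version suffix such as "@@VER" is dropped.
    """
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            names.add(fields[2].split("@", 1)[0])
    return names


# Hypothetical helper, not part of the Ollama build.
def library_exports(lib_path, symbol):
    """Run GNU nm on a shared library and report whether it defines `symbol`."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nm exits with an error
    )
    return symbol in exported_symbols(result.stdout)
```

For example, `library_exports("llama/build/linux-amd64/runners/cuda_v12/libggml_cuda_v12.so", "ggml_backend_cuda_reg")` (path taken from the log above) returning `False` would suggest the library was built from sources that do not provide the backend-registry entry point; returning `True` would point instead at the linker resolving `-lggml_cuda_v12` to a different copy of the library.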

@regularRandom commented on GitHub (Dec 14, 2024):

Any update please?

Release 0.5.2 also fails with the same error.


@regularRandom commented on GitHub (Dec 14, 2024):

I figured it out; the issue can be closed. I cleaned up my local repository and rewrote my build scripts, and that fixed it.
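The thread does not say which leftover file caused the failure, so the following is an assumption rather than a diagnosis. The failing `g++` link line in the log passes several `-L` directories (`/usr/local/cuda-12/lib64`, its `stubs` subdirectory, `./llama/build/linux-amd64/runners/cuda_v12/`, `/usr/src/ollama/llama/build/linux-amd64`). GNU ld applies all `-L` options to every `-l` option, tries the directories in command-line order, and within each directory prefers `libNAME.so` over `libNAME.a`. A stale `libggml_cuda_v12.so` in an earlier directory would therefore shadow the freshly built one, which a clean checkout would remove. The hypothetical helpers `link_search_dirs` and `resolve_lib` below model that lookup in simplified form.

```python
import os


# Hypothetical helper: collects -L directories from a linker command line.
def link_search_dirs(argv):
    """Return the -L directories in command-line order ("-Ldir" or "-L dir")."""
    dirs = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-L" and i + 1 < len(argv):
            dirs.append(argv[i + 1])
            i += 2
            continue
        if arg.startswith("-L") and len(arg) > 2:
            dirs.append(arg[2:])
        i += 1
    return dirs


# Hypothetical helper: simplified model of how GNU ld resolves -l<name>.
def resolve_lib(name, search_dirs):
    """Return the first file ld would pick for -l<name>, or None.

    Directories are tried in order; in each one libNAME.so is preferred
    over libNAME.a. Default system directories, -Bstatic, -l:filename and
    sysroot handling are deliberately ignored.
    """
    for d in search_dirs:
        for candidate in (f"lib{name}.so", f"lib{name}.a"):
            path = os.path.join(d, candidate)
            if os.path.isfile(path):
                return path
    return None
```

Feeding the arguments of the failing link line to `link_search_dirs` and then calling `resolve_lib("ggml_cuda_v12", dirs)` from the directory where the build ran shows which copy of the library the linker would most likely have used under this simplified model.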

Reference: github-starred/ollama#51651