[GH-ISSUE #1979] Unable to get Ollama to utilize GPU on Jetson Orin Nano 8Gb #47654

New Issue

https://github.com/remy415/ollama.git

@remy415 commented on GitHub (Feb 3, 2024):

@remy415 commented on GitHub (Feb 3, 2024): https://github.com/remy415/ollama.git

GiteaMirror commented

@remy415 commented on GitHub (Feb 3, 2024):

I need more info about your setup and what repo you’re using

@remy415 commented on GitHub (Feb 3, 2024): I need more info about your setup and what repo you’re using

GiteaMirror commented

@Q-point commented on GitHub (Feb 3, 2024):

@remy415
I'm using your repo; Simply followed your instructions on https://github.com/remy415/ollama_tegra_fix

Jetson AGX orin 64Gb
go version go1.21.6 linux/arm64
cmake version 3.28.2

Regarding the comment in bold: Also ensure: IMPORTANT -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc is in the llm/generate/gen_linux.sh file under CUBLAS;

That is already included under:

        TEGRA_COMPILER="-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc"
        TEGRA_CUDA_DEFS="-DCMAKE_CUDA_STANDARD=17 -DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on  -DLLAMA_CUDA_F16=1"

@Q-point commented on GitHub (Feb 3, 2024): @remy415 I'm using your repo; Simply followed your instructions on https://github.com/remy415/ollama_tegra_fix - Jetson AGX orin 64Gb - go version go1.21.6 linux/arm64 - cmake version 3.28.2 Regarding the comment in bold: Also ensure: IMPORTANT -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc is in the llm/generate/gen_linux.sh file under CUBLAS; That is already included under: ``` TEGRA_COMPILER="-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" TEGRA_CUDA_DEFS="-DCMAKE_CUDA_STANDARD=17 -DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on -DLLAMA_CUDA_F16=1" ```

GiteaMirror commented

2026-04-28 04:43:16 -05:00

@remy415 commented on GitHub (Feb 3, 2024):

Yes, I created the guide before I made the package. I need to update the guide. If you’re running Jetpack 5, it should work if you clone the repo and install as is. I am currently working back my edits to make it align with the default install. I think if you ensure your ld_library_path is set it should work

@remy415 commented on GitHub (Feb 3, 2024): Yes, I created the guide before I made the package. I need to update the guide. If you’re running Jetpack 5, it should work if you clone the repo and install as is. I am currently working back my edits to make it align with the default install. I think if you ensure your ld_library_path is set it should work

GiteaMirror commented

@remy415 commented on GitHub (Feb 3, 2024):

@Q-point use this repo in an empty folder, it’s the whole package https://github.com/remy415/ollama.git

@remy415 commented on GitHub (Feb 3, 2024): @Q-point use this repo in an empty folder, it’s the whole package https://github.com/remy415/ollama.git

GiteaMirror commented

2026-04-28 04:43:16 -05:00

@Q-point commented on GitHub (Feb 3, 2024):

OK, just got it working with that repo. https://github.com/remy415/ollama.git

@Q-point commented on GitHub (Feb 3, 2024): OK, just got it working with that repo. https://github.com/remy415/ollama.git

GiteaMirror commented

@remy415 commented on GitHub (Feb 3, 2024):

Okay, let me know if the GPU acceleration works 😃

@remy415 commented on GitHub (Feb 3, 2024): Okay, let me know if the GPU acceleration works 😃

GiteaMirror commented

@Q-point commented on GitHub (Feb 3, 2024):

@Q-point commented on GitHub (Feb 3, 2024): ![image](https://github.com/ollama/ollama/assets/5604553/2475fc4f-9164-436c-ac84-58c07783bcd5)

GiteaMirror commented

@remy415 commented on GitHub (Feb 3, 2024):

Awesome! Now I kinda wish I got an AGX instead of 4 Orin Nanos 😂🤣

@remy415 commented on GitHub (Feb 3, 2024): Awesome! Now I kinda wish I got an AGX instead of 4 Orin Nanos 😂🤣

GiteaMirror commented

2026-04-28 04:43:18 -05:00

@dhiltgen commented on GitHub (Feb 5, 2024):

@dhiltgen I don't know how you want to approach incorporating this into Ollama for Jetson users, whether you want to incorporate it into the main branch or offer it as a patch or something.

~~@remy415 could you open a draft PR with your WIP so we can take a look and provide feedback?~~

scratch that, you already did - just missed it.

@dhiltgen commented on GitHub (Feb 5, 2024): > @dhiltgen I don't know how you want to approach incorporating this into Ollama for Jetson users, whether you want to incorporate it into the main branch or offer it as a patch or something. ~~@remy415 could you open a draft PR with your WIP so we can take a look and provide feedback?~~ scratch that, you already did - just missed it.

GiteaMirror commented

@remy415 commented on GitHub (Feb 5, 2024):

@dhiltgen I don't know how you want to approach incorporating this into Ollama for Jetson users, whether you want to incorporate it into the main branch or offer it as a patch or something.

@remy415 could you open a draft PR with your WIP so we can take a look and provide feedback?

@dhiltgen
https://github.com/ollama/ollama/pull/2279

@remy415 commented on GitHub (Feb 5, 2024): > > @dhiltgen I don't know how you want to approach incorporating this into Ollama for Jetson users, whether you want to incorporate it into the main branch or offer it as a patch or something. > > @remy415 could you open a draft PR with your WIP so we can take a look and provide feedback? @dhiltgen https://github.com/ollama/ollama/pull/2279

GiteaMirror commented

2026-04-28 04:43:18 -05:00

@jhkuperus commented on GitHub (Feb 15, 2024):

I was reading along through this thread yesterday when I received my Jetson AGX Devkit. I couldn't get Ollama to use the CUDA-cores and then subsequently bricked my Jetson when upgrading stuff and trying to compile the version from #2279 .

Today I reflashed and upgraded everything on the Jetson, managed to get the version from @remy415's PR compiling and working when I start it as root. However, when I try to start it as a system service, it fails with a permission denied when trying to load the CUDA libraries. Here's the relevant output:

Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.748Z level=INFO source=gpu.go:133 msg="Detecting GPU type"
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.748Z level=INFO source=gpu.go:317 msg="Searching for GPU management library libcudart.so"
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.752Z level=INFO source=gpu.go:363 msg="Discovered GPU libraries: [/usr/local/cuda/lib64/libcudart.so.12.2.140 /usr/local/cuda-12/lib64/libcudart.so.12.2.140 /usr/local/cuda-12.2/lib64/libcudart.so.12.2.140]"
Feb 15 12:26:22 ubuntu ollama[5728]: NvRmMemInitNvmap failed with Permission denied
Feb 15 12:26:22 ubuntu ollama[5728]: 356: Memory Manager Not supported
Feb 15 12:26:22 ubuntu ollama[5728]: ****NvRmMemMgrInit failed**** error type: 196626
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.756Z level=INFO source=gpu.go:375 msg="Unable to load CUDA management library /usr/local/cuda/lib64/libcudart.so.12.2.140: cudart vram init failure: 999"
Feb 15 12:26:22 ubuntu ollama[5728]: NvRmMemInitNvmap failed with Permission denied
Feb 15 12:26:22 ubuntu ollama[5728]: 356: Memory Manager Not supported
Feb 15 12:26:22 ubuntu ollama[5728]: ****NvRmMemMgrInit failed**** error type: 196626
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.758Z level=INFO source=gpu.go:375 msg="Unable to load CUDA management library /usr/local/cuda-12/lib64/libcudart.so.12.2.140: cudart vram init failure: 999"
Feb 15 12:26:22 ubuntu ollama[5728]: NvRmMemInitNvmap failed with Permission denied
Feb 15 12:26:22 ubuntu ollama[5728]: 356: Memory Manager Not supported
Feb 15 12:26:22 ubuntu ollama[5728]: ****NvRmMemMgrInit failed**** error type: 196626
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.759Z level=INFO source=gpu.go:375 msg="Unable to load CUDA management library /usr/local/cuda-12.2/lib64/libcudart.so.12.2.140: cudart vram init failure: 999"
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.759Z level=INFO source=gpu.go:317 msg="Searching for GPU management library librocm_smi64.so"
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.760Z level=INFO source=gpu.go:363 msg="Discovered GPU libraries: []"
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.760Z level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.760Z level=INFO source=routes.go:1036 msg="no GPU detected"

What am I missing here? The files in /usr/local/cuda/lib64 look like they have permissions that are set just fine:

lrwxrwxrwx 1 root root        15 Aug 16  2023 libcudart.so -> libcudart.so.12
lrwxrwxrwx 1 root root        21 Aug 16  2023 libcudart.so.12 -> libcudart.so.12.2.140
-rw-r--r-- 1 root root    747424 Aug 16  2023 libcudart.so.12.2.140
-rw-r--r-- 1 root root   1403952 Aug 16  2023 libcudart_static.a

@jhkuperus commented on GitHub (Feb 15, 2024): I was reading along through this thread yesterday when I received my Jetson AGX Devkit. I couldn't get Ollama to use the CUDA-cores and then subsequently bricked my Jetson when upgrading stuff and trying to compile the version from #2279 . Today I reflashed and upgraded everything on the Jetson, managed to get the version from @remy415's PR compiling and working when I start it as root. However, when I try to start it as a system service, it fails with a permission denied when trying to load the CUDA libraries. Here's the relevant output: ``` Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.748Z level=INFO source=gpu.go:133 msg="Detecting GPU type" Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.748Z level=INFO source=gpu.go:317 msg="Searching for GPU management library libcudart.so" Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.752Z level=INFO source=gpu.go:363 msg="Discovered GPU libraries: [/usr/local/cuda/lib64/libcudart.so.12.2.140 /usr/local/cuda-12/lib64/libcudart.so.12.2.140 /usr/local/cuda-12.2/lib64/libcudart.so.12.2.140]" Feb 15 12:26:22 ubuntu ollama[5728]: NvRmMemInitNvmap failed with Permission denied Feb 15 12:26:22 ubuntu ollama[5728]: 356: Memory Manager Not supported Feb 15 12:26:22 ubuntu ollama[5728]: ****NvRmMemMgrInit failed**** error type: 196626 Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.756Z level=INFO source=gpu.go:375 msg="Unable to load CUDA management library /usr/local/cuda/lib64/libcudart.so.12.2.140: cudart vram init failure: 999" Feb 15 12:26:22 ubuntu ollama[5728]: NvRmMemInitNvmap failed with Permission denied Feb 15 12:26:22 ubuntu ollama[5728]: 356: Memory Manager Not supported Feb 15 12:26:22 ubuntu ollama[5728]: ****NvRmMemMgrInit failed**** error type: 196626 Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.758Z level=INFO source=gpu.go:375 msg="Unable to load CUDA management library /usr/local/cuda-12/lib64/libcudart.so.12.2.140: cudart vram init failure: 999" Feb 15 12:26:22 ubuntu ollama[5728]: NvRmMemInitNvmap failed with Permission denied Feb 15 12:26:22 ubuntu ollama[5728]: 356: Memory Manager Not supported Feb 15 12:26:22 ubuntu ollama[5728]: ****NvRmMemMgrInit failed**** error type: 196626 Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.759Z level=INFO source=gpu.go:375 msg="Unable to load CUDA management library /usr/local/cuda-12.2/lib64/libcudart.so.12.2.140: cudart vram init failure: 999" Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.759Z level=INFO source=gpu.go:317 msg="Searching for GPU management library librocm_smi64.so" Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.760Z level=INFO source=gpu.go:363 msg="Discovered GPU libraries: []" Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.760Z level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions" Feb 15 12:26:22 ubuntu ollama[5728]: time=2024-02-15T12:26:22.760Z level=INFO source=routes.go:1036 msg="no GPU detected" ``` What am I missing here? The files in `/usr/local/cuda/lib64` look like they have permissions that are set just fine: ``` lrwxrwxrwx 1 root root 15 Aug 16 2023 libcudart.so -> libcudart.so.12 lrwxrwxrwx 1 root root 21 Aug 16 2023 libcudart.so.12 -> libcudart.so.12.2.140 -rw-r--r-- 1 root root 747424 Aug 16 2023 libcudart.so.12.2.140 -rw-r--r-- 1 root root 1403952 Aug 16 2023 libcudart_static.a ```

GiteaMirror commented

2026-04-28 04:43:20 -05:00

@remy415 commented on GitHub (Feb 15, 2024):

@jhkuperus which Jetson AGX do you have? What version of L4T/Jetpack? I see you’re running CUDA 12, did you install that manually? Jetson prior to Jetpack 6 comes with CUDA 11.4, and only the Orin series even supports Jetpack 6. Also JP6 is in beta, if you’re using it I recommend reflashing JP 5

Disregard, see below.

Note that if you have continuous issues with JP6, I would recommend reflashing JP5 since JP6 is still under "developer preview" and isn't a full release yet.

@remy415 commented on GitHub (Feb 15, 2024): ~~@jhkuperus which Jetson AGX do you have? What version of L4T/Jetpack? I see you’re running CUDA 12, did you install that manually? Jetson prior to Jetpack 6 comes with CUDA 11.4, and only the Orin series even supports Jetpack 6. Also JP6 is in beta, if you’re using it I recommend reflashing JP 5~~ Disregard, see below. Note that if you have continuous issues with JP6, I would recommend reflashing JP5 since JP6 is still under "developer preview" and isn't a full release yet.

GiteaMirror commented

2026-04-28 04:43:21 -05:00

@remy415 commented on GitHub (Feb 15, 2024):

@jhkuperus

... working when I start it as root. However, when I try to start it as a system service, it fails with a permission denied when trying to load the CUDA libraries.

oh I missed this part. Also I poked more into the errors being displayed and I'm frequently seeing one of two things pop up:

Reboot the system with sudo init 6 or whichever mechanism you prefer
Reload the nvidia kernel modules. On my Jetson Orin Nano, the only modules preset are 'nvidia'. You can check this with sudo lsmod | grep -i nvidia on your system. Note the -r in sudo modprobe -r has the same effect as rmmod command.

sudo modprobe -r nvidia
sudo modprobe nvidia

Other systems referenced nvidia_uvm, I didn't have this module on my system but if it's present on yours you can use the same commands:

sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm

Let me know if any of this works for you.

@remy415 commented on GitHub (Feb 15, 2024): @jhkuperus > ... working when I start it as root. However, when I try to start it as a system service, it fails with a permission denied when trying to load the CUDA libraries. oh I missed this part. Also I poked more into the errors being displayed and I'm frequently seeing one of two things pop up: 1. Reboot the system with `sudo init 6` or whichever mechanism you prefer 2. Reload the nvidia kernel modules. On my Jetson Orin Nano, the only modules preset are 'nvidia'. You can check this with `sudo lsmod | grep -i nvidia` on your system. Note the `-r` in `sudo modprobe -r` has the same effect as `rmmod` command. ``` sudo modprobe -r nvidia sudo modprobe nvidia ``` Other systems referenced `nvidia_uvm`, I didn't have this module on my system but if it's present on yours you can use the same commands: ``` sudo modprobe -r nvidia_uvm sudo modprobe nvidia_uvm ``` Let me know if any of this works for you.

GiteaMirror commented

2026-04-28 04:43:21 -05:00

@jhkuperus commented on GitHub (Feb 15, 2024):

These are the details of my Jetson device:

Device used to test:
Jetson AGX Orin Developer Kit 64GB
Jetpack 6.0DP, L4T 36.2.0
CUDA 12.2.140
CUDA Capability Supported 8.7
Go version 1.21.6
Cmake 3.22.1
nvcc 12.2.140

I received it yesterday and I couldn't get compilation to work on the pre-flashed JP5. So I tried to upgrade the system, but that broke more things than it made better. Today I reflashed the device with JP6 DP. Compilation went smoothly and it runs perfectly when I run it as root, or as my normal user. If I try to start it as the ollama-user, I get permission denied.

This is the output of my lsmod for nvidia-modules, I don't see a problem here:

nvidia_modeset       1253376  3
nvidia               1454080  7 nvidia_modeset
nvidia_vrs_pseq        16384  0
tegra_dce              98304  2 nvidia
tsecriscv              28672  1 nvidia
host1x_nvhost          40960  10 nvhost_isp5,nvhost_nvcsi_t194,nvidia,tegra_camera,nvhost_nvdla,nvhost_capture,nvhost_nvcsi,nvhost_pva,nvhost_vi5,nvidia_modeset
mc_utils               16384  3 nvidia,nvgpu,tegra_camera_platform
host1x_next           180224  8 tegra_drm_next,host1x_nvhost,host1x_fence,tegra_se,nvgpu,nvhost_nvdla,nvhost_pva,nvidia_modeset
drm                   602112  12 tegra_drm_next,drm_kms_helper,nvidia

I'm thinking maybe it's a permission problem on a device, or socket or something like that?

@jhkuperus commented on GitHub (Feb 15, 2024): These are the details of my Jetson device: Device used to test: Jetson AGX Orin Developer Kit 64GB Jetpack 6.0DP, L4T 36.2.0 CUDA 12.2.140 CUDA Capability Supported 8.7 Go version 1.21.6 Cmake 3.22.1 nvcc 12.2.140 I received it yesterday and I couldn't get compilation to work on the pre-flashed JP5. So I tried to upgrade the system, but that broke more things than it made better. Today I reflashed the device with JP6 DP. Compilation went smoothly and it runs perfectly when I run it as root, or as my normal user. If I try to start it as the `ollama`-user, I get permission denied. This is the output of my `lsmod` for `nvidia`-modules, I don't see a problem here: ``` nvidia_modeset 1253376 3 nvidia 1454080 7 nvidia_modeset nvidia_vrs_pseq 16384 0 tegra_dce 98304 2 nvidia tsecriscv 28672 1 nvidia host1x_nvhost 40960 10 nvhost_isp5,nvhost_nvcsi_t194,nvidia,tegra_camera,nvhost_nvdla,nvhost_capture,nvhost_nvcsi,nvhost_pva,nvhost_vi5,nvidia_modeset mc_utils 16384 3 nvidia,nvgpu,tegra_camera_platform host1x_next 180224 8 tegra_drm_next,host1x_nvhost,host1x_fence,tegra_se,nvgpu,nvhost_nvdla,nvhost_pva,nvidia_modeset drm 602112 12 tegra_drm_next,drm_kms_helper,nvidia ``` I'm thinking maybe it's a permission problem on a device, or socket or something like that?

GiteaMirror commented

2026-04-28 04:43:21 -05:00

@remy415 commented on GitHub (Feb 15, 2024):

I didn't realize you were running it as a separate user. I'm assuming you added the user manually? May I see the contents of your service file please?

@remy415 commented on GitHub (Feb 15, 2024): I didn't realize you were running it as a separate user. I'm assuming you added the user manually? May I see the contents of your service file please?

GiteaMirror commented

2026-04-28 04:43:22 -05:00

@jhkuperus commented on GitHub (Feb 15, 2024):

Hah! Just figured it out. The problem is actually quite simple: the ollama user had to be a member of the video group to fix it.

This user is added by the installation script provided by the Ollama repository itself. Maybe this is something we can still add to your PR?

@jhkuperus commented on GitHub (Feb 15, 2024): Hah! Just figured it out. The problem is actually quite simple: the `ollama` user had to be a member of the `video` group to fix it. This user is added by the installation script provided by the Ollama repository itself. Maybe this is something we can still add to your PR?

GiteaMirror commented

2026-04-28 04:43:22 -05:00

@remy415 commented on GitHub (Feb 15, 2024):

The official script adds the ollama user to the render group. If/when the PR is merged with the main branch, the rest should automatically work as the PR only affects which shared libraries are loaded when the binary is executed.
From the source code

configure_systemd() {
    if ! id ollama >/dev/null 2>&1; then
        status "Creating ollama user..."
        $SUDO useradd -r -s /bin/false -m -d /usr/share/ollama ollama
    fi
    if getent group render >/dev/null 2>&1; then
        status "Adding ollama user to render group..."
        $SUDO usermod -a -G render ollama
    fi

    status "Adding current user to ollama group..."
    $SUDO usermod -a -G ollama $(whoami)

    status "Creating ollama systemd service..."
    cat <<EOF | $SUDO tee /etc/systemd/system/ollama.service >/dev/null
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=$BINDIR/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"

[Install]
WantedBy=default.target
EOF
    SYSTEMCTL_RUNNING="$(systemctl is-system-running || true)"
    case $SYSTEMCTL_RUNNING in
        running|degraded)
            status "Enabling and starting ollama service..."
            $SUDO systemctl daemon-reload
            $SUDO systemctl enable ollama

            start_service() { $SUDO systemctl restart ollama; }
            trap start_service EXIT
            ;;
    esac
}

@remy415 commented on GitHub (Feb 15, 2024): The official script adds the `ollama` user to the `render` group. If/when the PR is merged with the main branch, the rest should automatically work as the PR only affects which shared libraries are loaded when the binary is executed. [From the source code](https://github.com/ollama/ollama/blob/main/scripts/install.sh) ``` configure_systemd() { if ! id ollama >/dev/null 2>&1; then status "Creating ollama user..." $SUDO useradd -r -s /bin/false -m -d /usr/share/ollama ollama fi if getent group render >/dev/null 2>&1; then status "Adding ollama user to render group..." $SUDO usermod -a -G render ollama fi status "Adding current user to ollama group..." $SUDO usermod -a -G ollama $(whoami) status "Creating ollama systemd service..." cat <<EOF | $SUDO tee /etc/systemd/system/ollama.service >/dev/null [Unit] Description=Ollama Service After=network-online.target [Service] ExecStart=$BINDIR/ollama serve User=ollama Group=ollama Restart=always RestartSec=3 Environment="PATH=$PATH" [Install] WantedBy=default.target EOF SYSTEMCTL_RUNNING="$(systemctl is-system-running || true)" case $SYSTEMCTL_RUNNING in running|degraded) status "Enabling and starting ollama service..." $SUDO systemctl daemon-reload $SUDO systemctl enable ollama start_service() { $SUDO systemctl restart ollama; } trap start_service EXIT ;; esac } ```

GiteaMirror commented

2026-04-28 04:43:23 -05:00

@davidtheITguy commented on GitHub (Feb 18, 2024):

@Q-point Hi, could you please let me know what you did to resolve these build errors, I'm where you were: https://github.com/ollama/ollama/issues/1979#issuecomment-1925028134 Many thanks.

@davidtheITguy commented on GitHub (Feb 18, 2024): @Q-point Hi, could you please let me know what you did to resolve these build errors, I'm where you were: https://github.com/ollama/ollama/issues/1979#issuecomment-1925028134 Many thanks.

GiteaMirror commented

2026-04-28 04:43:23 -05:00

@Q-point commented on GitHub (Feb 18, 2024):

@davidtheITguy I used the suggested repo https://github.com/remy415/ollama.git . Compiled from source. You'll have to install go compiler.

@Q-point commented on GitHub (Feb 18, 2024): @davidtheITguy I used the suggested repo https://github.com/remy415/ollama.git . Compiled from source. You'll have to install go compiler.

GiteaMirror commented

2026-04-28 04:43:23 -05:00

@davidtheITguy commented on GitHub (Feb 18, 2024):

Yes sir, did all that. Also overlaid the [package_cudart_build] directory per the https://github.com/remy415/ollama_tegra_fix, loaded the suggested env vars too. Compiled from source and got the same errors you posted https://github.com/ollama/ollama/issues/1979#issuecomment-1925028134.

Appreciate the response and I'll keep plugging away good to know I'm on the right track.

@davidtheITguy commented on GitHub (Feb 18, 2024): Yes sir, did all that. Also overlaid the `[package_cudart_build]` directory per the [https://github.com/remy415/ollama_tegra_fix](https://github.com/remy415/ollama_tegra_fix), loaded the suggested env vars too. Compiled from source and got the same errors you posted https://github.com/ollama/ollama/issues/1979#issuecomment-1925028134. Appreciate the response and I'll keep plugging away good to know I'm on the right track.

GiteaMirror commented

2026-04-28 04:43:24 -05:00

@remy415 commented on GitHub (Feb 18, 2024):

Yes sir, did all that. Also overlaid the [package_cudart_build] directory per the https://github.com/remy415/ollama_tegra_fix, loaded the suggested env vars too. Compiled from source and got the same errors you posted #1979 (comment).

Appreciate the response and I'll keep plugging away good to know I'm on the right track.

@davidtheITguy what error messages are you getting? Is it the ones about the half precision posted above? When did you last pull the repo? It should work with any JP5 Jetson

@remy415 commented on GitHub (Feb 18, 2024): > Yes sir, did all that. Also overlaid the `[package_cudart_build]` directory per the https://github.com/remy415/ollama_tegra_fix, loaded the suggested env vars too. Compiled from source and got the same errors you posted [#1979 (comment)](https://github.com/ollama/ollama/issues/1979#issuecomment-1925028134). > > Appreciate the response and I'll keep plugging away good to know I'm on the right track. @davidtheITguy what error messages are you getting? Is it the ones about the half precision posted above? When did you last pull the repo? It should work with any JP5 Jetson

GiteaMirror commented

2026-04-28 04:43:24 -05:00

@davidtheITguy commented on GitHub (Feb 18, 2024):

Apologize in advance for posting the entire error output, but want you to see. Actually got pretty far into the build before this happened:

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(645): warning: function "warp_reduce_sum(half2)" was declared but never referenced
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(666): warning: function "warp_reduce_max(half2)" was declared but never referenced
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1071): error: identifier "__hsub2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1072): error: identifier "__hmul2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1063): warning: variable "d" was declared but never referenced
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1091): error: identifier "__hmul2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1092): error: identifier "__hadd2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1114): error: identifier "__hsub2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1115): error: identifier "__hmul2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1102): warning: variable "d" was declared but never referenced
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1138): error: identifier "__hmul2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1139): error: identifier "__hadd2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1155): error: identifier "__hmul2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1149): warning: variable "d" was declared but never referenced
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5433): error: identifier "__hmul2" is undefined
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): warning: no operator "+=" matches these operands
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5452): error: more than one conversion function from "__half" to a built-in type applies:
            function "__half::operator float() const"
/usr/local/cuda/include/cuda_fp16.hpp(204): here
            function "__half::operator short() const"
/usr/local/cuda/include/cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
/usr/local/cuda/include/cuda_fp16.hpp(225): here
            function "__half::operator int() const"
/usr/local/cuda/include/cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
/usr/local/cuda/include/cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
/usr/local/cuda/include/cuda_fp16.hpp(241): here
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5452): error: more than one conversion function from "__half" to a built-in type applies:
            function "__half::operator float() const"
/usr/local/cuda/include/cuda_fp16.hpp(204): here
            function "__half::operator short() const"
/usr/local/cuda/include/cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
/usr/local/cuda/include/cuda_fp16.hpp(225): here
            function "__half::operator int() const"
/usr/local/cuda/include/cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
/usr/local/cuda/include/cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
/usr/local/cuda/include/cuda_fp16.hpp(241): here
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands
            operand types are: half2 += __half2
          detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q4_0]"
(6768): here
/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands
            operand types are: half2 += __half2
          detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q4_1]"
(6777): here

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands
            operand types are: half2 += __half2
          detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q5_0]"
(6786): here

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands
            operand types are: half2 += __half2
          detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q5_1]"
(6795): here

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands
            operand types are: half2 += __half2
          detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=1, dequantize_kernel=&dequantize_q8_0]"
(6804): here

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands
            operand types are: half2 += __half2
          detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=1, qr=1, dequantize_kernel=&convert_f16]"
(6855): here

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(645): warning: function "warp_reduce_sum(half2)" was declared but never referenced

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(666): warning: function "warp_reduce_max(half2)" was declared but never referenced

/ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1696): warning: variable "ksigns64" was declared but never referenced

I tried to follow your directions to the letter. Intergated the ollama_tegra_fix repo, cloned your latest fork, etc.

Tool chain versions:

cmake version 3.20.2
go version go1.13.8 linux/arm64
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

@davidtheITguy commented on GitHub (Feb 18, 2024): Apologize in advance for posting the entire error output, but want you to see. Actually got pretty far into the build before this happened: ``` /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(645): warning: function "warp_reduce_sum(half2)" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(666): warning: function "warp_reduce_max(half2)" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1071): error: identifier "__hsub2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1072): error: identifier "__hmul2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1063): warning: variable "d" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1091): error: identifier "__hmul2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1092): error: identifier "__hadd2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1114): error: identifier "__hsub2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1115): error: identifier "__hmul2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1102): warning: variable "d" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1138): error: identifier "__hmul2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1139): error: identifier "__hadd2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1155): error: identifier "__hmul2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1149): warning: variable "d" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5433): error: identifier "__hmul2" is undefined /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): warning: no operator "+=" matches these operands /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5452): error: more than one conversion function from "__half" to a built-in type applies: function "__half::operator float() const" /usr/local/cuda/include/cuda_fp16.hpp(204): here function "__half::operator short() const" /usr/local/cuda/include/cuda_fp16.hpp(222): here function "__half::operator unsigned short() const" /usr/local/cuda/include/cuda_fp16.hpp(225): here function "__half::operator int() const" /usr/local/cuda/include/cuda_fp16.hpp(228): here function "__half::operator unsigned int() const" /usr/local/cuda/include/cuda_fp16.hpp(231): here function "__half::operator long long() const" /usr/local/cuda/include/cuda_fp16.hpp(234): here function "__half::operator unsigned long long() const" /usr/local/cuda/include/cuda_fp16.hpp(237): here function "__half::operator __nv_bool() const" /usr/local/cuda/include/cuda_fp16.hpp(241): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5452): error: more than one conversion function from "__half" to a built-in type applies: function "__half::operator float() const" /usr/local/cuda/include/cuda_fp16.hpp(204): here function "__half::operator short() const" /usr/local/cuda/include/cuda_fp16.hpp(222): here function "__half::operator unsigned short() const" /usr/local/cuda/include/cuda_fp16.hpp(225): here function "__half::operator int() const" /usr/local/cuda/include/cuda_fp16.hpp(228): here function "__half::operator unsigned int() const" /usr/local/cuda/include/cuda_fp16.hpp(231): here function "__half::operator long long() const" /usr/local/cuda/include/cuda_fp16.hpp(234): here function "__half::operator unsigned long long() const" /usr/local/cuda/include/cuda_fp16.hpp(237): here function "__half::operator __nv_bool() const" /usr/local/cuda/include/cuda_fp16.hpp(241): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands operand types are: half2 += __half2 detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q4_0]" (6768): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands operand types are: half2 += __half2 detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q4_1]" (6777): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands operand types are: half2 += __half2 detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q5_0]" (6786): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands operand types are: half2 += __half2 detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=2, dequantize_kernel=&dequantize_q5_1]" (6795): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands operand types are: half2 += __half2 detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=32, qr=1, dequantize_kernel=&dequantize_q8_0]" (6804): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(5447): error: no operator "+=" matches these operands operand types are: half2 += __half2 detected during instantiation of "void dequantize_mul_mat_vec<qk,qr,dequantize_kernel>(const void *, const dfloat *, float *, int, int) [with qk=1, qr=1, dequantize_kernel=&convert_f16]" (6855): here /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(645): warning: function "warp_reduce_sum(half2)" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(666): warning: function "warp_reduce_max(half2)" was declared but never referenced /ssd/ollama/ollama/llm/llama.cpp/ggml-cuda.cu(1696): warning: variable "ksigns64" was declared but never referenced ``` I tried to follow your directions to the letter. Intergated the ollama_tegra_fix repo, cloned your latest fork, etc. Tool chain versions: ``` cmake version 3.20.2 go version go1.13.8 linux/arm64 gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 ```

GiteaMirror commented

2026-04-28 04:43:25 -05:00

@remy415 commented on GitHub (Feb 18, 2024):

@davidtheITguy Okay that error you're seeing happens to me when I try to compile and the cuda architectures are incorrectly set. Are you manually setting the architectures? Ensure you clear the variable CUDA_ARCHITECTURES with the command export CUDA_ARCHITECTURES="" and let the script configure it automatically. I've incorporated the necessary fixes into the code base and eliminated the need for most of the ollama_tegra_fix page. The only env variables that are absolutely critical are:
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/compat:/usr/local/cuda/include"
and the one regarding cpu generate is just helpful to Jetson users as it skips the AVX builds
export OLLAMA_SKIP_CPU_GENERATE="yes"

I would highly recommend deleting your ollama folder completely, re-clone https://github.com/remy415/ollama.git, ensure these 2 env variables only are set, and run go generate ./... && go build .

For reference, during the build process you should see a reference to CUDA_ARCHITECTURES. The only ones that should be loaded depend on your Jetson device and Jetpack version but it should detect it automatically if you are running any of the 3 Jetpack versions listed here:

Nano/TX1 = 5.3, TX2 = 6.2, Xavier = 7.2, Orin = 8.7
L4T_VERSION.major >= 36: # JetPack 6
CUDA_ARCHITECTURES = [87]
L4T_VERSION.major >= 34: # JetPack 5
CUDA_ARCHITECTURES = [72, 87]
L4T_VERSION.major == 32: # JetPack 4
CUDA_ARCHITECTURES = [53, 62, 72]

Edited for formatting

@remy415 commented on GitHub (Feb 18, 2024): @davidtheITguy Okay that error you're seeing happens to me when I try to compile and the cuda architectures are incorrectly set. Are you manually setting the architectures? Ensure you clear the variable `CUDA_ARCHITECTURES` with the command `export CUDA_ARCHITECTURES=""` and let the script configure it automatically. I've incorporated the necessary fixes into the code base and eliminated the need for most of the ollama_tegra_fix page. The only env variables that are absolutely critical are: ```export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/compat:/usr/local/cuda/include"``` and the one regarding cpu generate is just helpful to Jetson users as it skips the AVX builds ```export OLLAMA_SKIP_CPU_GENERATE="yes"``` I would highly recommend deleting your ollama folder completely, re-clone `https://github.com/remy415/ollama.git`, ensure these 2 env variables only are set, and run `go generate ./... && go build .` For reference, during the build process you should see a reference to CUDA_ARCHITECTURES. The only ones that should be loaded depend on your Jetson device and Jetpack version but `it should detect it automatically` if you are running any of the 3 Jetpack versions listed here: Nano/TX1 = 5.3, TX2 = 6.2, Xavier = 7.2, Orin = 8.7 L4T_VERSION.major >= 36: # JetPack 6 `CUDA_ARCHITECTURES = [87]` L4T_VERSION.major >= 34: # JetPack 5 `CUDA_ARCHITECTURES = [72, 87]` L4T_VERSION.major == 32: # JetPack 4 `CUDA_ARCHITECTURES = [53, 62, 72]` Edited for formatting

GiteaMirror commented

2026-04-28 04:43:27 -05:00

@remy415 commented on GitHub (Feb 18, 2024):

Also overlaid the [package_cudart_build] directory per the https://github.com/remy415/ollama_tegra_fix,

I just noticed this. Sorry, I haven't cleaned up the documentation yet as I'm still working on the code itself to get it cleaned up. This step is no longer needed as that package is no longer valid.

@remy415 commented on GitHub (Feb 18, 2024): > Also overlaid the `[package_cudart_build]` directory per the https://github.com/remy415/ollama_tegra_fix, I just noticed this. Sorry, I haven't cleaned up the documentation yet as I'm still working on the code itself to get it cleaned up. This step is no longer needed as that package is no longer valid.

GiteaMirror commented

2026-04-28 04:43:27 -05:00

@davidtheITguy commented on GitHub (Feb 19, 2024):

Understood and thank you ill circle back when the build is successful

@davidtheITguy commented on GitHub (Feb 19, 2024): Understood and thank you ill circle back when the build is successful

GiteaMirror commented

2026-04-28 04:43:28 -05:00

@davidtheITguy commented on GitHub (Feb 19, 2024):

Looks like the build is very close, but not quite there yet. go generate ./... seems to have completed successfully.

The go build . errors out with what appears to be an external reference issue: build github.com/jmorganca/ollama: cannot load crypto/ecdh: malformed module path "crypto/ecdh": missing dot in first path element.

My environment:
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:/usr/local/cuda/include
OLLAMA_SKIP_CPU_GENERATE=yes

Build steps:

conda activate ollama # activate private build environment
git clone --depth=1 --recursive https://github.com/remy415/ollama
go generate ./...
go build .

Debug steps: I ran a go get -u and then go mod tidy to try to resolve any reference issues here is the output:

github.com/jmorganca/ollama/app/assets imports
        embed: malformed module path "embed": missing dot in first path element
github.com/jmorganca/ollama/app/assets imports
        io/fs: malformed module path "io/fs": missing dot in first path element
github.com/jmorganca/ollama/app/lifecycle imports
        log/slog: malformed module path "log/slog": missing dot in first path element
github.com/jmorganca/ollama/parser imports
        slices: malformed module path "slices": missing dot in first path element
github.com/jmorganca/ollama/auth imports
        golang.org/x/crypto/ssh imports
        golang.org/x/crypto/curve25519 imports
        crypto/ecdh: malformed module path "crypto/ecdh": missing dot in first path element

I couldn't find any local reference to "crypto/ecdh" so wondering if it is an external dependency...

@davidtheITguy commented on GitHub (Feb 19, 2024): Looks like the build is very close, but not quite there yet. `go generate ./...` seems to have completed successfully. The `go build .` errors out with what appears to be an external reference issue: `build github.com/jmorganca/ollama: cannot load crypto/ecdh: malformed module path "crypto/ecdh": missing dot in first path element`. My environment: `LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:/usr/local/cuda/include` `OLLAMA_SKIP_CPU_GENERATE=yes` Build steps: 1. `conda activate ollama # activate private build environment` 2. `git clone --depth=1 --recursive https://github.com/remy415/ollama` 3. `go generate ./...` 4. `go build .` Debug steps: I ran a `go get -u` and then `go mod tidy` to try to resolve any reference issues here is the output: ``` github.com/jmorganca/ollama/app/assets imports embed: malformed module path "embed": missing dot in first path element github.com/jmorganca/ollama/app/assets imports io/fs: malformed module path "io/fs": missing dot in first path element github.com/jmorganca/ollama/app/lifecycle imports log/slog: malformed module path "log/slog": missing dot in first path element github.com/jmorganca/ollama/parser imports slices: malformed module path "slices": missing dot in first path element github.com/jmorganca/ollama/auth imports golang.org/x/crypto/ssh imports golang.org/x/crypto/curve25519 imports crypto/ecdh: malformed module path "crypto/ecdh": missing dot in first path element ``` I couldn't find any local reference to "crypto/ecdh" so wondering if it is an external dependency...

GiteaMirror commented

@remy415 commented on GitHub (Feb 19, 2024):

I found the below env on a forum post about some new changes in Go 1.13. Odd that you’re getting that error, I haven’t seen it before. I think it may have something to do with your conda environment. Try checking go env before and after activating your conda environment.

Try this:
export GO111MODULE=off

@remy415 commented on GitHub (Feb 19, 2024): I found the below env on a forum post about some new changes in Go 1.13. Odd that you’re getting that error, I haven’t seen it before. I think it may have something to do with your conda environment. Try checking go env before and after activating your conda environment. Try this: `export GO111MODULE=off`

GiteaMirror commented

@davidtheITguy commented on GitHub (Feb 20, 2024):

Ok got it to compile. Made a rookie move, my version of golang was something like 1.13, go.mod says 1.21 (but doesn't enforce for some reason). Upgrading go to 1.21.7 worked for me.

So for the next person just to sum up what's needed as of this comment to compile the fork for Nvidia GPUs:

set up your environment

export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/compat:/usr/local/cuda/include"
export OLLAMA_SKIP_CPU_GENERATE="yes"

Clone the forked repo from @remy415: git clone --depth=1 --recursive https://github.com/remy415/ollama
cd ollama
go generate ./...
go build .
Do a manual ollama install HERE (you need to COPY the new executable to /usr/bin/ollama, don't 'curl' it)

BUT still no CUDA/GPU unfortunately, looks a perms issue on the CUDA libs:

Feb 19 19:40:32 nvidia-agx ollama[178771]: time=2024-02-19T19:40:32.613-05:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.217-05:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cuda_v11]"
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.217-05:00 level=INFO source=gpu.go:133 msg="Detecting GPU type"
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.217-05:00 level=INFO source=gpu.go:320 msg="Searching for GPU management library libcudart.so"
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.221-05:00 level=INFO source=gpu.go:366 msg="Discovered GPU libraries: [/usr/local/cuda/lib64/libcudart.so.11.4.298 /usr/local/c>
Feb 19 19:40:40 nvidia-agx ollama[178771]: NvRmMemInitNvmap failed with Permission denied
Feb 19 19:40:40 nvidia-agx ollama[178771]: 549: Memory Manager Not supported
Feb 19 19:40:40 nvidia-agx ollama[178771]: ****NvRmMemInit failed**** error type: 196626
Feb 19 19:40:40 nvidia-agx ollama[178771]: *** NvRmMemInit failed NvRmMemConstructor
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.228-05:00 level=INFO source=gpu.go:378 msg="Unable to load CUDA management library /usr/local/cuda/lib64/libcudart.so.11.4.298:>
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.230-05:00 level=INFO source=gpu.go:378 msg="Unable to load CUDA management library /usr/local/cuda-11/lib64/libcudart.so.11.4.2>
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=gpu.go:378 msg="Unable to load CUDA management library /usr/local/cuda-11.4/lib64/libcudart.so.11.4>
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=gpu.go:320 msg="Searching for GPU management library librocm_smi64.so"
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=gpu.go:366 msg="Discovered GPU libraries: []"
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=routes.go:1037 msg="no GPU detected"

I'll keep plugging away hopefully, this will all get settled and merged at some point.

Many thanks @remy415 for your assistance on the build.

@davidtheITguy commented on GitHub (Feb 20, 2024): Ok got it to compile. Made a rookie move, my version of golang was something like 1.13, go.mod says 1.21 (but doesn't enforce for some reason). Upgrading go to 1.21.7 worked for me. So for the next person just to sum up what's needed as of this comment to compile the fork for Nvidia GPUs: 1. set up your environment ``` export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/compat:/usr/local/cuda/include" export OLLAMA_SKIP_CPU_GENERATE="yes" ``` 2. Clone the forked repo from @remy415: `git clone --depth=1 --recursive https://github.com/remy415/ollama` 3. `cd ollama` 4. `go generate ./...` 5. `go build .` 6. Do a manual ollama install [HERE](https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install) (you need to COPY the new executable to /usr/bin/ollama, don't 'curl' it) BUT still no CUDA/GPU unfortunately, looks a perms issue on the CUDA libs: ``` Feb 19 19:40:32 nvidia-agx ollama[178771]: time=2024-02-19T19:40:32.613-05:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..." Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.217-05:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cuda_v11]" Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.217-05:00 level=INFO source=gpu.go:133 msg="Detecting GPU type" Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.217-05:00 level=INFO source=gpu.go:320 msg="Searching for GPU management library libcudart.so" Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.221-05:00 level=INFO source=gpu.go:366 msg="Discovered GPU libraries: [/usr/local/cuda/lib64/libcudart.so.11.4.298 /usr/local/c> Feb 19 19:40:40 nvidia-agx ollama[178771]: NvRmMemInitNvmap failed with Permission denied Feb 19 19:40:40 nvidia-agx ollama[178771]: 549: Memory Manager Not supported Feb 19 19:40:40 nvidia-agx ollama[178771]: ****NvRmMemInit failed**** error type: 196626 Feb 19 19:40:40 nvidia-agx ollama[178771]: *** NvRmMemInit failed NvRmMemConstructor Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.228-05:00 level=INFO source=gpu.go:378 msg="Unable to load CUDA management library /usr/local/cuda/lib64/libcudart.so.11.4.298:> Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.230-05:00 level=INFO source=gpu.go:378 msg="Unable to load CUDA management library /usr/local/cuda-11/lib64/libcudart.so.11.4.2> Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=gpu.go:378 msg="Unable to load CUDA management library /usr/local/cuda-11.4/lib64/libcudart.so.11.4> Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=gpu.go:320 msg="Searching for GPU management library librocm_smi64.so" Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=gpu.go:366 msg="Discovered GPU libraries: []" Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions" Feb 19 19:40:40 nvidia-agx ollama[178771]: time=2024-02-19T19:40:40.232-05:00 level=INFO source=routes.go:1037 msg="no GPU detected" ``` I'll keep plugging away hopefully, this will all get settled and merged at some point. Many thanks @remy415 for your assistance on the build.

GiteaMirror commented

@remy415 commented on GitHub (Feb 20, 2024):

@davidtheITguy the user running the binary needs to be part of the render user group, and the video user group.
The command follows this syntax: sudo usermod -a -G examplegroup exampleusername.

sudo usermod -a -G render <user name>
sudo usermod -a -G video <user name>

If you’re running it as a service using the ollama user, then add the ollama user. If you’re running directly from CLI ensure your user is part of the group.

@remy415 commented on GitHub (Feb 20, 2024): @davidtheITguy the user running the binary needs to be part of the `render` user group, and the `video` user group. The command follows this syntax: `sudo usermod -a -G examplegroup exampleusername`. `sudo usermod -a -G render <user name>` `sudo usermod -a -G video <user name>` If you’re running it as a service using the ollama user, then add the ollama user. If you’re running directly from CLI ensure your user is part of the group.

GiteaMirror commented