[GH-ISSUE #3475] Build fails when using OLLAMA_SKIP_CPU_GENERATE=1 on aarch64 Linux #48651

Closed
opened 2026-04-28 08:59:26 -05:00 by GiteaMirror · 7 comments

Originally created by @remy415 on GitHub (Apr 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3475

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

When compiling the binary with OLLAMA_SKIP_CPU_GENERATE=1 set, the build fails with the following error:

In file included from gpu_info_nvml.h:4,
                 from gpu_info_nvml.c:5:
gpu_info_nvml.c: In function ‘nvml_check_vram’:
gpu_info_nvml.c:158:20: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘long long unsigned int’ [-Wformat=]
  158 |     LOG(h.verbose, "[%d] CUDA totalMem %ld\n", i, memInfo.total);
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~     ~~~~~~~~~~~~~
      |                                                          |
      |                                                          long long unsigned int
gpu_info.h:33:23: note: in definition of macro ‘LOG’
   33 |       fprintf(stderr, __VA_ARGS__); \
      |                       ^~~~~~~~~~~
gpu_info_nvml.c:158:42: note: format string is defined here
  158 |     LOG(h.verbose, "[%d] CUDA totalMem %ld\n", i, memInfo.total);
      |                                        ~~^
      |                                          |
      |                                          long int
      |                                        %lld
In file included from gpu_info_nvml.h:4,
                 from gpu_info_nvml.c:5:
gpu_info_nvml.c:159:20: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘long long unsigned int’ [-Wformat=]
  159 |     LOG(h.verbose, "[%d] CUDA freeMem %ld\n", i, memInfo.free);
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~     ~~~~~~~~~~~~
      |                                                         |
      |                                                         long long unsigned int
gpu_info.h:33:23: note: in definition of macro ‘LOG’
   33 |       fprintf(stderr, __VA_ARGS__); \
      |                       ^~~~~~~~~~~
gpu_info_nvml.c:159:41: note: format string is defined here
  159 |     LOG(h.verbose, "[%d] CUDA freeMem %ld\n", i, memInfo.free);
      |                                       ~~^
      |                                         |
      |                                         long int
      |                                       %lld
# github.com/ollama/ollama
/home/tegra/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.0.linux-arm64/pkg/tool/linux_arm64/link: running gcc failed: exit status 1
gcc: error: /home/tegra/ok3d/ollama-container/dev/ollama/llm/build/linux/arm64_static/libllama.a: No such file or directory

What did you expect to see?

It appears that the code which builds the required static library directory was added inside the optional OLLAMA_SKIP_CPU_GENERATE guard (line 62 of the script). I'm not sure whether the intent was to build the _static directory for every build, or only when building for CPU.

Additionally, although I appreciate the time saved by setting the OLLAMA_SKIP_CPU_GENERATE flag, should the LCD (lowest-common-denominator) CPU build be created regardless of the skip flag, as a fallback for GPU OOM?

Steps to reproduce

Set OLLAMA_SKIP_CPU_GENERATE=1 as an environment variable when compiling the binary.
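
For example, a minimal reproduction sketch on an arm64 Linux host (assuming a local checkout of the repo; the go generate / go build flow below is the usual source build, not commands quoted from this thread):

export OLLAMA_SKIP_CPU_GENERATE=1
go generate ./...   # gen_linux.sh skips the CPU variants and, with the current guard, the static target too
go build .          # link step fails: llm/build/linux/arm64_static/libllama.a was never produced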

Are there any recent changes that introduced the issue?

In file llm/generate/gen_linux.sh, the following code snippet exists starting on line 62:

if [ -z "${OLLAMA_SKIP_CPU_GENERATE}" ]; then 

    if [ -z "${OLLAMA_CPU_TARGET}" -o "${OLLAMA_CPU_TARGET}" = "static" ]; then
        # Static build for linking into the Go binary
        init_vars
        CMAKE_TARGETS="--target llama --target ggml"
        CMAKE_DEFS="-DBUILD_SHARED_LIBS=off -DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off ${CMAKE_DEFS}"
        BUILD_DIR="../build/linux/${ARCH}_static"
        echo "Building static library"
        build
    fi
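
A possible workaround until this is fixed, assuming the rest of gen_linux.sh gates the other CPU variants on OLLAMA_CPU_TARGET the same way this snippet gates the static target (that part of the script is not quoted here, so treat this as a sketch):

# build only the static archive: the inner test above matches "static",
# and the other CPU variants should be skipped by their own OLLAMA_CPU_TARGET checks
OLLAMA_CPU_TARGET=static go generate ./...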

OS

Linux

Architecture

arm64

Platform

No response

Ollama version

0.1.30

GPU

Nvidia

GPU info

NVIDIA Jetson Orin Nano 8GB (Tegra ARM64v8 SoC)

CPU

Other

Other software

No response

GiteaMirror added the feature request label 2026-04-28 08:59:26 -05:00

@dhiltgen commented on GitHub (Apr 12, 2024):

We've been shuffling things around a bit with the new subprocess based model. We're building a static link version of the library to link to ollama via cgo so we can access some low-level routines. In our Dockerfile, we build the CPU variants in other build targets and then stitch the results together. Largely this is meant as a mechanism to be able to parallelize more parts of the build process to speed it up.

I'm not sure what your objective is, so I'm not sure how we want to adjust these settings. Ultimately we do need to generate llm/build/linux/arm64_static/libllama.a to be able to compile.
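
As a quick sanity check before the Go link step (just an illustration using the path from the error above; not part of the project's tooling):

# after running go generate, verify the archive cgo links against actually exists
test -f llm/build/linux/arm64_static/libllama.a || echo "libllama.a is missing - the static generate step was skipped"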


@remy415 commented on GitHub (Apr 13, 2024):

The logic to build libllama.a (a mandatory piece) lives inside an optional flag (the skip-CPU-generate flag), so I figured we could just build libllama.a every time? I skip the CPU generate on the Jetson so it only builds CUDA.


@dhiltgen commented on GitHub (Apr 15, 2024):

Got it. This might be getting into diminishing returns, but splitting those out for the x86 and arm build shaves off 30-60 seconds of wall-clock time for the build. ROCm is by far the slowest part of the build (x86 only) but we're trying to keep things as efficient and parallelizable as possible given the volume of llama.cpp variants we're building. I'm expecting more variants will come over time.

Feel free to land a PR to refine this, as long as we can retain the parallel logic here (https://github.com/ollama/ollama/blob/main/Dockerfile#L64-L65) and here (https://github.com/ollama/ollama/blob/main/Dockerfile#L84-L85) and not build libllama.a multiple times.


@remy415 commented on GitHub (Apr 16, 2024):

Okay, so I was thinking we could either add a flag to skip static generation like:
if [ -z "${OLLAMA_SKIP_STATIC_GENERATE}" ]; then <build llm/build/linux/${ARCH}_static/libllama.a>

If the llm/build directory is shared between builds, we could instead just check whether it's already built with:
if [ ! -f llm/build/linux/${ARCH}_static/libllama.a ]; then <build llm/build/linux/${ARCH}_static/libllama.a>.

Would either of these work for you? I know adding the skip flag would add an additional ARG line per container build, which I think should be avoided when possible but I'm not seeing a lot of viable options for programmatically skipping a portion of the build while making it happen by default.


@dhiltgen commented on GitHub (Apr 16, 2024):

In the containerized build, the filesystem isn't shared, as the stages all run in parallel on their own overlay filesystems, so I don't think the second option will work. Flags are probably the best option: set it up so a human doing a simple go generate ./... gets all the bits and pieces, while the containerized build can set flags to optimize parallelism.
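
To illustrate how that split could look (a sketch only: OLLAMA_SKIP_STATIC_GENERATE is the flag proposed in this thread, not an existing one, and the stage layout is hypothetical):

# local developer build: no flags set, everything including libllama.a is generated
go generate ./... && go build .

# containerized static stage: build only the static archive
OLLAMA_CPU_TARGET=static go generate ./...

# containerized GPU stage: skip the CPU variants and the static archive entirely
OLLAMA_SKIP_CPU_GENERATE=1 OLLAMA_SKIP_STATIC_GENERATE=1 go generate ./...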


@remy415 commented on GitHub (Apr 17, 2024):

Edit: We may want to stick with the former. The original logic if [ -z "${OLLAMA_CPU_TARGET}" -o "${OLLAMA_CPU_TARGET}" = "static" ] will fail to build the static library if someone tries to generate an AVX build.

@dhiltgen Do you include a RUN OLLAMA_CPU_TARGET="static" <generate> line in every build stage that requires one? I'm asking because, if so, I'd change the Dockerfile to set a blanket OLLAMA_SKIP_STATIC_GENERATE and use OLLAMA_CPU_TARGET="static" to override the skip wherever the static build is wanted. It would still build the static library by default.

init_vars
if [ -z "${OLLAMA_SKIP_STATIC_GENERATE}" -o "${OLLAMA_CPU_TARGET}" = "static" ]; then
    # Builds by default, allows skipping, forces build if OLLAMA_CPU_TARGET="static"
    # Enables optimized Dockerfile builds using a blanket skip and targeted overrides
    # Static build for linking into the Go binary
    init_vars
    CMAKE_TARGETS="--target llama --target ggml"
    CMAKE_DEFS="-DBUILD_SHARED_LIBS=off -DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off ${CMAKE_DEFS}"
    BUILD_DIR="../build/linux/${ARCH}_static"
    echo "Building static library"
    build
fi

If not, how do you want to handle the cascading if statements? I guess I could just leave it like this and you can further adjust it later.

init_vars
if [ -z "${OLLAMA_SKIP_STATIC_GENERATE}" ]; then
    if [ -z "${OLLAMA_CPU_TARGET}" -o "${OLLAMA_CPU_TARGET}" = "static" ]; then
        init_vars
        CMAKE_TARGETS="--target llama --target ggml"
        CMAKE_DEFS="-DBUILD_SHARED_LIBS=off -DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off ${CMAKE_DEFS}"
        BUILD_DIR="../build/linux/${ARCH}_static"
        echo "Building static library"
        build
    fi
fi

@remy415 commented on GitHub (Apr 17, 2024):

#3708 Submitted
