[GH-ISSUE #15354] Ollama 0.20.2 unknown model architecture: 'gemma4' with Cuda arch 50 #71883

Closed
opened 2026-05-05 02:49:44 -05:00 by GiteaMirror · 28 comments
Owner

Originally created by @S0AndS0 on GitHub (Apr 6, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15354

Update 2026-04-08

This issue is resolved; it was an ID10T error on my part, i.e. I failed to use the Nix configs correctly when overriding the package version and hash(es)

The correctly functioning configuration file is:

{
  pkgs,
  ...
}:

{
  services.ollama = {
    enable = true;

    package = (pkgs.ollama-cuda.override {
      cudaArches = [ "50" ];
    }).overrideAttrs (finalAttrs: previousAttrs: {
      version = "0.20.2";
      src = pkgs.fetchFromGitHub {
        owner = "ollama";
        repo = "ollama";
        tag = "v${finalAttrs.version}";
        hash = "sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=";
      };
      vendorHash = "sha256-Lc1Ktdqtv2VhJQssk8K1UOimeEjVNvDWePE9WkamCos=";
    });

    loadModels = [
      "gemma4:26b"
      "gemma4:e4b"
    ];
  };
}

... readers who stumble across this and need a different version must modify `version`, `hash`, and `vendorHash` to avoid repeating my mistake(s)
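For reference, one way to discover the correct hash values is the common trust-on-first-use trick: set each hash to `lib.fakeHash`, rebuild, and copy the real value from the "hash mismatch ... got: sha256-..." error that Nix prints. A sketch against the override above (one rebuild per placeholder):

```nix
# Sketch: substitute placeholder hashes, then rebuild once per hash.
# Nix fails with "hash mismatch ... got: sha256-..." — copy that value
# in for `hash`, rebuild, then repeat for `vendorHash`.
(pkgs.ollama-cuda.override {
  cudaArches = [ "50" ];
}).overrideAttrs (finalAttrs: previousAttrs: {
  version = "0.20.2";
  src = pkgs.fetchFromGitHub {
    owner = "ollama";
    repo = "ollama";
    tag = "v${finalAttrs.version}";
    hash = pkgs.lib.fakeHash;     # replace with the "got:" value
  };
  vendorHash = pkgs.lib.fakeHash; # then replace this one too
})
```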


The following content is the OG OP


What is the issue?

ollama run gemma4:26b
Error: 500 Internal Server Error: unable to load model: /var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df

Note; the same sort of error is reported for gemma4:e4b too

Models such as qwen3.5:9b and gemma3n:e4b have no issues on this device and sometimes, when the weather is just right, will use the GPU too


Partial NixOS config that may aid in reproducing the issue;

{ pkgs, ... }:

{
  hardware = {
    enableRedistributableFirmware = true;
    enableAllFirmware = true;
    graphics.enable = true;
  };

  # https://wiki.nixos.org/wiki/CUDA#Setting_up_CUDA_Binary_Cache
  nix.settings = {
    download-buffer-size = 524288000;
    substituters = [
      "https://cache.nixos-cuda.org"
    ];
    trusted-public-keys = [
      "cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M="
    ];
  };

  services.xserver.videoDrivers = [ "nvidia" ];

  services.ollama = {
    enable = true;

    package = (pkgs.ollama-cuda.override {
      cudaArches = [ "50" ];
    }).overrideAttrs (finalAttrs: previousAttrs: rec {
      version = "0.20.2";
      src = previousAttrs.src.override {
        tag = "v${version}";
      };
    });

    loadModels = [
      "gemma4:26b"
      "gemma4:e4b"
    ];
  };
}

Relevant log output

systemctl status ollama.service
● ollama.service - Server for local large language models
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: ignored)
     Active: active (running) since Sun 2026-04-05 17:35:15 PDT; 31min ago
 Invocation: a579fdd633f94e5f96fd2fc6412f6c0e
   Main PID: 66802 (.ollama-wrapped)
         IP: 16.9G in, 143M out
         IO: 16.9G read, 33.5G written
      Tasks: 25 (limit: 38310)
     Memory: 17.4G (peak: 17.5G)
        CPU: 7min 13.052s
     CGroup: /system.slice/ollama.service
             └─66802 /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/ollama serve

Apr 05 18:06:13 nixos ollama[66802]: llama_model_loader: - type q8_0:   28 tensors
Apr 05 18:06:13 nixos ollama[66802]: llama_model_loader: - type q4_K:  193 tensors
Apr 05 18:06:13 nixos ollama[66802]: llama_model_loader: - type q6_K:   13 tensors
Apr 05 18:06:13 nixos ollama[66802]: print_info: file format = GGUF V3 (latest)
Apr 05 18:06:13 nixos ollama[66802]: print_info: file type   = Q4_K - Medium
Apr 05 18:06:13 nixos ollama[66802]: print_info: file size   = 16.74 GiB (5.57 BPW)
Apr 05 18:06:13 nixos ollama[66802]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
Apr 05 18:06:13 nixos ollama[66802]: llama_model_load_from_file_impl: failed to load model
Apr 05 18:06:13 nixos ollama[66802]: time=2026-04-05T18:06:13.089-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df error="unable to load model: /var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df"
Apr 05 18:06:13 nixos ollama[66802]: [GIN] 2026/04/05 - 18:06:13 | 500 |   1.22498756s |       127.0.0.1 | POST     "/api/generate"
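For context on where the "unknown model architecture" message comes from: llama.cpp decides which architecture to load from the `general.architecture` key in the GGUF header, and rejects the file when that string (here `gemma4`) is not in its architecture table. The sketch below is not ollama's or llama.cpp's actual code, just a minimal illustration of how that key is laid out in a GGUF v3 header:

```python
import struct

GGUF_TYPE_STRING = 8  # GGUF metadata value type for strings


def build_minimal_gguf(arch: str) -> bytes:
    """Build a tiny GGUF v3 header with a single general.architecture KV."""
    key = b"general.architecture"
    val = arch.encode()
    out = b"GGUF"                # magic
    out += struct.pack("<I", 3)  # format version
    out += struct.pack("<Q", 0)  # tensor count
    out += struct.pack("<Q", 1)  # metadata KV count
    out += struct.pack("<Q", len(key)) + key
    out += struct.pack("<I", GGUF_TYPE_STRING)
    out += struct.pack("<Q", len(val)) + val
    return out


def read_architecture(blob: bytes) -> str:
    """Read general.architecture back out of the header."""
    assert blob[:4] == b"GGUF", "not a GGUF file"
    off = 4 + 4 + 8 + 8  # skip magic, version, tensor/KV counts
    klen = struct.unpack_from("<Q", blob, off)[0]; off += 8
    key = blob[off:off + klen].decode(); off += klen
    vtype = struct.unpack_from("<I", blob, off)[0]; off += 4
    assert key == "general.architecture" and vtype == GGUF_TYPE_STRING
    vlen = struct.unpack_from("<Q", blob, off)[0]; off += 8
    return blob[off:off + vlen].decode()


print(read_architecture(build_minimal_gguf("gemma4")))  # → gemma4
```

A loader that has no entry for the string it reads here fails exactly the way the log does, regardless of how valid the rest of the file is.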

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.20.2


Notes and updates

I did search around for related issues, which is why I manually updated to the version provided via the override, but no joy was had

I also tried a flake update for the lock file, but that resulted in previously functional models that had used the GPU being restricted to the CPU, so that was less than joyful too

Edit; [NixOS/nixpkgs#503740](https://github.com/NixOS/nixpkgs/issues/503740#issuecomment-4171174002) had the solution to my GPU woes 🎉

GiteaMirror added the bug label 2026-05-05 02:49:44 -05:00
Author
Owner

@rick-github commented on GitHub (Apr 6, 2026):

What's the output of

ollama -v

Where was the model pulled from?

Author
Owner

@YellowOnion commented on GitHub (Apr 6, 2026):

I'm seeing the same issue with rocm and cpu only.

# ollama.nix
let
  pkgs = import <nixpkgs> {};
in
  (pkgs.ollama.overrideAttrs (args: {
    version = "0.20.2";
    src = pkgs.fetchFromGitHub {
      owner = "ollama";
      repo = "ollama";
      rev = "4589fa2cf5afd15fb19aca96c15b5fbf885d11cf";
      hash = "sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=";
    };
    doCheck = false;
  }))

then running ./result/bin/ollama.

nixpkgs has pushed an [update](https://github.com/NixOS/nixpkgs/commit/1266aa38aa83f9a7f266c205e2ea6db904525866), but I doubt it'll help since the change is basically the same as mine.

./result/bin/ollama -v shows 0.20.2 for me.

Author
Owner

@rick-github commented on GitHub (Apr 6, 2026):

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.

Author
Owner

@YellowOnion commented on GitHub (Apr 6, 2026):

CPU run: https://gist.github.com/YellowOnion/96ed1eb98878782ebf0fda2b11634922

GPU run: https://gist.github.com/YellowOnion/be694f9f7778d99bbd7cb7841890882d

Does it matter that I'm trying to pull unsloth dynamic quants? I can try with Google variants.

Author
Owner

@rick-github commented on GitHub (Apr 6, 2026):

Does it matter that I'm trying to pull unsloth dynamic quants? I can try with Google variants.

Yes. https://github.com/ollama/ollama/issues/14575#issuecomment-3989918451

Author
Owner

@YellowOnion commented on GitHub (Apr 6, 2026):

Ahh okay, yeah it seems to be working with `ollama run gemma4:e4b`; I guess I'll have to wait for better quants of the 26B model to work.

The logs provided by OP also look similar to mine:

print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'

This looks like [nixpkgs is set up](https://github.com/NixOS/nixpkgs/blob/nixos-25.11/nixos/modules/services/misc/ollama.nix#L379) to just call pull.

I don't quite understand how it's pulling in a Q4_K quant, but that would be the root of the issue for OP.

Author
Owner

@rick-github commented on GitHub (Apr 6, 2026):

Q4_K_M is the default. The logs indicate that ollama is trying to load the model on the llama.cpp engine rather than the native ollama engine, and fails for the reason explained in the link. The 26B version of the model should work with ollama run gemma4:26b-a4b-it-q4_K_M or the shorter alias ollama run gemma4:26b.

Author
Owner

@S0AndS0 commented on GitHub (Apr 6, 2026):

What's the output of

ollama -v

It is reporting what I expect;

ollama -v
#> ollama version is 0.20.2

Where was the model pulled from?

TLDR: it's pulling via ollama pull, so I'd expect it should be doing whatever ollama says

Source-code spelunking details;

- [NixOS/nixpkgs → GitHash `5b2c2d8` → `pkgs/by-name/ol/ollama/package.nix`](https://github.com/NixOS/nixpkgs/blob/5b2c2d84341b2afb5647081c1386a80d7a8d8605/pkgs/by-name/ol/ollama/package.nix) is what defines the `package` that I'm overriding in order to bump the version to what I thought was supported
- [NixOS/nixpkgs → GitHash `5b2c2d8` → `nixos/modules/services/misc/ollama.nix` → `loadModels`](https://github.com/NixOS/nixpkgs/blob/5b2c2d84341b2afb5647081c1386a80d7a8d8605/nixos/modules/services/misc/ollama.nix#L151-L170) is where models are defined
- [NixOS/nixpkgs → GitHash `5b2c2d8` → `nixos/modules/services/misc/ollama.nix` → `systemd.services.ollama-model-loader` → `script`](https://github.com/NixOS/nixpkgs/blob/5b2c2d84341b2afb5647081c1386a80d7a8d8605/nixos/modules/services/misc/ollama.nix#L283-L354) is what handles model downloads, specifically via `'${parallel}' --tag '${ollama}' pull ::: ${lib.escapeShellArgs cfg.loadModels}`

Server logs will aid in debugging.

Thanks for the link @rick-github!

Logs from running ollama run gemma4:e4b;

journalctl -u ollama --no-pager --follow --pager-end
Apr 06 07:48:36 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:36 | 200 |      50.507µs |       127.0.0.1 | HEAD     "/"
Apr 06 07:48:36 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:36 | 200 |      82.705µs |       127.0.0.1 | HEAD     "/"
Apr 06 07:48:36 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:36 | 200 |      31.849µs |       127.0.0.1 | HEAD     "/"
Apr 06 07:48:36 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:36 | 200 |      44.177µs |       127.0.0.1 | HEAD     "/"
Apr 06 07:48:37 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:37 | 200 |  783.201543ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 07:48:37 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:37 | 200 |  767.293444ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 07:48:37 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:37 | 200 |   770.98887ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 07:48:37 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:37 | 200 |  776.336664ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 07:48:37 nixos ollama[1488]: [GIN] 2026/04/06 - 07:48:37 | 200 |   760.42554ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:12:33 nixos ollama[1488]: [GIN] 2026/04/06 - 16:12:33 | 200 |      43.211µs |       127.0.0.1 | GET      "/api/version"
Apr 06 16:16:48 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:48 | 200 |       29.52µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:16:48 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:48 | 200 |    3.447591ms |       127.0.0.1 | GET      "/api/tags"
Apr 06 16:16:56 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:56 | 200 |      43.173µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:16:57 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:57 | 200 |  455.951066ms |       127.0.0.1 | POST     "/api/show"
Apr 06 16:16:57 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:57 | 200 |  418.075581ms |       127.0.0.1 | POST     "/api/show"
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: loaded meta data with 55 key-value pairs and 2131 tensors from /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a (version GGUF V3 (latest))
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   0:                gemma4.attention.head_count u32              = 8
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   1:             gemma4.attention.head_count_kv u32              = 2
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   2:                gemma4.attention.key_length u32              = 512
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   3:            gemma4.attention.key_length_swa u32              = 256
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   4:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   5:          gemma4.attention.shared_kv_layers u32              = 18
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   6:            gemma4.attention.sliding_window u32              = 512
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   7:    gemma4.attention.sliding_window_pattern arr[bool,42]     = [true, true, true, true, true, false,...
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   8:              gemma4.attention.value_length u32              = 512
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv   9:          gemma4.attention.value_length_swa u32              = 256
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  10:          gemma4.audio.attention.head_count u32              = 8
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  11:  gemma4.audio.attention.layer_norm_epsilon f32              = 0.000001
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  12:                   gemma4.audio.block_count u32              = 12
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  13:              gemma4.audio.conv_kernel_size u32              = 5
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  14:              gemma4.audio.embedding_length u32              = 1024
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  15:           gemma4.audio.feed_forward_length u32              = 4096
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  16:                         gemma4.block_count u32              = 42
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  17:                      gemma4.context_length u32              = 131072
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  18:                    gemma4.embedding_length u32              = 2560
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  19:    gemma4.embedding_length_per_layer_input u32              = 256
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  20:                 gemma4.feed_forward_length u32              = 10240
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  21:             gemma4.final_logit_softcapping f32              = 30.000000
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  22:                gemma4.rope.dimension_count u32              = 512
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  23:            gemma4.rope.dimension_count_swa u32              = 256
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  24:                      gemma4.rope.freq_base f32              = 1000000.000000
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  25:                  gemma4.rope.freq_base_swa f32              = 10000.000000
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  26:         gemma4.vision.attention.head_count u32              = 12
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  27: gemma4.vision.attention.layer_norm_epsilon f32              = 0.000001
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  28:                  gemma4.vision.block_count u32              = 16
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  29:             gemma4.vision.embedding_length u32              = 768
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  30:          gemma4.vision.feed_forward_length u32              = 3072
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  31:                 gemma4.vision.num_channels u32              = 3
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  32:                   gemma4.vision.patch_size u32              = 16
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  33:       gemma4.vision.projector.scale_factor u32              = 3
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  34:                       general.architecture str              = gemma4
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  35:                          general.file_type u32              = 15
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  36:                    general.parameter_count u64              = 7996157674
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  37:               general.quantization_version u32              = 2
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  38:               tokenizer.ggml.add_bos_token bool             = false
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  39:               tokenizer.ggml.add_eos_token bool             = false
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  40:              tokenizer.ggml.add_mask_token bool             = false
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  41:           tokenizer.ggml.add_padding_token bool             = false
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  42:           tokenizer.ggml.add_unknown_token bool             = false
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  43:                tokenizer.ggml.bos_token_id u32              = 2
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  44:                tokenizer.ggml.eos_token_id u32              = 1
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  45:               tokenizer.ggml.eos_token_ids arr[i32,3]       = [1, 106, 50]
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  46:               tokenizer.ggml.mask_token_id u32              = 4
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  47:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  48:                       tokenizer.ggml.model str              = llama
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  49:            tokenizer.ggml.padding_token_id u32              = 0
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  50:                         tokenizer.ggml.pre str              = gemma4
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  51:                      tokenizer.ggml.scores arr[f32,262144]  = [0.000000, 1.000000, 2.000000, 3.0000...
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  52:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  53:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv  54:            tokenizer.ggml.unknown_token_id u32              = 3
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type  f32: 1501 tensors
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type  f16:  116 tensors
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type q4_K:  339 tensors
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type q6_K:   41 tensors
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type bf16:  134 tensors
Apr 06 16:16:58 nixos ollama[1488]: print_info: file format = GGUF V3 (latest)
Apr 06 16:16:58 nixos ollama[1488]: print_info: file type   = Q4_K - Medium
Apr 06 16:16:58 nixos ollama[1488]: print_info: file size   = 8.93 GiB (9.60 BPW)
Apr 06 16:16:58 nixos ollama[1488]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
Apr 06 16:16:58 nixos ollama[1488]: llama_model_load_from_file_impl: failed to load model
Apr 06 16:16:58 nixos ollama[1488]: time=2026-04-06T16:16:58.638-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a error="unable to load model: /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a"
Apr 06 16:16:58 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:58 | 500 |  1.120252105s |       127.0.0.1 | POST     "/api/generate"
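As a sanity check on the log above, the reported BPW (bits per weight) figure follows directly from the reported file size and `general.parameter_count`; a quick back-of-the-envelope check with values copied from the log:

```python
# Values reported by llama_model_loader / print_info for gemma4:e4b above.
file_size_gib = 8.93         # print_info: file size (rounded in the log)
param_count = 7_996_157_674  # general.parameter_count

# BPW = total bits in the file / number of parameters.
bits = file_size_gib * 1024**3 * 8
bpw = bits / param_count
print(f"{bpw:.1f} BPW")  # → 9.6 BPW, matching the logged 9.60
```

So the metadata is internally consistent; the load fails purely on the unrecognized architecture string, not on a corrupted blob.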
Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - kv 54: tokenizer.ggml.unknown_token_id u32 = 3 Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type f32: 1501 tensors Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type f16: 116 tensors Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type q4_K: 339 tensors Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type q6_K: 41 tensors Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: - type bf16: 134 tensors Apr 06 16:16:58 nixos ollama[1488]: print_info: file format = GGUF V3 (latest) Apr 06 16:16:58 nixos ollama[1488]: print_info: file type = Q4_K - Medium Apr 06 16:16:58 nixos ollama[1488]: print_info: file size = 8.93 GiB (9.60 BPW) Apr 06 16:16:58 nixos ollama[1488]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4' Apr 06 16:16:58 nixos ollama[1488]: llama_model_load_from_file_impl: failed to load model Apr 06 16:16:58 nixos ollama[1488]: time=2026-04-06T16:16:58.638-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a error="unable to load model: /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a" Apr 06 16:16:58 nixos ollama[1488]: [GIN] 2026/04/06 - 16:16:58 | 500 | 1.120252105s | 127.0.0.1 | POST "/api/generate" ``` </details>

@rick-github commented on GitHub (Apr 6, 2026):

Apr 06 16:16:58 nixos ollama[1488]: llama_model_loader: loaded meta data with 55 key-value pairs and 2131 tensors from /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a (version GGUF V3 (latest))

It's using the wrong backend. Set OLLAMA_DEBUG=1 in the server environment, restart the server, load the model and post the full log.
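
For readers not on NixOS, the same flag can usually be set with a systemd drop-in. A minimal sketch, assuming a standard systemd install of ollama (the drop-in file name is illustrative):

```shell
# Create a drop-in that adds OLLAMA_DEBUG=1 to the service environment
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/debug.conf <<'EOF'
[Service]
Environment="OLLAMA_DEBUG=1"
EOF

# Reload systemd and restart the server so the variable takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```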


@S0AndS0 commented on GitHub (Apr 6, 2026):

diff --git a/services/ollama.nix b/services/ollama.nix
index 14e3f49..c332da1 100644
--- a/services/ollama.nix
+++ b/services/ollama.nix
@@ -30,6 +30,10 @@
       };
     });
 
+    environmentVariables = {
+      OLLAMA_DEBUG = "1";
+    };
+
     loadModels = [
       "gemma4:26b"
       "gemma4:e4b"
Latest logs after applying the above patch to the Nix configs...
nixos-rebuild switch --impure --flake . &&
  systemctl restart ollama.service &&
  journalctl -u ollama --no-pager --follow --pager-end;
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |      76.583µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |      35.996µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |      28.013µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |      26.131µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |      30.137µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |  584.598817ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |  605.969472ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |  601.059016ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |  592.189349ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:37:23 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:23 | 200 |  600.558359ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:37:35 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:35 | 200 |      43.242µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:37:35 nixos ollama[35604]: time=2026-04-06T16:37:35.823-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:37:35 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:35 | 200 |  427.004015ms |       127.0.0.1 | POST     "/api/show"
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.221-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:37:36 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:36 | 200 |  394.734727ms |       127.0.0.1 | POST     "/api/show"
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.691-07:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.691-07:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.691-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 37157"
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.691-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=1 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama::/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.830-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=138.875441ms OLLAMA_LIBRARY_PATH="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" extra_envs=map[]
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.830-07:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=139.068054ms
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.831-07:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.883-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:37:36 nixos ollama[35604]: time=2026-04-06T16:37:36.892-07:00 level=DEBUG source=sched.go:256 msg="loading first model" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: loaded meta data with 55 key-value pairs and 2131 tensors from /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a (version GGUF V3 (latest))
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   0:                gemma4.attention.head_count u32              = 8
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   1:             gemma4.attention.head_count_kv u32              = 2
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   2:                gemma4.attention.key_length u32              = 512
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   3:            gemma4.attention.key_length_swa u32              = 256
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   4:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   5:          gemma4.attention.shared_kv_layers u32              = 18
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   6:            gemma4.attention.sliding_window u32              = 512
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   7:    gemma4.attention.sliding_window_pattern arr[bool,42]     = [true, true, true, true, true, false,...
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   8:              gemma4.attention.value_length u32              = 512
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv   9:          gemma4.attention.value_length_swa u32              = 256
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  10:          gemma4.audio.attention.head_count u32              = 8
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  11:  gemma4.audio.attention.layer_norm_epsilon f32              = 0.000001
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  12:                   gemma4.audio.block_count u32              = 12
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  13:              gemma4.audio.conv_kernel_size u32              = 5
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  14:              gemma4.audio.embedding_length u32              = 1024
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  15:           gemma4.audio.feed_forward_length u32              = 4096
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  16:                         gemma4.block_count u32              = 42
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  17:                      gemma4.context_length u32              = 131072
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  18:                    gemma4.embedding_length u32              = 2560
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  19:    gemma4.embedding_length_per_layer_input u32              = 256
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  20:                 gemma4.feed_forward_length u32              = 10240
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  21:             gemma4.final_logit_softcapping f32              = 30.000000
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  22:                gemma4.rope.dimension_count u32              = 512
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  23:            gemma4.rope.dimension_count_swa u32              = 256
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  24:                      gemma4.rope.freq_base f32              = 1000000.000000
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  25:                  gemma4.rope.freq_base_swa f32              = 10000.000000
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  26:         gemma4.vision.attention.head_count u32              = 12
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  27: gemma4.vision.attention.layer_norm_epsilon f32              = 0.000001
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  28:                  gemma4.vision.block_count u32              = 16
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  29:             gemma4.vision.embedding_length u32              = 768
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  30:          gemma4.vision.feed_forward_length u32              = 3072
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  31:                 gemma4.vision.num_channels u32              = 3
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  32:                   gemma4.vision.patch_size u32              = 16
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  33:       gemma4.vision.projector.scale_factor u32              = 3
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  34:                       general.architecture str              = gemma4
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  35:                          general.file_type u32              = 15
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  36:                    general.parameter_count u64              = 7996157674
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  37:               general.quantization_version u32              = 2
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  38:               tokenizer.ggml.add_bos_token bool             = false
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  39:               tokenizer.ggml.add_eos_token bool             = false
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  40:              tokenizer.ggml.add_mask_token bool             = false
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  41:           tokenizer.ggml.add_padding_token bool             = false
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  42:           tokenizer.ggml.add_unknown_token bool             = false
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  43:                tokenizer.ggml.bos_token_id u32              = 2
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  44:                tokenizer.ggml.eos_token_id u32              = 1
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  45:               tokenizer.ggml.eos_token_ids arr[i32,3]       = [1, 106, 50]
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  46:               tokenizer.ggml.mask_token_id u32              = 4
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  47:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  48:                       tokenizer.ggml.model str              = llama
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  49:            tokenizer.ggml.padding_token_id u32              = 0
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  50:                         tokenizer.ggml.pre str              = gemma4
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  51:                      tokenizer.ggml.scores arr[f32,262144]  = [0.000000, 1.000000, 2.000000, 3.0000...
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  52:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  53:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - kv  54:            tokenizer.ggml.unknown_token_id u32              = 3
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - type  f32: 1501 tensors
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - type  f16:  116 tensors
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - type q4_K:  339 tensors
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - type q6_K:   41 tensors
Apr 06 16:37:37 nixos ollama[35604]: llama_model_loader: - type bf16:  134 tensors
Apr 06 16:37:37 nixos ollama[35604]: print_info: file format = GGUF V3 (latest)
Apr 06 16:37:37 nixos ollama[35604]: print_info: file type   = Q4_K - Medium
Apr 06 16:37:37 nixos ollama[35604]: print_info: file size   = 8.93 GiB (9.60 BPW)
Apr 06 16:37:37 nixos ollama[35604]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
Apr 06 16:37:37 nixos ollama[35604]: llama_model_load_from_file_impl: failed to load model
Apr 06 16:37:37 nixos ollama[35604]: time=2026-04-06T16:37:37.476-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a error="unable to load model: /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a"
Apr 06 16:37:37 nixos ollama[35604]: [GIN] 2026/04/06 - 16:37:37 | 500 |  1.236562901s |       127.0.0.1 | POST     "/api/generate"

@rick-github commented on GitHub (Apr 6, 2026):

Something is not making sense. Set `OLLAMA_DEBUG=2`, restart the server, load the model, and post the output of:

```
journalctl -u ollama --no-pager --since "$(systemctl show ollama --property=ActiveEnterTimestamp --value)"
```

Warning, this will be a lot of logging.
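
For NixOS users following along, the debug level can also be set declaratively instead of editing the unit by hand. A minimal sketch, assuming the `services.ollama.environmentVariables` option of the stock NixOS module:

```nix
{
  # Hypothetical illustration: pass OLLAMA_DEBUG=2 to the systemd service
  # so the runner emits the verbose logging requested above.
  services.ollama = {
    enable = true;
    environmentVariables = {
      OLLAMA_DEBUG = "2";
    };
  };
}
```

After a `nixos-rebuild switch` the restarted service picks up the variable, and the journal output becomes far more verbose.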


@S0AndS0 commented on GitHub (Apr 6, 2026):

Massive log for sure (-!
```
journalctl -u ollama --no-pager --follow --pager-end --since "$(
  systemctl show ollama --property ActiveEnterTimestamp --value
)" | tee -a /tmp/ollama-service.log;
```

Note: I tacked on `tee` to ensure all output was captured, and to keep myself from messing up the formatting again

Apr 06 16:53:08 nixos systemd[1]: Stopping Server for local large language models...
Apr 06 16:53:08 nixos systemd[1]: ollama.service: Deactivated successfully.
Apr 06 16:53:08 nixos systemd[1]: Stopped Server for local large language models.
Apr 06 16:53:08 nixos systemd[1]: ollama.service: Consumed 609ms CPU time over 38.165s wall clock time, 123M memory peak, 45.3K incoming IP traffic, 26K outgoing IP traffic.
Apr 06 16:53:08 nixos systemd[1]: Starting Server for local large language models...
Apr 06 16:53:08 nixos systemd[1]: Started Server for local large language models.
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.187-07:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.187-07:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.189-07:00 level=INFO source=images.go:477 msg="total blobs: 17"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.189-07:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.190-07:00 level=INFO source=routes.go:1782 msg="Listening on 127.0.0.1:11434 (version 0.20.2)"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.190-07:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.191-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.191-07:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extraEnvs=map[]
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.191-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 38931"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.192-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=2 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.207-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.208-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:38931"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=gguf.go:604 msg=general.architecture type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: found 1 CUDA devices:
Apr 06 16:53:08 nixos ollama[39190]:   Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CUDA backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cuda.so
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CPU backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cpu-haswell.so
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=100.876789ms
Apr 06 16:53:08 nixos ollama[39190]: ggml_backend_cuda_device_get_memory device GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e utilizing NVML memory reporting free: 4230479872 total: 4294967296
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=20.270189ms
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] devices="[{DeviceID:{ID:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Library:CUDA} Name:CUDA0 Description:NVIDIA GeForce GTX 960M FilterID: Integrated:false PCIID:0000:02:00.0 TotalMemory:4294967296 FreeMemory:4230479872 ComputeMajor:5 ComputeMinor:0 DriverMajor:13 DriverMinor:0 LibraryPath:[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama]}]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=143.425587ms OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extra_envs=map[]
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama description="NVIDIA GeForce GTX 960M" compute=5.0 id=GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e pci_id=0000:02:00.0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e GGML_CUDA_INIT:1]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.335-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 40073"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.335-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=2 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama CUDA_VISIBLE_DEVICES=GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e GGML_CUDA_INIT=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.354-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.354-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:40073"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=gguf.go:604 msg=general.architecture type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: found 1 CUDA devices:
Apr 06 16:53:08 nixos ollama[39190]:   Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CUDA backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cuda.so
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CPU backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cpu-haswell.so
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=117.321811ms
Apr 06 16:53:08 nixos ollama[39190]: ggml_backend_cuda_device_get_memory device GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e utilizing NVML memory reporting free: 4230479872 total: 4294967296
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.495-07:00 level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=21.238651ms
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] devices="[{DeviceID:{ID:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Library:CUDA} Name:CUDA0 Description:NVIDIA GeForce GTX 960M FilterID: Integrated:false PCIID:0000:02:00.0 TotalMemory:4294967296 FreeMemory:4230479872 ComputeMajor:5 ComputeMinor:0 DriverMajor:13 DriverMinor:0 LibraryPath:[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama]}]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=161.555999ms OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e GGML_CUDA_INIT:1]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[CUDA:map[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:map[GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e:0]]]
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=305.535348ms
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e filter_id="" library=CUDA compute=5.0 name=CUDA0 description="NVIDIA GeForce GTX 960M" libdirs=ollama driver=13.0 pci_id=0000:02:00.0 type=discrete total="4.0 GiB" available="3.9 GiB"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="4.0 GiB" default_num_ctx=4096
Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 |      56.681µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 |      33.594µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 |      22.053µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 |      22.929µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 |      34.216µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 |  513.880447ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 |  494.218901ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 |  500.634447ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 |  534.436458ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 |  531.074999ms |       127.0.0.1 | POST     "/api/pull"
Apr 06 16:54:24 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:24 | 200 |      33.113µs |       127.0.0.1 | HEAD     "/"
Apr 06 16:54:25 nixos ollama[39190]: time=2026-04-06T16:54:25.070-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:54:25 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:25 | 200 |  450.347167ms |       127.0.0.1 | POST     "/api/show"
Apr 06 16:54:25 nixos ollama[39190]: time=2026-04-06T16:54:25.522-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:54:25 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:25 | 200 |  440.351579ms |       127.0.0.1 | POST     "/api/show"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=TRACE source=sched.go:171 msg="processing incoming request" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=TRACE source=sched.go:205 msg="refreshing GPU list" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" extraEnvs=map[]
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 33665"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=2 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama::/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.055-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.055-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:33665"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=gguf.go:604 msg=general.architecture type=string
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:54:26 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Apr 06 16:54:26 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Apr 06 16:54:26 nixos ollama[39190]: ggml_cuda_init: found 1 CUDA devices:
Apr 06 16:54:26 nixos ollama[39190]:   Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e
Apr 06 16:54:26 nixos ollama[39190]: load_backend: loaded CUDA backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cuda.so
Apr 06 16:54:26 nixos ollama[39190]: load_backend: loaded CPU backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cpu-haswell.so
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:88 msg="skipping path which is not part of ollama" path=/var/lib/private/ollama
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=101.50125ms
Apr 06 16:54:26 nixos ollama[39190]: ggml_backend_cuda_device_get_memory device GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e utilizing NVML memory reporting free: 4230479872 total: 4294967296
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.183-07:00 level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=21.674395ms
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" devices="[{DeviceID:{ID:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Library:CUDA} Name:CUDA0 Description:NVIDIA GeForce GTX 960M FilterID: Integrated:false PCIID:0000:02:00.0 TotalMemory:4294967296 FreeMemory:4230479872 ComputeMajor:5 ComputeMinor:0 DriverMajor:13 DriverMinor:0 LibraryPath:[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]}]"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=146.113073ms OLLAMA_LIBRARY_PATH="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" extra_envs=map[]
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=146.205747ms
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=TRACE source=sched.go:208 msg="refreshing system information" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=TRACE source=gpu.go:22 msg="performing CPU discovery"
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.186-07:00 level=TRACE source=gpu.go:25 msg="CPU discovery completed" duration=1.701382ms
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.186-07:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.186-07:00 level=TRACE source=sched.go:243 msg="loading model metadata" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.245-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.253-07:00 level=TRACE source=sched.go:251 msg="updating free space" gpu_count=1 model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.253-07:00 level=DEBUG source=sched.go:256 msg="loading first model" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: loaded meta data with 55 key-value pairs and 2131 tensors from /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a (version GGUF V3 (latest))
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   0:                gemma4.attention.head_count u32              = 8
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   1:             gemma4.attention.head_count_kv u32              = 2
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   2:                gemma4.attention.key_length u32              = 512
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   3:            gemma4.attention.key_length_swa u32              = 256
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   4:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   5:          gemma4.attention.shared_kv_layers u32              = 18
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   6:            gemma4.attention.sliding_window u32              = 512
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   7:    gemma4.attention.sliding_window_pattern arr[bool,42]     = [true, true, true, true, true, false,...
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   8:              gemma4.attention.value_length u32              = 512
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv   9:          gemma4.attention.value_length_swa u32              = 256
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  10:          gemma4.audio.attention.head_count u32              = 8
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  11:  gemma4.audio.attention.layer_norm_epsilon f32              = 0.000001
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  12:                   gemma4.audio.block_count u32              = 12
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  13:              gemma4.audio.conv_kernel_size u32              = 5
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  14:              gemma4.audio.embedding_length u32              = 1024
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  15:           gemma4.audio.feed_forward_length u32              = 4096
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  16:                         gemma4.block_count u32              = 42
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  17:                      gemma4.context_length u32              = 131072
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  18:                    gemma4.embedding_length u32              = 2560
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  19:    gemma4.embedding_length_per_layer_input u32              = 256
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  20:                 gemma4.feed_forward_length u32              = 10240
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  21:             gemma4.final_logit_softcapping f32              = 30.000000
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  22:                gemma4.rope.dimension_count u32              = 512
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  23:            gemma4.rope.dimension_count_swa u32              = 256
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  24:                      gemma4.rope.freq_base f32              = 1000000.000000
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  25:                  gemma4.rope.freq_base_swa f32              = 10000.000000
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  26:         gemma4.vision.attention.head_count u32              = 12
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  27: gemma4.vision.attention.layer_norm_epsilon f32              = 0.000001
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  28:                  gemma4.vision.block_count u32              = 16
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  29:             gemma4.vision.embedding_length u32              = 768
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  30:          gemma4.vision.feed_forward_length u32              = 3072
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  31:                 gemma4.vision.num_channels u32              = 3
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  32:                   gemma4.vision.patch_size u32              = 16
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  33:       gemma4.vision.projector.scale_factor u32              = 3
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  34:                       general.architecture str              = gemma4
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  35:                          general.file_type u32              = 15
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  36:                    general.parameter_count u64              = 7996157674
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  37:               general.quantization_version u32              = 2
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  38:               tokenizer.ggml.add_bos_token bool             = false
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  39:               tokenizer.ggml.add_eos_token bool             = false
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  40:              tokenizer.ggml.add_mask_token bool             = false
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  41:           tokenizer.ggml.add_padding_token bool             = false
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  42:           tokenizer.ggml.add_unknown_token bool             = false
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  43:                tokenizer.ggml.bos_token_id u32              = 2
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  44:                tokenizer.ggml.eos_token_id u32              = 1
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  45:               tokenizer.ggml.eos_token_ids arr[i32,3]       = [1, 106, 50]
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  46:               tokenizer.ggml.mask_token_id u32              = 4
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  47:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  48:                       tokenizer.ggml.model str              = llama
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  49:            tokenizer.ggml.padding_token_id u32              = 0
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  50:                         tokenizer.ggml.pre str              = gemma4
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  51:                      tokenizer.ggml.scores arr[f32,262144]  = [0.000000, 1.000000, 2.000000, 3.0000...
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  52:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  53:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv  54:            tokenizer.ggml.unknown_token_id u32              = 3
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type  f32: 1501 tensors
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type  f16:  116 tensors
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type q4_K:  339 tensors
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type q6_K:   41 tensors
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type bf16:  134 tensors
Apr 06 16:54:26 nixos ollama[39190]: print_info: file format = GGUF V3 (latest)
Apr 06 16:54:26 nixos ollama[39190]: print_info: file type   = Q4_K - Medium
Apr 06 16:54:26 nixos ollama[39190]: print_info: file size   = 8.93 GiB (9.60 BPW)
Apr 06 16:54:26 nixos ollama[39190]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
Apr 06 16:54:26 nixos ollama[39190]: llama_model_load_from_file_impl: failed to load model
Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.872-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a error="unable to load model: /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a"
Apr 06 16:54:26 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:26 | 500 |   1.33148162s |       127.0.0.1 | POST     "/api/generate"
<!-- gh-comment-id:4195642990 -->

@S0AndS0 commented on GitHub (Apr 6, 2026):

<details><summary> Massive log for sure (-!

```bash
journalctl -u ollama --no-pager --follow --pager-end --since "$( systemctl show ollama --property ActiveEnterTimestamp --value )" | tee -a /tmp/ollama-service.log;
```

Note; I tacked on `tee` to ensure all was captured, and to mitigate me messing-up formatting again

</summary>

```
Apr 06 16:53:08 nixos systemd[1]: Stopping Server for local large language models...
Apr 06 16:53:08 nixos systemd[1]: ollama.service: Deactivated successfully.
Apr 06 16:53:08 nixos systemd[1]: Stopped Server for local large language models.
Apr 06 16:53:08 nixos systemd[1]: ollama.service: Consumed 609ms CPU time over 38.165s wall clock time, 123M memory peak, 45.3K incoming IP traffic, 26K outgoing IP traffic.
Apr 06 16:53:08 nixos systemd[1]: Starting Server for local large language models...
Apr 06 16:53:08 nixos systemd[1]: Started Server for local large language models.
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.187-07:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.187-07:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.189-07:00 level=INFO source=images.go:477 msg="total blobs: 17"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.189-07:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.190-07:00 level=INFO source=routes.go:1782 msg="Listening on 127.0.0.1:11434 (version 0.20.2)"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.190-07:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.191-07:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.191-07:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extraEnvs=map[]
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.191-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 38931"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.192-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=2 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.207-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.208-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:38931"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=gguf.go:604 msg=general.architecture type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.213-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: found 1 CUDA devices:
Apr 06 16:53:08 nixos ollama[39190]: Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CUDA backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cuda.so
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CPU backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cpu-haswell.so
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.313-07:00 level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=100.876789ms
Apr 06 16:53:08 nixos ollama[39190]: ggml_backend_cuda_device_get_memory device GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e utilizing NVML memory reporting free: 4230479872 total: 4294967296
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=20.270189ms
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] devices="[{DeviceID:{ID:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Library:CUDA} Name:CUDA0 Description:NVIDIA GeForce GTX 960M FilterID: Integrated:false PCIID:0000:02:00.0 TotalMemory:4294967296 FreeMemory:4230479872 ComputeMajor:5 ComputeMinor:0 DriverMajor:13 DriverMinor:0 LibraryPath:[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama]}]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=143.425587ms OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extra_envs=map[]
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama description="NVIDIA GeForce GTX 960M" compute=5.0 id=GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e pci_id=0000:02:00.0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.334-07:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extraEnvs="map[CUDA_VISIBLE_DEVICES:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e GGML_CUDA_INIT:1]"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.335-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 40073"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.335-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=2 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama CUDA_VISIBLE_DEVICES=GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e GGML_CUDA_INIT=1
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.354-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.354-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:40073"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=gguf.go:604 msg=general.architecture type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.357-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Apr 06 16:53:08 nixos ollama[39190]: ggml_cuda_init: found 1 CUDA devices:
Apr 06 16:53:08 nixos ollama[39190]: Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CUDA backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cuda.so
Apr 06 16:53:08 nixos ollama[39190]: load_backend: loaded CPU backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cpu-haswell.so
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default=""
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0
Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" 
key=llama.embedding_length default=0 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.474-07:00 level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=117.321811ms Apr 06 16:53:08 nixos ollama[39190]: ggml_backend_cuda_device_get_memory device GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e utilizing NVML memory reporting free: 4230479872 total: 4294967296 Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.495-07:00 level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=21.238651ms Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" 
OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] devices="[{DeviceID:{ID:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Library:CUDA} Name:CUDA0 Description:NVIDIA GeForce GTX 960M FilterID: Integrated:false PCIID:0000:02:00.0 TotalMemory:4294967296 FreeMemory:4230479872 ComputeMajor:5 ComputeMinor:0 DriverMajor:13 DriverMinor:0 LibraryPath:[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama]}]" Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=161.555999ms OLLAMA_LIBRARY_PATH=[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama] extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e GGML_CUDA_INIT:1]" Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[CUDA:map[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama:map[GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e:0]]] Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=305.535348ms Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e filter_id="" library=CUDA compute=5.0 name=CUDA0 description="NVIDIA GeForce GTX 960M" libdirs=ollama driver=13.0 pci_id=0000:02:00.0 type=discrete total="4.0 GiB" available="3.9 GiB" Apr 06 16:53:08 nixos ollama[39190]: time=2026-04-06T16:53:08.496-07:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="4.0 GiB" default_num_ctx=4096 Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 | 56.681µs | 127.0.0.1 | HEAD "/" Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 | 33.594µs | 127.0.0.1 | 
HEAD "/" Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 | 22.053µs | 127.0.0.1 | HEAD "/" Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 | 22.929µs | 127.0.0.1 | HEAD "/" Apr 06 16:53:08 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:08 | 200 | 34.216µs | 127.0.0.1 | HEAD "/" Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 | 513.880447ms | 127.0.0.1 | POST "/api/pull" Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 | 494.218901ms | 127.0.0.1 | POST "/api/pull" Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 | 500.634447ms | 127.0.0.1 | POST "/api/pull" Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 | 534.436458ms | 127.0.0.1 | POST "/api/pull" Apr 06 16:53:09 nixos ollama[39190]: [GIN] 2026/04/06 - 16:53:09 | 200 | 531.074999ms | 127.0.0.1 | POST "/api/pull" Apr 06 16:54:24 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:24 | 200 | 33.113µs | 127.0.0.1 | HEAD "/" Apr 06 16:54:25 nixos ollama[39190]: time=2026-04-06T16:54:25.070-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 Apr 06 16:54:25 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:25 | 200 | 450.347167ms | 127.0.0.1 | POST "/api/show" Apr 06 16:54:25 nixos ollama[39190]: time=2026-04-06T16:54:25.522-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 Apr 06 16:54:25 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:25 | 200 | 440.351579ms | 127.0.0.1 | POST "/api/show" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=TRACE source=sched.go:171 msg="processing incoming request" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=TRACE source=sched.go:205 msg="refreshing GPU list" 
model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=DEBUG source=runner.go:264 msg="refreshing free memory" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=TRACE source=runner.go:440 msg="starting runner for device discovery" libDirs="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" extraEnvs=map[] Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=INFO source=server.go:430 msg="starting runner" cmd="/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/.ollama-wrapped runner --ollama-engine --port 33665" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.038-07:00 level=DEBUG source=server.go:431 msg=subprocess PATH=/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/bin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/bin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/bin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/bin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/bin:/nix/store/hlxw2q9qansq7bn52xvlb5badw3z1v8s-coreutils-9.10/sbin:/nix/store/b3rx5wac9hhfxn9120xkcvdwj51mc9z2-findutils-4.10.0/sbin:/nix/store/8laf6k81j9ckylrigj3xsk76j69knhvl-gnugrep-3.12/sbin:/nix/store/wv7qq5yb8plyhxji9x3r5gpkyfm2kf29-gnused-4.9/sbin:/nix/store/wxyn8d3m8g4fnn6xazinjwhzhzdg6wib-systemd-259/sbin OLLAMA_DEBUG=2 OLLAMA_HOST=127.0.0.1:11434 OLLAMA_MODELS=/var/lib/ollama/models 
LD_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama::/run/opengl-driver/lib:/nix/store/5f722gfp28da7xdnak1pv6gjjdffz2ry-cuda12.8-cuda_cudart-12.8.90/lib:/nix/store/cv3qcxd7n3q2v7r8sy3c9fd5mspabsgc-cuda12.8-libcublas-12.8.4.1-lib/lib:/nix/store/aqjyykxq8nmn9b5zj5y6s8frqlx89dj0-cuda12.8-cuda_cccl-12.8.90/lib OLLAMA_LIBRARY_PATH=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama: Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.055-07:00 level=INFO source=runner.go:1411 msg="starting ollama engine" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.055-07:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:33665" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=gguf.go:604 msg=general.architecture type=string Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=gguf.go:604 msg=tokenizer.ggml.model type=string Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.file_type default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" 
num_tensors=0 num_key_values=3 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.060-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama Apr 06 16:54:26 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Apr 06 16:54:26 nixos ollama[39190]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Apr 06 16:54:26 nixos ollama[39190]: ggml_cuda_init: found 1 CUDA devices: Apr 06 16:54:26 nixos ollama[39190]: Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0, VMM: yes, ID: GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Apr 06 16:54:26 nixos ollama[39190]: load_backend: loaded CUDA backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cuda.so Apr 06 16:54:26 nixos ollama[39190]: load_backend: loaded CPU backend from /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama/libggml-cpu-haswell.so Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:88 msg="skipping path which is not part of ollama" path=/var/lib/private/ollama Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.pooling_type default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.expert_count default=0 Apr 06 16:54:26 nixos ollama[39190]: 
time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.pre default="" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.block_count default=0 Apr 06 16:54:26 nixos ollama[39190]: 
time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.embedding_length default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.head_count_kv default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.key_length default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.dimension_count default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.freq_base default=100000 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=llama.rope.scaling.factor default=1 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.161-07:00 level=DEBUG source=runner.go:1386 msg="dummy model load took" duration=101.50125ms Apr 06 16:54:26 nixos ollama[39190]: ggml_backend_cuda_device_get_memory device GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e utilizing NVML memory reporting free: 4230479872 total: 4294967296 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.183-07:00 level=DEBUG source=runner.go:1391 msg="gathering device infos took" duration=21.674395ms Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=TRACE source=runner.go:467 msg="runner enumerated 
devices" OLLAMA_LIBRARY_PATH="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" devices="[{DeviceID:{ID:GPU-962ca16e-d3f0-3dde-ecfd-3d968f5d2d4e Library:CUDA} Name:CUDA0 Description:NVIDIA GeForce GTX 960M FilterID: Integrated:false PCIID:0000:02:00.0 TotalMemory:4294967296 FreeMemory:4230479872 ComputeMajor:5 ComputeMinor:0 DriverMajor:13 DriverMinor:0 LibraryPath:[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]}]" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=146.113073ms OLLAMA_LIBRARY_PATH="[/nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/lib/ollama ]" extra_envs=map[] Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=146.205747ms Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=TRACE source=sched.go:208 msg="refreshing system information" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.184-07:00 level=TRACE source=gpu.go:22 msg="performing CPU discovery" Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.186-07:00 level=TRACE source=gpu.go:25 msg="CPU discovery completed" duration=1.701382ms Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.186-07:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.186-07:00 level=TRACE source=sched.go:243 msg="loading model metadata" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.245-07:00 level=DEBUG source=ggml.go:324 msg="key with type not found" 
key=general.alignment default=32 Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.253-07:00 level=TRACE source=sched.go:251 msg="updating free space" gpu_count=1 model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.253-07:00 level=DEBUG source=sched.go:256 msg="loading first model" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: loaded meta data with 55 key-value pairs and 2131 tensors from /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a (version GGUF V3 (latest)) Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 0: gemma4.attention.head_count u32 = 8 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 1: gemma4.attention.head_count_kv u32 = 2 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 2: gemma4.attention.key_length u32 = 512 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 3: gemma4.attention.key_length_swa u32 = 256 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 4: gemma4.attention.layer_norm_rms_epsilon f32 = 0.000001 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 5: gemma4.attention.shared_kv_layers u32 = 18 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 6: gemma4.attention.sliding_window u32 = 512 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 7: gemma4.attention.sliding_window_pattern arr[bool,42] = [true, true, true, true, true, false,... 
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 8: gemma4.attention.value_length u32 = 512 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 9: gemma4.attention.value_length_swa u32 = 256 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 10: gemma4.audio.attention.head_count u32 = 8 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 11: gemma4.audio.attention.layer_norm_epsilon f32 = 0.000001 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 12: gemma4.audio.block_count u32 = 12 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 13: gemma4.audio.conv_kernel_size u32 = 5 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 14: gemma4.audio.embedding_length u32 = 1024 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 15: gemma4.audio.feed_forward_length u32 = 4096 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 16: gemma4.block_count u32 = 42 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 17: gemma4.context_length u32 = 131072 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 18: gemma4.embedding_length u32 = 2560 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 19: gemma4.embedding_length_per_layer_input u32 = 256 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 20: gemma4.feed_forward_length u32 = 10240 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 21: gemma4.final_logit_softcapping f32 = 30.000000 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 22: gemma4.rope.dimension_count u32 = 512 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 23: gemma4.rope.dimension_count_swa u32 = 256 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 24: gemma4.rope.freq_base f32 = 1000000.000000 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 25: gemma4.rope.freq_base_swa f32 = 10000.000000 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 26: 
gemma4.vision.attention.head_count u32 = 12 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 27: gemma4.vision.attention.layer_norm_epsilon f32 = 0.000001 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 28: gemma4.vision.block_count u32 = 16 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 29: gemma4.vision.embedding_length u32 = 768 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 30: gemma4.vision.feed_forward_length u32 = 3072 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 31: gemma4.vision.num_channels u32 = 3 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 32: gemma4.vision.patch_size u32 = 16 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 33: gemma4.vision.projector.scale_factor u32 = 3 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 34: general.architecture str = gemma4 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 35: general.file_type u32 = 15 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 36: general.parameter_count u64 = 7996157674 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 37: general.quantization_version u32 = 2 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 38: tokenizer.ggml.add_bos_token bool = false Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 39: tokenizer.ggml.add_eos_token bool = false Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 40: tokenizer.ggml.add_mask_token bool = false Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 41: tokenizer.ggml.add_padding_token bool = false Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 42: tokenizer.ggml.add_unknown_token bool = false Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 43: tokenizer.ggml.bos_token_id u32 = 2 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 44: tokenizer.ggml.eos_token_id u32 = 1 Apr 06 16:54:26 nixos ollama[39190]: 
llama_model_loader: - kv 45: tokenizer.ggml.eos_token_ids arr[i32,3] = [1, 106, 50] Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 46: tokenizer.ggml.mask_token_id u32 = 4 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 47: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ... Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 48: tokenizer.ggml.model str = llama Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 49: tokenizer.ggml.padding_token_id u32 = 0 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 50: tokenizer.ggml.pre str = gemma4 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 51: tokenizer.ggml.scores arr[f32,262144] = [0.000000, 1.000000, 2.000000, 3.0000... Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 52: tokenizer.ggml.token_type arr[i32,262144] = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ... Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 53: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ... 
Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - kv 54: tokenizer.ggml.unknown_token_id u32 = 3 Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type f32: 1501 tensors Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type f16: 116 tensors Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type q4_K: 339 tensors Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type q6_K: 41 tensors Apr 06 16:54:26 nixos ollama[39190]: llama_model_loader: - type bf16: 134 tensors Apr 06 16:54:26 nixos ollama[39190]: print_info: file format = GGUF V3 (latest) Apr 06 16:54:26 nixos ollama[39190]: print_info: file type = Q4_K - Medium Apr 06 16:54:26 nixos ollama[39190]: print_info: file size = 8.93 GiB (9.60 BPW) Apr 06 16:54:26 nixos ollama[39190]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4' Apr 06 16:54:26 nixos ollama[39190]: llama_model_load_from_file_impl: failed to load model Apr 06 16:54:26 nixos ollama[39190]: time=2026-04-06T16:54:26.872-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a error="unable to load model: /var/lib/ollama/models/blobs/sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a" Apr 06 16:54:26 nixos ollama[39190]: [GIN] 2026/04/06 - 16:54:26 | 500 | 1.33148162s | 127.0.0.1 | POST "/api/generate" ``` </details>

@rick-github commented on GitHub (Apr 7, 2026):

The version of ollama here is not 0.20.2. I'm not familiar with NixOS, do they do their own package build?


@S0AndS0 commented on GitHub (Apr 7, 2026):

NixOS is a build-from-source distro, but with binary caching for pure-FOSS packages that are not overridden.

But because I'm overriding the CUDA variant (which is not pure FOSS), and overriding triggers a non-cached build, it should be forced to build from source on my device.

Plus, all the logs point to it really being version 0.20.2, as defined in the OP configs.

... the only cache I can think of that may be getting in the way is the local /nix/store cache.

For any in the audience following along at home, here be how to rule out cache-invalidation bugs when overriding preexisting packages:

Nix hash dance

First, force the `src` hash to an empty string:

```diff
diff --git a/services/ollama.nix b/services/ollama.nix
index 14e3f49..30ab245 100644
--- a/services/ollama.nix
+++ b/services/ollama.nix
@@ -27,9 +27,14 @@
       version = "0.20.2";
       src = previousAttrs.src.override {
         tag = "v${version}";
+        hash = "";
       };
     });
```

Next, attempt a rebuild and extract the computed hash from the error:

```
nixos-rebuild switch --impure --flake .
#....
error: hash mismatch in fixed-output derivation '/nix/store/pvprsc23shq85h1x3wmfg540s73ps3nk-source.drv':
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=
#....
```
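For anyone wondering about the `sha256-…` strings: Nix prints SRI hashes, i.e. the raw 32-byte SHA-256 digest, base64-encoded. A minimal sketch of the conversion (the `to_sri` helper is hypothetical, just for illustration):

```python
import base64
import hashlib

def to_sri(hex_digest: str) -> str:
    """Convert a hex SHA-256 digest (as printed by sha256sum)
    to the SRI form Nix uses in `hash = "sha256-..."` attributes."""
    return "sha256-" + base64.b64encode(bytes.fromhex(hex_digest)).decode()

# An unspecified hash is treated as all zeros, which is why the
# `specified:` placeholder in the error is a run of "A"s
# (base64 of 32 zero bytes: 43 "A"s followed by "=").
print(to_sri("00" * 32))
print(to_sri(hashlib.sha256(b"example").hexdigest()))
```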

Then insert that hash:

```diff
diff --git a/services/ollama.nix b/services/ollama.nix
index 14e3f49..30ab245 100644
--- a/services/ollama.nix
+++ b/services/ollama.nix
@@ -27,9 +27,14 @@
       version = "0.20.2";
       src = previousAttrs.src.override {
         tag = "v${version}";
-        hash = "";
+        hash = "sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=";
       };
     });
```

And finally rebuild again;

```
nixos-rebuild switch --impure --flake .
```

🤞 here's hoping it was a me problem the whole time!

I'll report back one way or the other once compiling completes, either with an edit to this post or a reply to any follow-ons
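Aside for anyone repeating this hash dance: nixpkgs ships `pkgs.lib.fakeHash` (literally the all-`A`s SHA-256 shown in the mismatch error above), which makes the placeholder step explicit rather than relying on an empty string. A minimal sketch of the same override, assuming the ollama-cuda package from the OP config:

```nix
# Sketch only -- same override as the diff above, but with lib.fakeHash as
# the deliberate placeholder. The first rebuild fails with a hash mismatch;
# the "got:" value from that error then replaces fakeHash.
{ pkgs, ... }:

{
  services.ollama.package =
    (pkgs.ollama-cuda.override { cudaArches = [ "50" ]; })
      .overrideAttrs (finalAttrs: previousAttrs: {
        version = "0.20.2";
        src = previousAttrs.src.override {
          tag = "v${finalAttrs.version}";
          hash = pkgs.lib.fakeHash; # swap in the "got:" hash after the first failure
        };
      });
}
```

Note `finalAttrs.version` rather than a bare `version`: inside the `overrideAttrs` function body, `version` is not otherwise in scope.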

Edit and update

Looks like 0.20.2 has a build error?!

After all that ceremony with `hash`es it's popping;

```
cmd/cmd.go:30:2: github.com/mattn/go-runewidth@v0.0.14: reading file:///nix/store/bxzg12a4v6iq928g321cn8ka557gar12-ollama-0.20.2-go-modules/github.com/mattn/go-runewidth/@v/v0.0.14.zip: no such file or directory
```

... sorts of build errors :-\
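(In hindsight, per the resolution at the top of this issue: that missing go-modules zip means the Go vendor hash went stale. `vendorHash` pins the vendored dependency set for a specific version, so bumping `version` without it leaves the builder reusing the old go-modules output. A sketch of the extra override needed, assuming the same package as the OP config; the real hash comes from one more round of the hash dance:)

```nix
# Sketch: vendorHash is version-specific, so it needs its own hash-dance
# round when bumping a Go package's version.
(pkgs.ollama-cuda.override { cudaArches = [ "50" ]; })
  .overrideAttrs (finalAttrs: previousAttrs: {
    version = "0.20.2";
    src = pkgs.fetchFromGitHub {
      owner = "ollama";
      repo = "ollama";
      tag = "v${finalAttrs.version}";
      hash = "sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=";
    };
    # A stale vendorHash means the old go-modules derivation gets reused,
    # hence the "no such file or directory" on the new module zips.
    vendorHash = pkgs.lib.fakeHash; # rebuild once, then paste the "got:" hash
  })
```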

Here be the full gory log;

nix log /nix/store/ar522ql5jlvb7sk7l89p7m5hn67wmyfh-ollama-0.20.2.drv
Sourcing setup-cuda-hook
Sourcing auto-add-driver-runpath-hook
Using autoAddDriverRunpath
Sourcing fix-elf-files.sh
Using versionCheckHook
source: sourcing removeStubsFromRunpathHook.bash (hostOffset=0) (targetOffset=1)
source: added removeStubsFromRunpathHookRegistration to prePhases
Sourcing fix-elf-files.sh
Running phase: removeStubsFromRunpathHookRegistration
@nix { "action": "setPhase", "phase": "removeStubsFromRunpathHookRegistration" }
removeStubsFromRunpathHookRegistration: discovered 'autoFixElfFiles addDriverRunpath' in postFixupHooks; this hook should be unnecessary when linking against stub files!
removeStubsFromRunpathHookRegistration: added removeStubsFromRunpath to postFixupHooks
Running phase: unpackPhase
@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking source archive /nix/store/998q5nm76s029d02vznn5vbd63dksfaj-source
source root is source
Running phase: patchPhase
@nix { "action": "setPhase", "phase": "patchPhase" }
Running phase: updateAutotoolsGnuConfigScriptsPhase
@nix { "action": "setPhase", "phase": "updateAutotoolsGnuConfigScriptsPhase" }
Running phase: configurePhase
@nix { "action": "setPhase", "phase": "configurePhase" }
Executing setupCUDAToolkitCompilers
Running phase: buildPhase
@nix { "action": "setPhase", "phase": "buildPhase" }
-- The C compiler identification is GNU 14.3.0
-- The CXX compiler identification is GNU 14.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu-x64:  
-- x86 detected
-- Adding CPU backend variant ggml-cpu-sse42: -msse4.2 GGML_SSE42
-- x86 detected
-- Adding CPU backend variant ggml-cpu-sandybridge: -msse4.2;-mavx GGML_SSE42;GGML_AVX
-- x86 detected
-- Adding CPU backend variant ggml-cpu-haswell: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2 GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2
-- x86 detected
-- Adding CPU backend variant ggml-cpu-skylakex: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2;-mavx512f;-mavx512cd;-mavx512vl;-mavx512dq;-mavx512bw GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2;GGML_AVX512
-- x86 detected
-- Adding CPU backend variant ggml-cpu-icelake: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2;-mavx512f;-mavx512cd;-mavx512vl;-mavx512dq;-mavx512bw;-mavx512vbmi;-mavx512vnni GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2;GGML_AVX512;GGML_AVX512_VBMI;GGML_AVX512_VNNI
-- x86 detected
-- Adding CPU backend variant ggml-cpu-alderlake: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2;-mavxvnni GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2;GGML_AVX_VNNI
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/bin/nvcc
-- Looking for a CUDA host compiler - /nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/c++
-- Found CUDAToolkit: /nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/include;/nix/store/7yh3xaxp5jyxsj0x4hhcks8xws89a4q3-cuda12.9-libcublas-12.9.1.4-include/include (found version "12.9.86")
-- CUDA Toolkit found
-- Using CUDA architectures: 50
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 14.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Looking for a HIP compiler
-- Looking for a HIP compiler - NOTFOUND
-- Could NOT find Vulkan (missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR) (found version "")
-- Configuring done (10.9s)
-- Generating done (0.1s)
-- Build files have been written to: /build/source/build
[  1%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  3%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  3%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[  4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o
[  4%] Built target ggml-cpu-sandybridge-feats
[  4%] Built target ggml-cpu-x64-feats
[  4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml.cpp.o
[  4%] Built target ggml-cpu-haswell-feats
[  4%] Built target ggml-cpu-icelake-feats
[  4%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[  5%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[  5%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[  5%] Built target ggml-cpu-alderlake-feats
[  5%] Built target ggml-cpu-skylakex-feats
[  6%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[  6%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[  6%] Built target ggml-cpu-sse42-feats
[  6%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/mem_hip.cpp.o
[  7%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/mem_nvml.cpp.o
[  7%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/mem_dxgi_pdh.cpp.o
[  8%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o
[  8%] Linking CXX shared library ../../../../../lib/ollama/libggml-base.so
[  8%] Built target ggml-base
[  8%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/ggml-cpu.c.o
[  8%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/ggml-cpu.c.o
[  9%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/ggml-cpu.c.o
[ 10%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/ggml-cpu.c.o
[ 11%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/ggml-cpu.c.o
[ 11%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/ggml-cpu.c.o
[ 11%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/ggml-cpu.c.o
[ 11%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 11%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/ggml-cpu.cpp.o
[ 11%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/ggml-cpu.cpp.o
[ 11%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/ggml-cpu.cpp.o
[ 12%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/ggml-cpu.cpp.o
[ 12%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/ggml-cpu.cpp.o
[ 12%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/ggml-cpu.cpp.o
[ 13%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/ggml-cpu.cpp.o
[ 14%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/repack.cpp.o
[ 14%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/repack.cpp.o
[ 15%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/repack.cpp.o
[ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/repack.cpp.o
[ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/repack.cpp.o
[ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/repack.cpp.o
[ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/repack.cpp.o
[ 17%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/add-id.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 17%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 17%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/hbm.cpp.o
[ 18%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/quants.c.o
[ 19%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/hbm.cpp.o
[ 19%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/quants.c.o
[ 19%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/hbm.cpp.o
[ 20%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/quants.c.o
[ 20%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/hbm.cpp.o
[ 21%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/hbm.cpp.o
[ 21%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/quants.c.o
[ 21%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/quants.c.o
[ 21%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/hbm.cpp.o
[ 22%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/hbm.cpp.o
[ 23%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/quants.c.o
[ 23%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/quants.c.o
[ 24%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/traits.cpp.o
[ 24%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/traits.cpp.o
[ 24%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/traits.cpp.o
[ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/traits.cpp.o
[ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/traits.cpp.o
[ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/traits.cpp.o
[ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/traits.cpp.o
[ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/amx/amx.cpp.o
[ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/amx/amx.cpp.o
[ 25%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 26%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/amx/amx.cpp.o
[ 26%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/amx/amx.cpp.o
[ 27%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/amx/amx.cpp.o
[ 27%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/amx/amx.cpp.o
[ 28%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/amx/amx.cpp.o
[ 28%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/amx/mmq.cpp.o
[ 29%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/amx/mmq.cpp.o
[ 29%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/amx/mmq.cpp.o
[ 30%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/amx/mmq.cpp.o
[ 30%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/amx/mmq.cpp.o
[ 31%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/amx/mmq.cpp.o
[ 31%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/amx/mmq.cpp.o
[ 32%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/binary-ops.cpp.o
[ 33%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/binary-ops.cpp.o
[ 33%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/binary-ops.cpp.o
[ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/binary-ops.cpp.o
[ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/binary-ops.cpp.o
[ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/binary-ops.cpp.o
[ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/binary-ops.cpp.o
[ 35%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 35%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/unary-ops.cpp.o
[ 36%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/unary-ops.cpp.o
[ 36%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/unary-ops.cpp.o
[ 36%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/unary-ops.cpp.o
[ 37%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/unary-ops.cpp.o
[ 37%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/unary-ops.cpp.o
[ 37%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/unary-ops.cpp.o
[ 38%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/vec.cpp.o
[ 38%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/vec.cpp.o
[ 39%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/vec.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/vec.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/vec.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/ops.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/ops.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/vec.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/ops.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/vec.cpp.o
[ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/ops.cpp.o
[ 41%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/ops.cpp.o
[ 42%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/ops.cpp.o
[ 43%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/ops.cpp.o
[ 43%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 43%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 44%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/arch/x86/quants.c.o
[ 45%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 46%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/arch/x86/quants.c.o
[ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 46%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-x64.so
[ 47%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 47%] Built target ggml-cpu-x64
[ 48%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/arch/x86/quants.c.o
[ 48%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-sse42.so
[ 48%] Built target ggml-cpu-sse42
[ 49%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 49%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/arch/x86/quants.c.o
[ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 49%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/arch/x86/quants.c.o
[ 49%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 50%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-sandybridge.so
[ 50%] Built target ggml-cpu-sandybridge
[ 50%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/arch/x86/quants.c.o
[ 51%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 52%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/arch/x86/quants.c.o
[ 53%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-haswell.so
[ 53%] Built target ggml-cpu-haswell
[ 54%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 54%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 55%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv2d-dw.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 55%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-alderlake.so
[ 55%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 55%] Built target ggml-cpu-alderlake
[ 55%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv2d-transpose.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 55%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv2d.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 56%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 56%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 57%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 57%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 57%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-icelake.so
[ 57%] Built target ggml-cpu-icelake
[ 57%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cumsum.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 58%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diag.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 58%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 59%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 59%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-skylakex.so
[ 59%] Built target ggml-cpu-skylakex
[ 59%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 59%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 60%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fill.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 60%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 61%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 61%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 61%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 62%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mean.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 62%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmf.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 62%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmid.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 63%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 63%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvf.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 64%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 64%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 64%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 65%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-sgd.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 65%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 66%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 66%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad_reflect_1d.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 66%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 67%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/quantize.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 67%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/roll.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 67%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/rope.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 68%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/scale.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 68%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/set-rows.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 69%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/set.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 69%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/softcap.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 69%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/softmax.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 70%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/solve_tri.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 70%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ssm-conv.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 71%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ssm-scan.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 71%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/sum.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 71%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/sumrows.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 72%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/topk-moe.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 72%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/tri.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 72%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/tsembd.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 73%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/unary.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 73%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/upscale.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 74%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/wkv.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 74%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq112-dv112.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 74%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq128-dv128.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 75%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq256-dv256.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 75%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq40-dv40.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 76%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq576-dv512.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 76%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq64-dv64.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 76%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq72-dv72.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 77%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq80-dv80.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 77%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-tile-instance-dkq96-dv96.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 77%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_16.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 78%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_8.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 78%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 79%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_2.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 79%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 79%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_16.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 80%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 80%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_8.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 81%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 81%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_2.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 81%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_16.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 82%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_2.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 82%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 83%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_8.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 83%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_64-ncols2_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 83%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 84%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_2.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 84%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 84%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_8.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 85%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq1_s.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 85%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq2_s.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 86%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq2_xs.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 86%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq2_xxs.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 86%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq3_s.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 87%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq3_xxs.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 87%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq4_nl.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 88%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-iq4_xs.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 88%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-mxfp4.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 88%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q2_k.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 89%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q3_k.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 89%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q4_0.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 89%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q4_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 90%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q4_k.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 90%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q5_0.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 91%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q5_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 91%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q5_k.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 91%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q6_k.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 92%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmq-instance-q8_0.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 92%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_1.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 93%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_10.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 93%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_11.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 93%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_12.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 94%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_13.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 94%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_14.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 94%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_15.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 95%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_16.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 95%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_2.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 96%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_3.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 96%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_4.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 96%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_5.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 97%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_6.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 97%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_7.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 98%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_8.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 98%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/mmf-instance-ncols_9.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 98%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-vec-instance-q4_0-q4_0.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 99%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-vec-instance-q8_0-q8_0.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 99%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/template-instances/fattn-vec-instance-f16-f16.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[100%] Linking CUDA shared module ../../../../../../lib/ollama/libggml-cuda.so
[100%] Built target ggml-cuda
Building subPackage .
go: downloading github.com/spf13/cobra v1.7.0
go: downloading github.com/containerd/console v1.0.3
go: downloading github.com/mattn/go-runewidth v0.0.14
go: downloading github.com/olekukonko/tablewriter v0.0.5
go: downloading github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c
go: downloading golang.org/x/crypto v0.43.0
go: downloading golang.org/x/sync v0.17.0
go: downloading golang.org/x/term v0.36.0
go: downloading github.com/google/uuid v1.6.0
go: downloading golang.org/x/mod v0.30.0
go: downloading golang.org/x/text v0.30.0
go: downloading github.com/emirpasic/gods/v2 v2.0.0-alpha
go: downloading github.com/gin-contrib/cors v1.7.2
go: downloading github.com/gin-gonic/gin v1.10.0
go: downloading golang.org/x/image v0.22.0
go: downloading golang.org/x/sys v0.37.0
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/wk8/go-ordered-map/v2 v2.1.8
go: downloading github.com/d4l3k/go-bfloat16 v0.0.0-20211005043715-690c3bdd05f1
go: downloading github.com/nlpodyssey/gopickle v0.3.0
go: downloading github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
go: downloading github.com/x448/float16 v0.8.4
go: downloading gonum.org/v1/gonum v0.15.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading github.com/agnivade/levenshtein v1.1.1
go: downloading github.com/gin-contrib/sse v0.1.0
go: downloading github.com/mattn/go-isatty v0.0.20
go: downloading golang.org/x/net v0.46.0
go: downloading github.com/bahlo/generic-list-go v0.2.0
go: downloading github.com/buger/jsonparser v1.1.1
go: downloading github.com/mailru/easyjson v0.7.7
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading github.com/dlclark/regexp2 v1.11.4
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/apache/arrow/go/arrow v0.0.0-20211112161151-bc219186db40
go: downloading github.com/google/flatbuffers v24.3.25+incompatible
go: downloading github.com/chewxy/math32 v1.11.0
go: downloading github.com/chewxy/hm v1.0.0
go: downloading go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6
go: downloading gorgonia.org/vecf32 v0.9.0
go: downloading gorgonia.org/vecf64 v0.9.0
go: downloading github.com/go-playground/validator/v10 v10.20.0
go: downloading github.com/pelletier/go-toml/v2 v2.2.2
go: downloading github.com/ugorji/go/codec v1.2.12
go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: downloading github.com/xtgo/set v1.0.0
go: downloading github.com/gogo/protobuf v1.3.2
go: downloading github.com/golang/protobuf v1.5.4
go: downloading golang.org/x/exp v0.0.0-20250218142911-aa4b98e5adaa
go: downloading github.com/gabriel-vasile/mimetype v1.4.3
go: downloading github.com/go-playground/universal-translator v0.18.1
go: downloading github.com/leodido/go-urn v1.4.0
go: downloading github.com/go-playground/locales v0.14.1
cmd/cmd.go:30:2: github.com/mattn/go-runewidth@v0.0.14: reading file:///nix/store/bxzg12a4v6iq928g321cn8ka557gar12-ollama-0.20.2-go-modules/github.com/mattn/go-runewidth/@v/v0.0.14.zip: no such file or directory

TL;DR: you're likely correct that I'm not actually running 0.20.2 🤦

## Edit to the edit 💫

So it turns out the derivation wrapper that NixOS uses for ollama is built with `buildGoModule`, which has its own hash for Go-related dependencies, specifically the `vendorHash` attribute, so I'm doing the song-and-dance of empty hash -> error -> computed hash -> re-compile 🤪

... nope, still no joy with gemma4. Here's the current state of the Nix config;

```nix
{ pkgs, ... }:

{
  services.ollama = {
    enable = true;

    package = (pkgs.ollama-cuda.override {
      cudaArches = [ "50" ];
    }).overrideAttrs (finalAttrs: previousAttrs: rec {
      version = "0.20.2";
      src = previousAttrs.src.override {
        tag = "v${version}";
        hash = "sha256-II9ffgkMj2yx7Sek5PuAgRnUIS1Kf1UeK71+DwAgBRE=";
      };
      vendorHash = "sha256-Y5ynzWzV5LcjQ834InTNcdlnc2Ru7pTgxfvHXjPH5d0=";
    });

    environmentVariables = {
      OLLAMA_DEBUG = "2";
    };

    loadModels = [
      "gemma4:26b"
      "gemma4:e4b"
    ];
  };
}
```
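
The empty-hash song-and-dance can be collapsed into a single sketch using nixpkgs' `lib.fakeHash` helper (an all-`A` placeholder SRI hash provided for exactly this purpose). This is a sketch only, modeled on the config above, not a verified working configuration: rebuild twice, replacing each `lib.fakeHash` with the `got: sha256-...` value from the mismatch error it produces.

```nix
# Sketch: both lib.fakeHash values are placeholders; each rebuild fails
# with a hash-mismatch error whose "got:" value is the real hash to paste in.
{ lib, pkgs, ... }:

{
  services.ollama.package = (pkgs.ollama-cuda.override {
    cudaArches = [ "50" ];
  }).overrideAttrs (finalAttrs: previousAttrs: {
    version = "0.20.2";
    src = pkgs.fetchFromGitHub {
      owner = "ollama";
      repo = "ollama";
      tag = "v${finalAttrs.version}";
      hash = lib.fakeHash;     # first rebuild reports the real src hash
    };
    vendorHash = lib.fakeHash; # second rebuild reports the real vendorHash
  });
}
```

Fetching the source directly with `pkgs.fetchFromGitHub` avoids relying on `previousAttrs.src.override`, which silently reuses the old package's hash if the override does not replace it.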
<!-- gh-comment-id:4195809257 -->
@S0AndS0 commented on GitHub (Apr 7, 2026):

NixOS is a build-from-source distro, but with caching for _pure_ FOSS packages that are not overridden.

But, because I'm overriding (which triggers a non-cached build) the CUDA variant (which is not _pure_ FOSS), it _should_ be forced to build from source on my device. Plus, all logs point to it really being version `0.20.2` as defined in the OP configs... the only cache I can think of that may be getting in the way is the local `/nix/store` cache.

For anyone in the audience following along at home, here's how to rule out cache-invalidation bugs when overriding preexisting packages;

<details><summary>Nix hash dance</summary>

First, force `src.hash` to an empty string;

```diff
diff --git a/services/ollama.nix b/services/ollama.nix
index 14e3f49..30ab245 100644
--- a/services/ollama.nix
+++ b/services/ollama.nix
@@ -27,9 +27,14 @@
       version = "0.20.2";
       src = previousAttrs.src.override {
         tag = "v${version}";
+        hash = "";
       };
     });
```

Next, attempt a rebuild and extract the computed hash from the error;

```bash
nixos-rebuild switch --impure --flake .
```

```
#....
error: hash mismatch in fixed-output derivation '/nix/store/pvprsc23shq85h1x3wmfg540s73ps3nk-source.drv':
  specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
        got: sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=
#....
```

Then insert that hash;

```diff
diff --git a/services/ollama.nix b/services/ollama.nix
index 14e3f49..30ab245 100644
--- a/services/ollama.nix
+++ b/services/ollama.nix
@@ -27,9 +27,14 @@
       version = "0.20.2";
       src = previousAttrs.src.override {
         tag = "v${version}";
-        hash = "";
+        hash = "sha256-Ic3eLOohLR7MQGkLvDJBNOCiBBKxh6l8X9MgK0b4w+Y=";
       };
     });
```

And finally, rebuild again;

```bash
nixos-rebuild switch --impure --flake .
```

</details>

🤞 here's hoping it was a me problem the whole time!
I'll report back, either with an edit to this post or a reply to any follow-on replies, one way or the other, once compiling completes.

## Edit and update

Looks like `0.20.2` has a build error?!

After all that ceremony with `hash`es it's popping;

> `cmd/cmd.go:30:2: github.com/mattn/go-runewidth@v0.0.14: reading file:///nix/store/bxzg12a4v6iq928g321cn8ka557gar12-ollama-0.20.2-go-modules/github.com/mattn/go-runewidth/@v/v0.0.14.zip: no such file or directory`

... sorts of build errors :-\

Here be the full gory log;

```bash
nix log /nix/store/ar522ql5jlvb7sk7l89p7m5hn67wmyfh-ollama-0.20.2.drv
```

<details><summary>Full build log</summary>

```
Sourcing setup-cuda-hook
Sourcing auto-add-driver-runpath-hook
Using autoAddDriverRunpath
Sourcing fix-elf-files.sh
Using versionCheckHook
source: sourcing removeStubsFromRunpathHook.bash (hostOffset=0) (targetOffset=1)
source: added removeStubsFromRunpathHookRegistration to prePhases
Sourcing fix-elf-files.sh
Running phase: removeStubsFromRunpathHookRegistration
@nix { "action": "setPhase", "phase": "removeStubsFromRunpathHookRegistration" }
removeStubsFromRunpathHookRegistration: discovered 'autoFixElfFiles addDriverRunpath' in postFixupHooks; this hook should be unnecessary when linking against stub files!
removeStubsFromRunpathHookRegistration: added removeStubsFromRunpath to postFixupHooks Running phase: unpackPhase @nix { "action": "setPhase", "phase": "unpackPhase" } unpacking source archive /nix/store/998q5nm76s029d02vznn5vbd63dksfaj-source source root is source Running phase: patchPhase @nix { "action": "setPhase", "phase": "patchPhase" } Running phase: updateAutotoolsGnuConfigScriptsPhase @nix { "action": "setPhase", "phase": "updateAutotoolsGnuConfigScriptsPhase" } Running phase: configurePhase @nix { "action": "setPhase", "phase": "configurePhase" } Executing setupCUDAToolkitCompilers Running phase: buildPhase @nix { "action": "setPhase", "phase": "buildPhase" } -- The C compiler identification is GNU 14.3.0 -- The CXX compiler identification is GNU 14.3.0 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF -- CMAKE_SYSTEM_PROCESSOR: x86_64 -- GGML_SYSTEM_ARCH: x86 -- Including CPU backend -- x86 detected -- Adding CPU backend variant ggml-cpu-x64: -- x86 detected -- Adding CPU backend variant ggml-cpu-sse42: -msse4.2 GGML_SSE42 -- x86 detected -- Adding CPU backend variant ggml-cpu-sandybridge: -msse4.2;-mavx GGML_SSE42;GGML_AVX -- x86 detected -- Adding CPU backend variant ggml-cpu-haswell: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2 
GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2 -- x86 detected -- Adding CPU backend variant ggml-cpu-skylakex: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2;-mavx512f;-mavx512cd;-mavx512vl;-mavx512dq;-mavx512bw GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2;GGML_AVX512 -- x86 detected -- Adding CPU backend variant ggml-cpu-icelake: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2;-mavx512f;-mavx512cd;-mavx512vl;-mavx512dq;-mavx512bw;-mavx512vbmi;-mavx512vnni GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2;GGML_AVX512;GGML_AVX512_VBMI;GGML_AVX512_VNNI -- x86 detected -- Adding CPU backend variant ggml-cpu-alderlake: -msse4.2;-mf16c;-mfma;-mbmi2;-mavx;-mavx2;-mavxvnni GGML_SSE42;GGML_F16C;GGML_FMA;GGML_BMI2;GGML_AVX;GGML_AVX2;GGML_AVX_VNNI -- Looking for a CUDA compiler -- Looking for a CUDA compiler - /nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/bin/nvcc -- Looking for a CUDA host compiler - /nix/store/bn3mhkpnh7zrf0sb65jalb7dg76ycl42-gcc-wrapper-14.3.0/bin/c++ -- Found CUDAToolkit: /nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/include;/nix/store/7yh3xaxp5jyxsj0x4hhcks8xws89a4q3-cuda12.9-libcublas-12.9.1.4-include/include (found version "12.9.86") -- CUDA Toolkit found -- Using CUDA architectures: 50 -- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 14.3.0 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /nix/store/8j41syz9cbh1l74k2283q14ghpap7nfx-cuda12.9-cuda_nvcc-12.9.86/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Looking for a HIP compiler -- Looking for a HIP compiler - NOTFOUND -- Could NOT find Vulkan (missing: Vulkan_LIBRARY Vulkan_INCLUDE_DIR) (found version "") -- Configuring done (10.9s) -- Generating done (0.1s) -- Build files have been written to: /build/source/build [ 1%] Building CXX object 
ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 3%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 3%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o [ 4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake-feats.dir/ggml-cpu/arch/x86/cpu-feats.cpp.o [ 4%] Built target ggml-cpu-sandybridge-feats [ 4%] Built target ggml-cpu-x64-feats [ 4%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml.cpp.o [ 4%] Built target ggml-cpu-haswell-feats [ 4%] Built target ggml-cpu-icelake-feats [ 4%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o [ 5%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o [ 5%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o [ 5%] Built target ggml-cpu-alderlake-feats [ 5%] Built target ggml-cpu-skylakex-feats [ 6%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o [ 6%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o [ 6%] Built target ggml-cpu-sse42-feats [ 6%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/mem_hip.cpp.o [ 7%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/mem_nvml.cpp.o [ 7%] Building CXX object 
ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/mem_dxgi_pdh.cpp.o [ 8%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o [ 8%] Linking CXX shared library ../../../../../lib/ollama/libggml-base.so [ 8%] Built target ggml-base [ 8%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/ggml-cpu.c.o [ 8%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/ggml-cpu.c.o [ 9%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/ggml-cpu.c.o [ 10%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/ggml-cpu.c.o [ 11%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/ggml-cpu.c.o [ 11%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/ggml-cpu.c.o [ 11%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/ggml-cpu.c.o [ 11%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 11%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/ggml-cpu.cpp.o [ 11%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/ggml-cpu.cpp.o [ 11%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/ggml-cpu.cpp.o [ 12%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/ggml-cpu.cpp.o [ 12%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/ggml-cpu.cpp.o [ 12%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/ggml-cpu.cpp.o [ 13%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/ggml-cpu.cpp.o [ 14%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/repack.cpp.o [ 14%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/repack.cpp.o [ 15%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/repack.cpp.o [ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/repack.cpp.o [ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/repack.cpp.o [ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/repack.cpp.o [ 16%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/repack.cpp.o [ 17%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/add-id.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 17%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 17%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/hbm.cpp.o [ 18%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/quants.c.o [ 19%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/hbm.cpp.o [ 19%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/quants.c.o [ 19%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/hbm.cpp.o [ 20%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/quants.c.o [ 20%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/hbm.cpp.o [ 21%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/hbm.cpp.o [ 21%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/quants.c.o [ 21%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/quants.c.o [ 21%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/hbm.cpp.o [ 22%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/hbm.cpp.o [ 23%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/quants.c.o [ 23%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/quants.c.o [ 24%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/traits.cpp.o [ 24%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/traits.cpp.o [ 24%] Building CXX object 
ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/traits.cpp.o [ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/traits.cpp.o [ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/traits.cpp.o [ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/traits.cpp.o [ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/traits.cpp.o [ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/amx/amx.cpp.o [ 25%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/amx/amx.cpp.o [ 25%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 26%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/amx/amx.cpp.o [ 26%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/amx/amx.cpp.o [ 27%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/amx/amx.cpp.o [ 27%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/amx/amx.cpp.o [ 28%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/amx/amx.cpp.o [ 28%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/amx/mmq.cpp.o [ 29%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/amx/mmq.cpp.o [ 29%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/amx/mmq.cpp.o [ 30%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/amx/mmq.cpp.o [ 30%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/amx/mmq.cpp.o [ 31%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/amx/mmq.cpp.o [ 31%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/amx/mmq.cpp.o [ 32%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/binary-ops.cpp.o [ 33%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/binary-ops.cpp.o [ 33%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/binary-ops.cpp.o [ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/binary-ops.cpp.o [ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/binary-ops.cpp.o [ 34%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/binary-ops.cpp.o [ 34%] Building CXX object 
ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/binary-ops.cpp.o [ 35%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 35%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/unary-ops.cpp.o [ 36%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/unary-ops.cpp.o [ 36%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/unary-ops.cpp.o [ 36%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/unary-ops.cpp.o [ 37%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/unary-ops.cpp.o [ 37%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/unary-ops.cpp.o [ 37%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/unary-ops.cpp.o [ 38%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/vec.cpp.o [ 38%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/vec.cpp.o [ 39%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/vec.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/vec.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/vec.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/ops.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/ops.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/vec.cpp.o [ 40%] Building CXX object 
ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/ops.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/vec.cpp.o [ 40%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/ops.cpp.o [ 41%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/ops.cpp.o [ 42%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/ops.cpp.o [ 43%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/ops.cpp.o [ 43%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 43%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 44%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/arch/x86/quants.c.o [ 45%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-x64.dir/ggml-cpu/arch/x86/repack.cpp.o [ 46%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/arch/x86/quants.c.o [ 46%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 46%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-x64.so [ 47%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sse42.dir/ggml-cpu/arch/x86/repack.cpp.o [ 47%] Built 
target ggml-cpu-x64 [ 48%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/arch/x86/quants.c.o [ 48%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-sse42.so [ 48%] Built target ggml-cpu-sse42 [ 49%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 49%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/arch/x86/quants.c.o [ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/llamafile/sgemm.cpp.o [ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-haswell.dir/ggml-cpu/arch/x86/repack.cpp.o [ 49%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-sandybridge.dir/ggml-cpu/arch/x86/repack.cpp.o [ 49%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/arch/x86/quants.c.o [ 49%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 50%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-sandybridge.so [ 50%] Built target ggml-cpu-sandybridge [ 50%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/arch/x86/quants.c.o [ 51%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-alderlake.dir/ggml-cpu/arch/x86/repack.cpp.o [ 52%] Building C object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/arch/x86/quants.c.o [ 53%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-haswell.so [ 53%] Built target ggml-cpu-haswell [ 54%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-icelake.dir/ggml-cpu/arch/x86/repack.cpp.o [ 54%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 55%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv2d-dw.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 55%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-alderlake.so [ 55%] Building CXX object ml/backend/ggml/ggml/src/CMakeFiles/ggml-cpu-skylakex.dir/ggml-cpu/arch/x86/repack.cpp.o [ 55%] Built target ggml-cpu-alderlake [ 55%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv2d-transpose.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 55%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv2d.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 56%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 56%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 57%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 57%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 57%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-icelake.so [ 57%] Built target ggml-cpu-icelake [ 57%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cumsum.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 58%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diag.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 58%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 59%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 59%] Linking CXX shared module ../../../../../lib/ollama/libggml-cpu-skylakex.so [ 59%] Built target ggml-cpu-skylakex [ 59%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 59%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 60%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fill.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 60%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 61%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 61%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 61%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 62%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mean.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 62%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmf.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 62%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmid.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 63%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 63%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvf.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 64%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 64%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [ 64%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
[ 65%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-sgd.cu.o
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 65%] Building CUDA object ml/backend/ggml/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o
[... the identical nvcc deprecation warning is printed after every remaining ggml-cuda object (pad, pool2d, quantize, rope, softmax, the fattn-tile/fattn-mma/fattn-vec, mmq, and mmf template instances, and the rest) from 65% through 99%; repeats elided ...]
[100%] Linking CUDA shared module ../../../../../../lib/ollama/libggml-cuda.so
[100%] Built target ggml-cuda
Building subPackage .
go: downloading github.com/spf13/cobra v1.7.0
go: downloading github.com/containerd/console v1.0.3
go: downloading github.com/mattn/go-runewidth v0.0.14
[... roughly 50 more "go: downloading" lines elided ...]
cmd/cmd.go:30:2: github.com/mattn/go-runewidth@v0.0.14: reading file:///nix/store/bxzg12a4v6iq928g321cn8ka557gar12-ollama-0.20.2-go-modules/github.com/mattn/go-runewidth/@v/v0.0.14.zip: no such file or directory
```

</details>

TLDR: ya likely be correct that I ain't running `0.20.2` 🤦

## Edit to the edit 💫

So turns out the derivation wrapper that NixOS uses for `ollama` is a `buildGoModule` function that has its own hash for Go-related dependencies, specifically the `vendorHash` attribute, so I'm doing the song-and-dance of empty hash -> error -> computed hash -> re-compile 🤪

... nope, still no joy with `gemma4`; here be the current state of the Nix configs:

```nix
{
  pkgs,
  ...
}:

{
  services.ollama = {
    enable = true;

    package = (pkgs.ollama-cuda.override {
      cudaArches = [ "50" ];
    }).overrideAttrs (finalAttrs: previousAttrs: rec {
      version = "0.20.2";
      src = previousAttrs.src.override {
        tag = "v${version}";
        hash = "sha256-II9ffgkMj2yx7Sek5PuAgRnUIS1Kf1UeK71+DwAgBRE=";
      };
      vendorHash = "sha256-Y5ynzWzV5LcjQ834InTNcdlnc2Ru7pTgxfvHXjPH5d0=";
    });

    environmentVariables = {
      OLLAMA_DEBUG = "2";
    };

    loadModels = [
      "gemma4:26b"
      "gemma4:e4b"
    ];
  };
}
```
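The empty hash -> error -> computed hash dance can be scripted a little. A minimal sketch of pulling the real hash out of Nix's fixed-output mismatch error; note the error text below is synthetic, modeled on what a rebuild prints after setting `vendorHash` to a fake or empty value:

```shell
# Extract the "got:" hash from a Nix fixed-output hash-mismatch error.
# NOTE: $err is a synthetic example of rebuild output, not captured live.
err='error: hash mismatch in fixed-output derivation:
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-Y5ynzWzV5LcjQ834InTNcdlnc2Ru7pTgxfvHXjPH5d0='

# The "got:" value is what belongs in vendorHash
printf '%s\n' "$err" | awk '/got:/ { print $2 }'
```

In practice the same `awk` filter can be fed the output of `nixos-rebuild` directly, saving the copy-paste step.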

@ForsakenHarmony commented on GitHub (Apr 7, 2026):

having the same issue on `0.20.3` trying to load `huggingface.co/unsloth/gemma-4-e4b-it-gguf:UD-Q4_K_XL`

not working in docker or on windows directly

<!-- gh-comment-id:4200075469 -->

@rick-github commented on GitHub (Apr 7, 2026):

https://github.com/ollama/ollama/issues/15354#issuecomment-4191794224

<!-- gh-comment-id:4200084497 -->

@ZgblKylin commented on GitHub (Apr 7, 2026):

The reason is https://github.com/ollama/ollama/issues/14575#issuecomment-3989918451

As a workaround, removing the vision GGUF would work: https://github.com/ollama/ollama/issues/14503#issuecomment-4133511574

I ran `hf.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive:IQ4_XS` successfully by these steps:

- `ollama show --modelfile <model> > Modelfile`
- Open `Modelfile`, delete the second `FROM xxxx` line
- `ollama create <new_name> -f Modelfile`
- `ollama run <new_name>`
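The steps above can be sketched as a tiny pipeline; a minimal sketch that comments out every `FROM` line after the first — the sample Modelfile below is made up for illustration, where in practice it would come from `ollama show --modelfile <model>`:

```shell
# Sketch of the workaround: keep only the first FROM line of a Modelfile.
# NOTE: the sample Modelfile contents and paths are synthetic.
cat > /tmp/Modelfile <<'EOF'
FROM /path/to/language-model.gguf
FROM /path/to/vision-projector.gguf
PARAMETER temperature 1
EOF

# Comment out every FROM line after the first one
awk 'BEGIN { seen = 0 }
     /^FROM[[:space:]]/ { if (seen++) { print "# " $0; next } }
     { print }' /tmp/Modelfile > /tmp/Modelfile.custom

cat /tmp/Modelfile.custom
```

From there, `ollama create <new_name> -f /tmp/Modelfile.custom` and `ollama run <new_name>` cover the remaining steps.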
<!-- gh-comment-id:4200406702 -->

@S0AndS0 commented on GitHub (Apr 8, 2026):

No joy with the instructions provided when used against `gemma4:26b` 🤷

To mitigate possible non-reproducibility, here be the instructions as understood, encoded into a simple Bash script:

```bash
#!/usr/bin/env bash

set -eET;

_target_name="${1:?Undefined target name}";
_target_suffix="${2:-custom}";

_target_directory="${HOME}/.ollama/models";
_target_file="${_target_directory}/${_target_name}-${_target_suffix}-Modelfile";

if [[ -f "${_target_file}" ]]; then
	printf >&2 'Target file already exists -> %s\n' "${_target_file}";
	exit 1;
fi

ollama show --modelfile "${_target_name}" |
	awk 'BEGINFILE {
		count = 0;
	}
	{
		if ($0 ~ "^(#\\s+)?FROM\\s.*") {
			count++;
		}
		if (count == 2) {
			gsub(".*", "# &");
			count = -1;
		}
		print;
	}' > "${_target_file}";

ollama create "${_target_name}-${_target_suffix}" --file "${_target_file}";
```
```bash
./ollama-customize gemma4:26b &&
  ollama run "${_}-custom";
#> Error: no FROM line
```

... near as I can tell, `gemma4:26b` has only one `FROM` line that's actually used

I also tried using only FROM gemma4:26b too, and same 500 errors as reported were the result
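
As a quick sanity check on that claim, one could count the uncommented FROM directives in the generated Modelfile. A minimal sketch (the helper name is mine, not part of ollama):

```bash
#!/usr/bin/env bash

# count_from_lines FILE
# Print the number of active (uncommented) FROM directives in a Modelfile.
# `grep -c` exits non-zero when the count is 0, so `|| true` keeps this
# safe under `set -e` while still printing "0".
count_from_lines() {
  grep -c '^FROM[[:space:]]' "$1" || true;
}
```

Running it against the Modelfile produced by the script above would show whether a second FROM was actually present before the commenting step.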


@rick-github commented on GitHub (Apr 8, 2026):

The issue you are having is not the presence of a second GGUF (vision) in the Modelfile. Your issue is that the ollama binary you are building is not at least 0.20.0, which is when gemma4 support was added.
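
That minimum-version requirement can be checked mechanically against the output of ollama -v. A sketch, assuming GNU sort's -V (version sort) flag; the helper name is my own invention:

```bash
#!/usr/bin/env bash

# version_at_least ACTUAL MINIMUM
# Succeeds when ACTUAL >= MINIMUM under version-sort ordering:
# if MINIMUM sorts first (or they are equal), ACTUAL is new enough.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ];
}

# Example usage (the exact output format of `ollama -v` may vary by build):
# version_at_least "$(ollama -v | grep -o '[0-9][0-9.]*$')" 0.20.0 \
#   && echo 'gemma4-capable'
```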


@S0AndS0 commented on GitHub (Apr 8, 2026):

Pretty sure it is 0.20.2 as of the latest rebuild and recompile yesterday, especially as all known sources of possible cache confusion have been eliminated and properly overridden

Is there some way to check other than ollama -v?


@rick-github commented on GitHub (Apr 8, 2026):

Show me the server config line.


@S0AndS0 commented on GitHub (Apr 8, 2026):

server config line as in systemd's ExecStart line, or from somewhere else?


@rick-github commented on GitHub (Apr 8, 2026):

journalctl -u ollama --no-pager | grep server.config | tail -1

@S0AndS0 commented on GitHub (Apr 8, 2026):

Apr 07 07:29:36 nixos ollama[1835]: time=2026-04-07T07:29:36.263-07:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"


@rick-github commented on GitHub (Apr 8, 2026):

https://github.com/ollama/ollama/pull/14106 added a new configuration variable, OLLAMA_DEBUG_LOG_REQUESTS, and was merged on March 20. Your server config line does not contain this variable, so the source code you are building from is at least three weeks old, older than ollama v0.18.3. Hence no gemma4 support.
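
That dating trick is easy to automate: feed the last server config line through a check for the newer variable. A sketch (the function name is mine; it assumes the systemd unit is called ollama):

```bash
#!/usr/bin/env bash

# build_has_pr14106 LOGLINE
# Succeeds when the "server config" log line mentions OLLAMA_DEBUG_LOG_REQUESTS,
# i.e. the binary was built from source that includes the March 20 merge of
# PR 14106 (and is therefore new enough for gemma4).
build_has_pr14106() {
  printf '%s' "$1" | grep -q 'OLLAMA_DEBUG_LOG_REQUESTS';
}

# Usage against the journal:
# build_has_pr14106 "$(journalctl -u ollama --no-pager | grep server.config | tail -1)"
```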


@S0AndS0 commented on GitHub (Apr 8, 2026):

🤦 ... thanks!

I'm gonna get a drink, or few, and mutter about NixOS's filthy promises of fearless experimentation elsewhere for a bit x-]


@YellowOnion commented on GitHub (Apr 8, 2026):

@S0AndS0

You're using the old source code: you must update the hash, or Nix will assume nothing has changed and will not invalidate its cache. New source code means a new hash.

Using override on the src without updating the hash is the footgun that caused this issue.

Just use fetchFromGitHub with lib.fakeHash, or run this command to generate the needed code:

$ nix-prefetch-github --nix ollama ollama --rev v0.20.3

@S0AndS0 commented on GitHub (Apr 8, 2026):

WoooohoooOooo!

@YellowOnion ya may not have fixed me, but ya fixed my particular issue with Ollama and NixOS, thanks!

@rick-github super grateful I am to ya for putting up with my fumbling x-]

I'll be editing the OP with configs that are functioning, with GPU support, for Gemma4 with version 0.20.2 of Ollama and then closing this Issue as resolved 🎉

Reference: github-starred/ollama#71883