[GH-ISSUE #5460] custom model: error loading model: check_tensor_dims: tensor 'blk.0.ffn_norm.weight' not found #3416

Closed
opened 2026-04-12 14:03:31 -05:00 by GiteaMirror · 2 comments

Originally created by @finnbusse on GitHub (Jul 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5460

What is the issue?

I recently trained a custom AI model in Google Colab using Alpaca and Unsloth. The training process was successful, but when attempting to run the model with Ollama, I encountered the following error:

C:\Users\Finn\Downloads>ollama run test2
Error: llama runner process has terminated: exit status 0xc0000409

Building the model was successful:

C:\Users\Finn\Downloads>ollama create test2 -f Modelfile
transferring model data
using existing layer sha256:b9175c65733392c2bf6c90c4a2fc5772b948369ec3269fb7b0b1f2ae24a8ac2c
creating new layer sha256:73d81a2944b28d56a86a4bd8980f14085e5ec0e894b80f1932da1010a2411add
writing manifest
success

Modelfile:
FROM test2.gguf

Before that, I was able to chat with the model within Google Colab.

Server logs:

Device 0: NVIDIA GeForce RTX 3060 Ti, compute capability 8.6, VMM: yes
llm_load_tensors: ggml ctx size = 0.39 MiB
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.ffn_norm.weight' not found
llama_load_model_from_file: exception loading model
time=2024-07-03T14:34:25.661+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server error"
time=2024-07-03T14:34:25.921+02:00 level=ERROR source=sched.go:388 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000409 "
[GIN] 2024/07/03 - 14:34:25 | 500 | 1.1579959s | 127.0.0.1 | POST "/api/chat"
time=2024-07-03T14:34:30.949+02:00 level=WARN source=sched.go:575 msg="gpu VRAM usage didn't recover within timeout" seconds=5.0273536 model=C:\Users\Finn\.ollama\models\blobs\sha256-b9175c65733392c2bf6c90c4a2fc5772b948369ec3269fb7b0b1f2ae24a8ac2c
time=2024-07-03T14:34:31.199+02:00 level=WARN source=sched.go:575 msg="gpu VRAM usage didn't recover within timeout" seconds=5.2771314 model=C:\Users\Finn\.ollama\models\blobs\sha256-b9175c65733392c2bf6c90c4a2fc5772b948369ec3269fb7b0b1f2ae24a8ac2c
time=2024-07-03T14:34:31.451+02:00 level=WARN source=sched.go:575 msg="gpu VRAM usage didn't recover within timeout" seconds=5.5292856 model=C:\Users\Finn\.ollama\models\blobs\sha256-b9175c65733392c2bf6c90c4a2fc5772b948369ec3269fb7b0b1f2ae24a8ac2c

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.48

GiteaMirror added the bug label 2026-04-12 14:03:32 -05:00

@dhiltgen commented on GitHub (Jul 24, 2024):

We're constantly adding new model architectures - does this still fail on the latest release?


@finnbusse commented on GitHub (Jul 24, 2024):

@dhiltgen Now it's working!
