[GH-ISSUE #13829] Error loading model #71116

Open
opened 2026-05-05 00:24:32 -05:00 by GiteaMirror · 0 comments
Originally created by @soncemvo on GitHub (Jan 21, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13829

What is the issue?

I fine-tuned Qwen3-VL with LlamaFactory and converted the result from safetensors to GGUF with llama.cpp. Creating the model in Ollama succeeds, but running it fails with the error shown in the log. Please help.

Other models, including Qwen3-VL, work fine.

Relevant log output

Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: ggml_backend_cuda_device_get_memory device GPU-f96ee413-bbeb-db0e-f5b5-31cdf6e15749 utilizing NVML memory reporting free: 16721838080 total: 25757220864
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.132+03:00 level=INFO source=sched.go:635 msg="updated VRAM based on existing loaded models" gpu=GPU-f96ee413-bbeb-db0e-f5b5-31cdf6e15749 library=CUDA total="24.0 GiB" available="15.6 GiB"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.163+03:00 level=INFO source=server.go:245 msg="enabling flash attention"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.163+03:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-710dd9ac870452a8e599b4ed7672db3772d0a685541ff4e6b33d9ae9d2bfa8cb --port 44913"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.164+03:00 level=INFO source=sched.go:452 msg="system memory" total="31.1 GiB" free="27.2 GiB" free_swap="0 B"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.164+03:00 level=INFO source=sched.go:459 msg="gpu memory" id=GPU-f96ee413-bbeb-db0e-f5b5-31cdf6e15749 library=CUDA available="15.1 GiB" free="15.6 GiB" minimum="457.0 MiB" overhead="0 B"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.164+03:00 level=INFO source=server.go:755 msg="loading model" "model layers"=37 requested=-1
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.172+03:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.172+03:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:44913"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.175+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-f96ee413-bbeb-db0e-f5b5-31cdf6e15749 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.191+03:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=BF16 name="Qw3Vl Passport" description="" num_tensors=399 num_key_values=32
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: ggml_cuda_init: found 1 CUDA devices:
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]:   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, ID: GPU-f96ee413-bbeb-db0e-f5b5-31cdf6e15749
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.258+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.459+03:00 level=INFO source=server.go:3634 msg="http: panic serving 127.0.0.1:38682: runtime error: invalid memory address or nil pointer dereference\ngoroutine 47 [running]:\nnet/http.(*conn).serve.func1()\n\tnet/http/server.go:1947 +0xbe\npanic({0x56032e866be0?, 0x56032f2626f0?})\n\truntime/panic.go:792 +0x132\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel.func1()\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1187 +0x11a\npanic({0x56032e866be0?, 0x56032f2626f0?})\n\truntime/panic.go:792 +0x132\ngithub.com/ollama/ollama/ml/nn.(*Conv3D).Forward(0x0, {0x56032e9f8cd0, 0xc000e68500}, {0x56032ea04920?, 0xc0000107b0?}, 0x10?, 0xc000100808?, 0xc00097f800?, 0xc000047190?, 0x0, ...)\n\tgithub.com/ollama/ollama/ml/nn/convolution.go:25 +0x3a\ngithub.com/ollama/ollama/model/models/qwen3vl.(*VisionModel).Forward(0xc0000fe240, {0x56032e9f8cd0, 0xc000e68500}, {0x56032ea04920, 0xc000010198}, 0xc000e1c000)\n\tgithub.com/ollama/ollama/model/models/qwen3vl/model_vision.go:224 +0x118\ngithub.com/ollama/ollama/model/models/qwen3vl.(*Model).EncodeMultimodal(0xc0001691e0, {0x56032e9f8cd0, 0xc000e68500}, {0xc001a28000, 0x400436, 0x700000})\n\tgithub.com/ollama/ollama/model/models/qwen3vl/model.go:43 +0x14e\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0xc00024f0e0, 0x1)\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1098 +0x34e\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0xc00024f0e0, {0x7ffde5ec2d24?, 0x56032d63b3fa?}, {0x0, 0x8, {0xc0000b9580, 0x1, 0x1}, 0x1}, {0x0, ...}, ...)\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1226 +0x391\ngithub.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00024f0e0, {0x56032e9eafc0, 0xc0004b8000}, 0xc0004b2000)\n\tgithub.com/ollama/ollama/runner/ollamarunner/runner.go:1305 +0x54b\nnet/http.HandlerFunc.ServeHTTP(0xc0000ffec0?, {0x56032e9eafc0?, 0xc0004b8000?}, 
0xc000259b60?)\n\tnet/http/server.go:2294 +0x29\nnet/http.(*ServeMux).ServeHTTP(0x56032d2eaa85?, {0x56032e9eafc0, 0xc0004b8000}, 0xc0004b2000)\n\tnet/http/server.go:2822 +0x1c4\nnet/http.serverHandler.ServeHTTP({0x56032e9e74b0?}, {0x56032e9eafc0?, 0xc0004b8000?}, 0x1?)\n\tnet/http/server.go:3301 +0x8e\nnet/http.(*conn).serve(0xc00016c480, {0x56032e9ed438, 0xc00016aba0})\n\tnet/http/server.go:2102 +0x625\ncreated by net/http.(*Server).Serve in goroutine 1\n\tnet/http/server.go:3454 +0x485"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.460+03:00 level=INFO source=runner.go:1278 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:Disabled KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.460+03:00 level=INFO source=sched.go:479 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-710dd9ac870452a8e599b4ed7672db3772d0a685541ff4e6b33d9ae9d2bfa8cb error="do load request: Post \"http://127.0.0.1:44913/load\": EOF"
Jan 22 05:53:11 ai-nvidia-app01 ollama[4135940]: time=2026-01-22T05:53:11.470+03:00 level=ERROR source=server.go:302 msg="llama runner terminated" error="signal: killed"
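
The panic in the log is logged as a single line with literal `\n` and `\t` escapes, which makes the stack trace hard to read. The following is a minimal Python sketch for unfolding such a message into frames and flagging method frames whose receiver is printed as `0x0`. The helper names `panic_frames` and `nil_receiver_frames` are hypothetical, and `SAMPLE` is a trimmed excerpt of the panic message from this log (the `Conv3D` argument list is shortened to `...`). It relies on the assumption that the first argument printed for a pointer-receiver method is the receiver.

```python
import re

# Trimmed excerpt of the panic message from the log; escapes are kept literal
# (raw strings) exactly as they appear inside msg="...".
SAMPLE = (
    r"http: panic serving 127.0.0.1:38682: runtime error: "
    r"invalid memory address or nil pointer dereference\n"
    r"goroutine 47 [running]:\n"
    r"github.com/ollama/ollama/ml/nn.(*Conv3D).Forward(0x0, {0x56032e9f8cd0, 0xc000e68500}, ...)\n"
    r"\tgithub.com/ollama/ollama/ml/nn/convolution.go:25 +0x3a\n"
    r"github.com/ollama/ollama/model/models/qwen3vl.(*VisionModel).Forward("
    r"0xc0000fe240, {0x56032e9f8cd0, 0xc000e68500}, {0x56032ea04920, 0xc000010198}, 0xc000e1c000)\n"
    r"\tgithub.com/ollama/ollama/model/models/qwen3vl/model_vision.go:224 +0x118"
)


def panic_frames(msg: str) -> list[tuple[str, str]]:
    """Return (call, location) pairs from a panic message with literal escapes."""
    text = msg.replace("\\n", "\n").replace("\\t", "\t")
    lines = text.split("\n")
    frames = []
    # In a Go traceback, each call line is followed by a tab-indented location line.
    for call, loc in zip(lines, lines[1:]):
        if loc.startswith("\t") and not call.startswith("\t"):
            frames.append((call.strip(), loc.strip()))
    return frames


def nil_receiver_frames(frames: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep frames of the form (*Type).Method(0x0, ...), i.e. a nil receiver."""
    pattern = re.compile(r"\(\*\w+\)\.\w+\(0x0[,)]")
    return [frame for frame in frames if pattern.search(frame[0])]


if __name__ == "__main__":
    for call, loc in nil_receiver_frames(panic_frames(SAMPLE)):
        print(call)
        print("    at", loc)
```

Applied to this log, the only flagged frame is `(*Conv3D).Forward(0x0, ...)` at `ml/nn/convolution.go:25`, called from `qwen3vl.(*VisionModel).Forward`. That suggests the vision model's `Conv3D` layer was nil when the worst-case graph was reserved. Why it was nil (for example, whether the converted GGUF lacks the tensors that layer expects) is not established by the log and would need to be checked separately.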

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

latest

GiteaMirror added the bug label 2026-05-05 00:24:32 -05:00
Reference: github-starred/ollama#71116