[GH-ISSUE #4022] cannot run moondream in Ollama (ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 ") #2495

Closed
opened 2026-04-12 12:49:20 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @prithvi151080 on GitHub (Apr 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4022

What is the issue?

I have downloaded the moondream model from the official Ollama site (https://ollama.com/library/moondream), but when running the model in Ollama I get this error: ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "

![image](https://github.com/ollama/ollama/assets/157370999/b5b20357-b825-4cc8-b908-ac0ea0768874)

Below is the entire Ollama server log:
time=2024-04-29T11:07:39.775+05:30 level=INFO source=images.go:817 msg="total blobs: 6"
time=2024-04-29T11:07:39.778+05:30 level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-04-29T11:07:39.779+05:30 level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.1.32)"
time=2024-04-29T11:07:39.791+05:30 level=INFO source=payload.go:28 msg="extracting embedded files" dir=C:\Users\LENOVO\AppData\Local\Temp\ollama3526940464\runners
time=2024-04-29T11:07:40.021+05:30 level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11.3 rocm_v5.7 cpu cpu_avx]"
[GIN] 2024/04/29 - 11:07:40 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/04/29 - 11:07:40 | 200 | 2.1732ms | 127.0.0.1 | POST "/api/show"
[GIN] 2024/04/29 - 11:07:40 | 200 | 1.6291ms | 127.0.0.1 | POST "/api/show"
time=2024-04-29T11:07:40.857+05:30 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-29T11:07:40.857+05:30 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-04-29T11:07:40.869+05:30 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\LENOVO\AppData\Local\Programs\Ollama\cudart64_110.dll]"
time=2024-04-29T11:07:41.992+05:30 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-04-29T11:07:41.993+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T11:07:42.065+05:30 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 8.6"
time=2024-04-29T11:07:42.082+05:30 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-29T11:07:42.082+05:30 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_.dll"
time=2024-04-29T11:07:42.092+05:30 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [C:\Users\LENOVO\AppData\Local\Programs\Ollama\cudart64_110.dll]"
time=2024-04-29T11:07:42.093+05:30 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-04-29T11:07:42.093+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T11:07:42.142+05:30 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 8.6"
time=2024-04-29T11:07:42.168+05:30 level=INFO source=server.go:127 msg="offload to gpu" reallayers=25 layers=25 required="2588.9 MiB" used="2588.9 MiB" available="3304.2 MiB" kv="384.0 MiB" fulloffload="148.0 MiB" partialoffload="190.0 MiB"
time=2024-04-29T11:07:42.168+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T11:07:42.177+05:30 level=INFO source=server.go:264 msg="starting llama server" cmd="C:\Users\LENOVO\AppData\Local\Temp\ollama3526940464\runners\cuda_v11.3\ollama_llama_server.exe --model C:\Users\LENOVO.ollama\models\blobs\sha256-e554c6b9de016673fd2c732e0342967727e9659ca5f853a4947cc96263fa602b --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 25 --mmproj C:\Users\LENOVO.ollama\models\blobs\sha256-4cc1cb3660d87ff56432ebeb7884ad35d67c48c7b9f6b2856f305e39c38eed8f --port 53596"
time=2024-04-29T11:07:42.229+05:30 level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"21912","timestamp":1714369062}
{"build":2679,"commit":"7593639","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"21912","timestamp":1714369062}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":8,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"21912","timestamp":1714369062,"total_threads":16}
{"function":"load_model","level":"INFO","line":395,"msg":"Multi Modal Mode Enabled","tid":"21912","timestamp":1714369062}
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_model_load: failed to load vision model tensors
time=2024-04-29T11:07:43.382+05:30 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 3221225477 "

I am able to run other models in Ollama such as Mixtral, mxbai, and tinyllama without any issue, but Moondream fails to start.

OS details: Windows
GPU: Nvidia RTX 3050

Any help would be highly appreciated.
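For what it's worth, the numeric exit status in the error message can be decoded: Windows processes report NTSTATUS crash codes as unsigned 32-bit exit codes, and 3221225477 is 0xC0000005 (STATUS_ACCESS_VIOLATION), so the runner is crashing, not exiting cleanly. A minimal check:

```python
# Decode the exit status from the error message.
# Windows surfaces NTSTATUS codes as unsigned 32-bit exit codes;
# 0xC0000005 is STATUS_ACCESS_VIOLATION (a crash in the runner).
exit_code = 3221225477
print(hex(exit_code))           # 0xc0000005
print(exit_code == 0xC0000005)  # True
```

This matches the later report of a SIGSEGV on Linux: the same vision-model load failure crashes the runner on both platforms.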

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.1.32

GiteaMirror added the bug label 2026-04-12 12:49:20 -05:00
Author
Owner

@prithvi151080 commented on GitHub (Apr 29, 2024):

Updated to version 0.1.33 and the problem is resolved.

Author
Owner

@oliverbob commented on GitHub (May 2, 2024):

> updated to version 0.1.33 and the problem resolved

When I rerun the install script intending to upgrade, it still says 0.1.32 when I run `ollama --version`.
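A quick sketch of the version check being discussed: compare the string `ollama --version` reports against 0.1.33, the release this thread identifies as carrying the fix. The `parse_version` helper here is hypothetical, not part of Ollama:

```python
# Minimal sketch: numeric comparison of Ollama version strings,
# since "0.1.33" > "0.1.32" only holds with a proper parse, not
# naive string comparison in every case (e.g. "0.1.9" vs "0.1.10").
def parse_version(v: str) -> tuple:
    """Turn '0.1.33' or 'v0.1.33' into (0, 1, 33) for tuple comparison."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

installed = "0.1.32"  # what `ollama --version` reported after the upgrade attempt
required = "0.1.33"   # release with the moondream fix, per this thread
if parse_version(installed) < parse_version(required):
    print("upgrade needed")
```

If the reported version does not change after reinstalling, an older binary is likely still first on the PATH.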

Author
Owner

@prithvi151080 commented on GitHub (May 2, 2024):

@oliverbob: please download the prerelease Ollama version 0.1.33 from https://github.com/ollama/ollama/releases/tag/v0.1.33-rc7 and run the installer. Check the Ollama version and then try moondream again. Hope this helps.

Author
Owner

@prithvi151080 commented on GitHub (May 3, 2024):

@oliverbob: version 0.1.33 is now officially released. Please update and run moondream.

Author
Owner

@UmutAlihan commented on GitHub (May 4, 2024):

0.1.33 still throws an error for moondream multimodal queries:

llm-api-ollama | key clip.vision.image_grid_pinpoints not found in file
llm-api-ollama | key clip.vision.mm_patch_merge_type not found in file
llm-api-ollama | key clip.vision.image_crop_resolution not found in file
llm-api-ollama | clip_model_load: failed to load vision model tensors
llm-api-ollama | SIGSEGV: segmentation violation
llm-api-ollama | PC=0x7f9c681aed80 m=12 sigcode=128 addr=0x0
llm-api-ollama | signal arrived during cgo execution
llm-api-ollama |

Reference: github-starred/ollama#2495