[GH-ISSUE #5329] clip models fail to load with unicode characters in OLLAMA_MODELS path on windows #49846

Closed
opened 2026-04-28 13:10:36 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @Derix76 on GitHub (Jun 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5329

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I tried to start llava:1.6 (or any similar llava-based model) and the llama server terminated.
The llama3 model and other non-llava models work just fine.

GPU: NVIDIA GeForce RTX 4060 (total="8.0 GiB" available="6.9 GiB")
CPU: AMD Ryzen 7 4700G with a Radeon iGPU, ignored by Ollama ("unsupported Radeon iGPU detected skipping" id=0 name="AMD Radeon(TM) Graphics")
OS is Windows 11
GeForce drivers: 555.99 Studio

Log extract:
time=2024-06-27T16:53:15.204+02:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[7.2 GiB]" memory.required.full="5.3 GiB" memory.required.partial="5.3 GiB" memory.required.kv="256.0 MiB" memory.required.allocations="[5.3 GiB]" memory.weights.total="3.9 GiB" memory.weights.repeating="3.8 GiB" memory.weights.nonrepeating="102.6 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="181.0 MiB"
time=2024-06-27T16:53:15.205+02:00 level=WARN source=server.go:241 msg="multimodal models don't support parallel requests yet"
time=2024-06-27T16:53:15.208+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="C:\Users\Stefan Hüttner\AppData\Local\Programs\Ollama\ollama_runners\cuda_v11.3\ollama_llama_server.exe --model C:\Users\Stefan Hüttner\.ollama\models\blobs\sha256-170370233dd5c5415250a2ecd5c71586352850729062ccef1496385647293868 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --mmproj C:\Users\Stefan Hüttner\.ollama\models\blobs\sha256-72d6f08a42f656d36b356dbe0920675899a99ce21192fd66266fb7d82ed07539 --no-mmap --parallel 1 --port 51274"
time=2024-06-27T16:53:15.742+02:00 level=INFO source=sched.go:382 msg="loaded runners" count=1
time=2024-06-27T16:53:15.742+02:00 level=INFO source=server.go:556 msg="waiting for llama runner to start responding"
time=2024-06-27T16:53:15.743+02:00 level=INFO source=server.go:594 msg="waiting for server to become available" status="llm server error"
INFO [wmain] build info | build=3171 commit="7c26775a" tid="17384" timestamp=1719499996
INFO [wmain] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="17384" timestamp=1719499996 total_threads=16
INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="51274" tid="17384" timestamp=1719499996
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes
ERROR [load_model] unable to load clip model | model="C:\Users\Stefan Hüttner\.ollama\models\blobs\sha256-72d6f08a42f656d36b356dbe0920675899a99ce21192fd66266fb7d82ed07539" tid="17384" timestamp=1719499996
time=2024-06-27T16:53:16.512+02:00 level=ERROR source=sched.go:388 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000409 "

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.1.47

GiteaMirror added the bug, windows labels 2026-04-28 13:10:37 -05:00
Author
Owner

@jsoncode commented on GitHub (Jun 28, 2024):

I also encountered the same problem
(screenshot attached: 微信截图_20240628211157, a WeChat screenshot dated 2024-06-28)

Author
Owner

@jsoncode commented on GitHub (Jun 28, 2024):

I updated ollama and it worked.

Author
Owner

@Derix76 commented on GitHub (Jun 30, 2024):

Thanks, but error still persists after update.

Author
Owner

@gaduffl commented on GitHub (Jun 30, 2024):

The error occurred after updating to the latest version. I now get the issue with llama3, which was working fine before.

Author
Owner

@Marten-Ka commented on GitHub (Jul 2, 2024):

Same here; I opened another issue: https://github.com/ollama/ollama/issues/5431

Author
Owner

@dhiltgen commented on GitHub (Jul 3, 2024):

It looks like there's a bug in the clip model loading code in C++ that doesn't handle unicode characters properly.

Until we can get this fixed, a workaround is to create a model directory on your C: drive containing only ASCII characters and set OLLAMA_MODELS to that path for the server.

https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server

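For context, here is a minimal sketch of the kind of change this class of Windows bug usually calls for. It is illustrative only and assumes the loader opens the model file with the narrow-string fopen(); it is not the actual clip.cpp code. On Windows, fopen() interprets a narrow path in the active ANSI code page, so a UTF-8 path containing characters such as "ü" can fail to open unless it is first converted to UTF-16 and opened with _wfopen():

```cpp
// Illustrative sketch only -- not the actual clip.cpp code.
// fopen() on Windows interprets narrow paths in the active ANSI code page,
// so a UTF-8 path such as "C:\Users\Stefan Hüttner\.ollama\..." can fail to open.
// Converting the UTF-8 string to UTF-16 and using _wfopen() avoids that.
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

static FILE * open_model_file(const std::string & path_utf8) {
#ifdef _WIN32
    // Ask how many wide characters the converted path needs (passing -1 as the
    // source length includes the terminating null), then convert and open.
    int len = MultiByteToWideChar(CP_UTF8, 0, path_utf8.c_str(), -1, nullptr, 0);
    std::wstring wpath(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path_utf8.c_str(), -1, &wpath[0], len);
    return _wfopen(wpath.c_str(), L"rb");
#else
    return std::fopen(path_utf8.c_str(), "rb");
#endif
}
```

In the meantime, the workaround above amounts to keeping every path segment ASCII-only, e.g. pointing OLLAMA_MODELS at a directory like C:\ollama\models (a hypothetical example path), so no conversion is needed at all.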
Author
Owner

@Derix76 commented on GitHub (Jul 3, 2024):

Dear @dhiltgen - thanks a lot. This works like a charm. It's unexpected that Unicode is not handled, but the workaround is fine. Many thanks for your great work!

Author
Owner

@Derix76 commented on GitHub (Jul 3, 2024):

Issue closed; the workaround is working fine.

Author
Owner

@dhiltgen commented on GitHub (Jul 3, 2024):

Thanks for submitting the issue. This definitely should work, so we'll get this fixed properly.

Reference: github-starred/ollama#49846