[GH-ISSUE #10730] Ollama 0.7.0 forces an update of the llama3.2-vision model (which worked in 0.6.0), but the updated model does not work #53560

Closed
opened 2026-04-29 03:47:34 -05:00 by GiteaMirror · 10 comments

Originally created by @DanoPTT on GitHub (May 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10730

What is the issue?

I updated to Ollama 0.7.0, then tried to run the llama3.2-vision model (pulled under 0.6). It refused to run and asked me to download an updated llama3.2-vision model. I did that, but the model now errors when trying to analyze a picture.
Other models, such as gemma3:12b, still work correctly.
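For reproduction, here is a minimal sketch of the failing request against a local server, assuming the default port 11434 and a hypothetical test image at ./test.png; it uses Ollama's /api/chat endpoint with a base64-encoded image, which is the same code path the logs below show crashing.

```python
import base64
import json
import urllib.request

# Hypothetical test image; any small PNG or JPEG exercises the vision path.
with open("test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "llama3.2-vision",
    "messages": [
        {"role": "user", "content": "Describe this picture.", "images": [image_b64]}
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# On affected 0.7.0 installs, this request reportedly crashes the runner
# (GGML_ASSERT in pad.cu) instead of returning a description.
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])
```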

Relevant log output

time=2025-05-16T13:11:57.887+02:00 level=INFO source=runner.go:836 msg="starting ollama engine"
time=2025-05-16T13:11:57.890+02:00 level=INFO source=runner.go:899 msg="Server listening on 127.0.0.1:60512"
time=2025-05-16T13:11:57.928+02:00 level=INFO source=ggml.go:73 msg="" architecture=mllama file_type=Q4_K_M name="" description="" num_tensors=908 num_key_values=39
load_backend: loaded CPU backend from C:\ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-05-16T13:11:57.949+02:00 level=WARN source=sched.go:676 msg="gpu VRAM usage didn't recover within timeout" seconds=5.2667797 runner.size="11.7 GiB" runner.vram="11.7 GiB" runner.parallel=1 runner.pid=26216 runner.model=C:\Users\petrik.MMSSOFTEC\.ollama\models\blobs\sha256-7633fdffe14c0f7acc115402376be5bd6052220c348676c5133dc011b35e2429
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from C:\ollama\lib\ollama\cuda_v12\ggml-cuda.dll
time=2025-05-16T13:11:58.050+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-05-16T13:11:58.092+02:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-16T13:11:58.167+02:00 level=INFO source=ggml.go:299 msg="model weights" buffer=CUDA0 size="7.2 GiB"
time=2025-05-16T13:11:58.167+02:00 level=INFO source=ggml.go:299 msg="model weights" buffer=CPU size="285.7 MiB"
time=2025-05-16T13:11:58.200+02:00 level=WARN source=sched.go:676 msg="gpu VRAM usage didn't recover within timeout" seconds=5.5183276 runner.size="11.7 GiB" runner.vram="11.7 GiB" runner.parallel=1 runner.pid=26216 runner.model=C:\Users\petrik.MMSSOFTEC\.ollama\models\blobs\sha256-7633fdffe14c0f7acc115402376be5bd6052220c348676c5133dc011b35e2429
time=2025-05-16T13:12:00.759+02:00 level=INFO source=ggml.go:556 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="280.0 MiB"
time=2025-05-16T13:12:00.759+02:00 level=INFO source=ggml.go:556 msg="compute graph" backend=CPU buffer_type=CPU size="8.0 MiB"
time=2025-05-16T13:12:00.871+02:00 level=INFO source=server.go:630 msg="llama runner started in 3.04 seconds"
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\pad.cu:44: GGML_ASSERT(src0->ne[3] == 1 && dst->ne[3] == 1) failed
[GIN] 2025/05/16 - 13:12:01 | 200 |    9.4290978s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2025-05-16T13:12:02.044+02:00 level=ERROR source=server.go:457 msg="llama runner terminated" error="exit status 0xc0000409"

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.7.0
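When reporting a version-specific issue like this, it is worth confirming what the running server reports rather than what the installer says; a small sketch, assuming the default localhost:11434:

```python
import json
import urllib.request

# The installed binary and the background service can get out of sync after
# an upgrade on Windows; this asks the *running* server for its version.
with urllib.request.urlopen("http://localhost:11434/api/version") as resp:
    print(json.load(resp)["version"])  # e.g. "0.7.0"
```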

GiteaMirror added the bug label 2026-04-29 03:47:34 -05:00

@bastard9 commented on GitHub (May 19, 2025):

I have the same problem.


@junewinter2013 commented on GitHub (May 19, 2025):

We are having a similar problem with llama3.2-vision on Ollama 0.7.0 here as well.
qwen2.5-vl is fine.


@HarrievG commented on GitHub (May 19, 2025):

same

GGML_ASSERT(sections.v[0] > 0 || sections.v[1] > 0 || sections.v[2] > 0) failed

PM for payload


@dontpanic121 commented on GitHub (May 21, 2025):

I encountered the issue where a model stopped running after an update. Unfortunately, I don't have a straightforward solution at the moment, except for attempting a rollback to a previous version of Ollama.

  • The problem started with v0.7.0.
  • The last working version was v0.6.8.

The following message appears when testing Ollama from the command line on Windows 11:

{"error":"'llama3.2-vision' is no longer compatible with your version of Ollama and has been replaced by a newer version. To re-download, run 'ollama pull llama3.2-vision'"}

It would be nice to know whether this issue is intended behavior, left unfixed by design, or something that needs to be addressed.
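The error above is the server signalling that the locally cached model blob predates the new engine, and re-pulling is the intended recovery. A sketch of doing that through the /api/pull endpoint instead of the CLI (assuming a default local server; equivalent to `ollama pull llama3.2-vision`):

```python
import json
import urllib.request

# Re-download the updated llama3.2-vision blob; equivalent to running
# `ollama pull llama3.2-vision` on the command line.
req = urllib.request.Request(
    "http://localhost:11434/api/pull",
    data=json.dumps({"model": "llama3.2-vision", "stream": False}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # {"status": "success"} once the pull completes
```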


@DanoPTT commented on GitHub (May 21, 2025):

If you want to use your llama3.2-vision model, downgrade to 0.6.8. I upgraded to 0.7.0, got the same message as you, then downloaded the updated llama3.2-vision model, and since then I cannot analyze pictures with it. The model works for text, but not for pictures.


@rick-github commented on GitHub (May 24, 2025):

The model has been updated, and 0.7.1 has a fix that should resolve the crash.


@dontpanic121 commented on GitHub (May 24, 2025):

> The model has been updated, and 0.7.1 has a fix that should resolve the crash.

I upgraded to 0.7.1 from 0.6.8, and the issue still persists for me.

API Error: Error from Ollama: AI_RetryError: Failed after 3 attempts. Last error: 'llama3.2-vision' is no longer compatible with your version of Ollama and has been replaced by a newer version. To re-download, run 'ollama pull llama3.2-vision'

  • The last working version for me was v0.6.8.
  • The problem started with v0.7.0.
  • The problem still persists in v0.7.1.
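If the error persists after both upgrading and re-pulling, one possibility worth ruling out is that an old server process is still answering requests, or that the client targets a different host than the one that was upgraded. A sketch that checks both the running server's version and the model it actually has on disk (assuming the default localhost:11434):

```python
import json
import urllib.request

BASE = "http://localhost:11434"  # adjust if the client points elsewhere

def post(path, payload):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# A stale server still running from before the upgrade shows up here.
with urllib.request.urlopen(BASE + "/api/version") as resp:
    print("server version:", json.load(resp)["version"])

# Confirms which llama3.2-vision build the server actually has on disk.
info = post("/api/show", {"model": "llama3.2-vision"})
print("model details:", info.get("details"))
```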

@rick-github commented on GitHub (May 24, 2025):

Run the following on your ollama server:

ollama pull llama3.2-vision

@dontpanic121 commented on GitHub (May 24, 2025):

> Run the following on your ollama server:
>
> ollama pull llama3.2-vision

That did not work, and I was forced to downgrade to v0.6.8 again. I guess I am stuck until something gets fixed, or I'll just live with it.

API Error: Error from Ollama: AI_RetryError: Failed after 3 attempts. Last error: 'llama3.2-vision' is no longer compatible with your version of Ollama and has been replaced by a newer version. To re-download, run 'ollama pull llama3.2-vision'

@rick-github commented on GitHub (May 24, 2025):

Upgrade the ollama server and pull the model. It works for everybody else; if you still have a problem, logs (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
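Per the troubleshooting guide linked above, the server log on Windows lives under %LOCALAPPDATA%\Ollama; a small sketch for grabbing its tail when filing a report (path assumes the default install location):

```python
import os
from pathlib import Path

# Default Windows log location per Ollama's troubleshooting guide;
# adjust if Ollama is installed elsewhere.
log = Path(os.environ["LOCALAPPDATA"]) / "Ollama" / "server.log"
lines = log.read_text(encoding="utf-8", errors="replace").splitlines()
print("\n".join(lines[-50:]))  # last 50 lines usually cover the crash
```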

Reference: github-starred/ollama#53560