[GH-ISSUE #12150] "/api/generate" error with granite3.3:2b since 0.11.5 when using OLLAMA_MULTIUSER_CACHE=1 #70138

Open
opened 2026-05-04 20:29:15 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @ChristianAnkeZ1 on GitHub (Sep 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12150

What is the issue?

I am using the API to generate a response with the granite3.3:2b model. Since Ollama 0.11.5 I get this server error:

```
an error was encountered while running the model: GGML_ASSERT(is_full && "seq_cp() is only supported for full KV buffers") failed
```

0.11.4 works fine.
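For anyone trying to reproduce this, a minimal sketch of the failing request path (the model name and endpoint are from the report; the prompt, host, and port are placeholders using Ollama defaults):

```shell
# Payload for /api/generate: model name is from the report,
# prompt is a placeholder.
payload='{"model":"granite3.3:2b","prompt":"Hello","stream":false}'
echo "$payload"

# Send it to a locally running server (default port 11434):
# curl -s http://localhost:11434/api/generate -d "$payload"
```

Given OLLAMA_NUM_PARALLEL=2 below, running two such requests concurrently may be needed to exercise the shared-cache path.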

I run fully on CPU in a Docker container, and the following environment variables are set:

```
HTTP_PROXY=<myproxy>:3128
HTTPS_PROXY=<myproxy>:3128
NO_PROXY=<some-exceptions>
OLLAMA_KEEP_ALIVE=30m
OLLAMA_NUM_PARALLEL=2
OLLAMA_MAX_LOADED_MODELS=2
OLLAMA_FLASH_ATTENTION=1
OLLAMA_NEW_ESTIMATES=0
OLLAMA_DEBUG=1
OLLAMA_LOAD_TIMEOUT=90m
OLLAMA_MULTIUSER_CACHE=1
OLLAMA_CONTEXT_LENGTH=8096
OLLAMA_ORIGINS=<some-origins>
```

Disabling OLLAMA_MULTIUSER_CACHE will make it work again.
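A `docker run` sketch of this setup for comparison testing (image tag and port mapping are assumptions, proxy and origin values are the report's placeholders; flip OLLAMA_MULTIUSER_CACHE between 1 and 0 to reproduce vs. work around the error):

```shell
docker run -d --name ollama \
  -e HTTP_PROXY='<myproxy>:3128' \
  -e HTTPS_PROXY='<myproxy>:3128' \
  -e OLLAMA_KEEP_ALIVE=30m \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -e OLLAMA_FLASH_ATTENTION=1 \
  -e OLLAMA_MULTIUSER_CACHE=1 \
  -e OLLAMA_CONTEXT_LENGTH=8096 \
  -p 11434:11434 \
  ollama/ollama:0.11.5
```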

[ollama-error.log.txt](https://github.com/user-attachments/files/22091381/ollama-error.log.txt)

Relevant log output


OS

Linux

GPU

No response

CPU

AMD

Ollama version

0.11.5

GiteaMirror added the bug label 2026-05-04 20:29:15 -05:00

Reference: github-starred/ollama#70138