[GH-ISSUE #5791] Ability to pass --predict to llama.cpp server in ollama #29368

Open
opened 2026-04-22 08:10:15 -05:00 by GiteaMirror · 1 comment

Originally created by @1cekrim on GitHub (Jul 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5791

Due to the architecture of the DeepSeek V2 Coder model, there was [a problem with gibberish output when a K-shift occurred](https://github.com/ggerganov/llama.cpp/issues/8498). To address this, [a patch](https://github.com/ggerganov/llama.cpp/pull/8501) that triggers a GGML_ASSERT when a K-shift occurs in the DeepSeek V2 model has been merged.

Anyway, the easiest way to work around this is to pass the `--predict -2` option when running the llama.cpp server. This option limits the number of tokens to predict; with `-2`, generation stops once the context is full.
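For concreteness, here is a minimal sketch (in Go, via `os/exec`) of launching the llama.cpp server with this flag. The binary name, model path, and port below are placeholders, not a prescription:

```go
// Minimal sketch: start the llama.cpp server with --predict -2 so that
// generation stops when the context window is full (-2 is llama.cpp's
// "until context filled" sentinel value).
package main

import "os/exec"

func main() {
	cmd := exec.Command(
		"llama-server",     // llama.cpp HTTP server binary (placeholder)
		"-m", "model.gguf", // placeholder model path
		"--port", "8080",
		"--predict", "-2", // limit prediction to the remaining context
	)
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```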

It would be a good idea to expose the `n_predict` value as an environment variable when serving ollama, or to set the value in the Modelfile, so that it can be passed as the `--predict` argument in `NewLlamaServer` (a sketch of this plumbing follows).
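A hypothetical sketch of that plumbing, assuming an environment variable named `OLLAMA_NUM_PREDICT` and a helper that appends flags where `NewLlamaServer` builds the llama.cpp command line; neither name reflects ollama's actual code:

```go
// Hypothetical sketch of the proposal: read an (assumed) environment
// variable and forward it as --predict when assembling the llama.cpp
// server arguments. OLLAMA_NUM_PREDICT and buildServerArgs are
// illustrative names only.
package main

import (
	"fmt"
	"os"
	"strconv"
)

// buildServerArgs mimics the spot in NewLlamaServer where extra
// llama.cpp flags could be appended.
func buildServerArgs(base []string) []string {
	if v, ok := os.LookupEnv("OLLAMA_NUM_PREDICT"); ok {
		if n, err := strconv.Atoi(v); err == nil {
			// -2 tells llama.cpp to predict until the context is full.
			base = append(base, "--predict", strconv.Itoa(n))
		}
	}
	return base
}

func main() {
	args := buildServerArgs([]string{"-m", "model.gguf"})
	fmt.Println(args)
}
```

Setting the variable to `-2` would reproduce the workaround above for every model served.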
Also, if possible, it would be good to apply this to ollama.com's DeepSeek V2 models.

Related issues:

- https://github.com/ollama/ollama/issues/5537
- https://github.com/ollama/ollama/issues/5339

GiteaMirror added the feature request label 2026-04-22 08:10:15 -05:00

@rick-github commented on GitHub (Jul 19, 2024):

I created a patch to add the ability to set `predict` in the model file, but it didn't help with the output of https://github.com/ollama/ollama/issues/5339. Maybe it needs a more recent version of llama.cpp.

Reference: github-starred/ollama#29368