[GH-ISSUE #1713] Call-specific options like num_predict ignored on master branch #26729

Closed
opened 2026-04-22 03:12:42 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @janpf on GitHub (Dec 25, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1713

Hi,

As the llama.go file got refactored a few days ago, I just reimplemented my PR #1640 because it had become unmergeable. But it seems that the per-call options are currently ignored on the master branch, resulting in unexpected behavior as far as I can tell. I believe the issue lies in this line, since the general LLM options are passed rather than the per-call options, but I'm unsure: https://github.com/jmorganca/ollama/blob/main/llm/ext_server.go#L203

```
janpf@whackintosh ~> curl http://localhost:11434/api/generate -d '{"model":"llama2", "temperature":0, "prompt": "How many tokens will you generate?", "options": {"num_predict": 3}}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   884    0   770  100   114    875    129 --:--:-- --:--:-- --:--:--  1004
{
  "model": "llama2",
  "created_at": "2023-12-25T21:58:02.661144Z",
  "response": "\n",
  "done": false
}
{
  "model": "llama2",
  "created_at": "2023-12-25T21:58:02.678706Z",
  "response": "As",
  "done": false
}
{
  "model": "llama2",
  "created_at": "2023-12-25T21:58:02.696354Z",
  "response": " a",
  "done": false
}
{
  "model": "llama2",
  "created_at": "2023-12-25T21:58:02.71379Z",
  "response": " responsible",
  "done": false
}
{
  "model": "llama2",
  "created_at": "2023-12-25T21:58:02.71388Z",
  "response": "",
  "done": true,
  "context": [
    518,
    25580,
    29962,
    3532,
    14816,
    29903,
    29958,
    5299,
    829,
    14816,
    29903,
    6778,
    13,
    13,
    5328,
    1784,
    18897,
    674,
    366,
    5706,
    29973,
    518,
    29914,
    25580,
    29962,
    13,
    13,
    2887,
    263,
    14040
  ],
  "total_duration": 875981625,
  "load_duration": 703995750,
  "prompt_eval_count": 27,
  "prompt_eval_duration": 130382000,
  "eval_count": 3,
  "eval_duration": 35132000
}
```
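
For illustration, here is a minimal Go sketch of the merge step the report suspects is missing: overlaying the per-call `options` from the request onto the model's default options before dispatching to the llama.cpp server. All names here are hypothetical and the real ollama code paths differ; this only shows the intended precedence.

```go
package main

import "fmt"

// Options mirrors a small subset of generation options (hypothetical
// struct; the real api.Options in ollama has many more fields).
type Options struct {
	NumPredict  int
	Temperature float64
}

// mergeOptions overlays per-call options onto the model's defaults.
// Passing the defaults straight through, without this step, reproduces
// the reported behavior: per-call values like num_predict are ignored.
func mergeOptions(defaults Options, perCall map[string]any) Options {
	merged := defaults
	if v, ok := perCall["num_predict"]; ok {
		if n, ok := v.(float64); ok { // JSON numbers decode to float64
			merged.NumPredict = int(n)
		}
	}
	if v, ok := perCall["temperature"]; ok {
		if t, ok := v.(float64); ok {
			merged.Temperature = t
		}
	}
	return merged
}

func main() {
	defaults := Options{NumPredict: -1, Temperature: 0.8}
	perCall := map[string]any{"num_predict": float64(3)}
	fmt.Printf("%+v\n", mergeOptions(defaults, perCall)) // {NumPredict:3 Temperature:0.8}
}
```

The key point is only that request options should win over model defaults.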
GiteaMirror added the bug label 2026-04-22 03:12:42 -05:00
Author
Owner

@jmorganca commented on GitHub (Dec 28, 2023):

Sorry to hear your change was un-mergeable, @janpf, as I know you put work into that PR – let me (and other maintainers) know how we can help. That was a larger change that came in to help build more reliable bindings to llama.cpp (incl. support for other platforms/GPUs) – sorry it took you by surprise!

In terms of this issue – do you have an example with the API that shows it not respecting the options? The one you provided seems to terminate early from `num_predict` (although it's 4 instead of 3, perhaps a separate issue).

Author
Owner

@janpf commented on GitHub (Jan 8, 2024):

> sorry it took you by surprise!

No worries, I mostly reimplemented it, but I had trouble passing my options through as they were getting ignored. That's how I found this issue.

As it was an issue on the main branch and my PR is no longer mergeable anyway, I just found that the issue has been fixed in the meantime. I still have issues with it respecting `num_predict`, but my options are now passed through.

Now `num_predict` seems to be mostly respected, although usually it is off by one (not much of an issue), but sometimes it's completely ignored. I didn't investigate further where this might be stemming from :(
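
For what it's worth, one common source of this kind of off-by-one is checking the prediction limit only after a token has already been emitted. A minimal hypothetical Go sketch of such a loop (not the actual ollama/llama.cpp decode loop):

```go
package main

import "fmt"

// fakeNext simulates a token generator.
func fakeNext() func() string {
	i := 0
	return func() string { i++; return fmt.Sprintf("tok%d", i) }
}

// decodeLoop checks the limit only after emitting a token, so
// numPredict=3 yields 4 tokens: the off-by-one described above.
// Checking len(out) >= numPredict before appending would emit
// exactly numPredict tokens instead.
func decodeLoop(numPredict int, next func() string) []string {
	var out []string
	for {
		out = append(out, next())
		if len(out) > numPredict {
			break
		}
	}
	return out
}

func main() {
	fmt.Println(decodeLoop(3, fakeNext())) // [tok1 tok2 tok3 tok4]
}
```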

I just reopened #1640. Thanks!

Reference: github-starred/ollama#26729