[GH-ISSUE #11892] 🤔 num_predict parameter setting is ignored. #7895

Closed
opened 2026-04-12 20:03:03 -05:00 by GiteaMirror · 27 comments

Originally created by @FieldMouse-AI on GitHub (Aug 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11892

What is the issue?

👉 [out.log](https://github.com/user-attachments/files/21762091/out.log) 👈 I've added the log for the ollama 0.11.4 run here.

🤔 I have noticed that when I try to make the LLM generate up to 4000 tokens of content, it always truncates the results to approximately 1000 tokens.

I set num_ctx: 16384 and num_predict: 4000 and the content that I feed into the LLM is approximately 9000 tokens.

  • my query input: 9000 tokens
  • my desired output (num_predict): 4000 tokens
  • total tokens needed for the request: 9000 + 4000 = 13,000 tokens
  • To accommodate the 13,000 tokens I set num_ctx: 16384, which provides a buffer of over 3000 extra tokens. This should be more than enough extra space.

I then tested this query with all of the versions listed below and got the same basic results. I never hit my 4000-token target:

  • 0.9.2: ~790 tokens
  • 0.9.6: ~1171 tokens
  • 0.10.0: ~1100 tokens
  • 0.10.1: ~2029 tokens
  • 0.11.4: ~1330 tokens - 👉 [out.log](https://github.com/user-attachments/files/21762091/out.log) 👈 I've added the log for the ollama 0.11.4 run here.

It would be quite useful to be able to generate documents that reach the higher num_predict limits that I set.
The current behavior does not match the documentation, which suggests that num_predict would have an effect.
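
For concreteness, a request of roughly this shape reproduces the setup described above (only a sketch with the ollama Python client; the model name and prompt are placeholders, not my exact ones):

```python
# Sketch of the kind of request described above, using the ollama Python client.
# The model name and prompt text are placeholders, not the exact ones from my runs.
import ollama

response = ollama.chat(
    model="llama3.2:1b-instruct-q4_K_M",  # placeholder model
    messages=[{"role": "user", "content": "<~9000-token prompt goes here>"}],
    options={
        "num_ctx": 16384,     # room for ~9000 prompt tokens plus up to 4000 output tokens
        "num_predict": 4000,  # intended upper bound on generated tokens
    },
    stream=False,
)
print(response.eval_count)  # tokens actually generated; in my runs this stays near ~1000
```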

What do you folks think is the problem here? 🤔

🤗Thank you for your time.🤗

Relevant log output


OS

Docker

GPU

No response

CPU

AMD

Ollama version

0.11.4

GiteaMirror added the bug label 2026-04-12 20:03:03 -05:00

@rick-github commented on GitHub (Aug 13, 2025):

num_predict is a limit, not a goal. If you want the model to generate more tokens, it needs instruction to do so. All that num_predict does is stop the generation when the number of tokens reaches the limit, it doesn't encourage the model to generate that many tokens.
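
One quick way to check which case you are in (a sketch, assuming a recent ollama Python client whose response exposes done_reason and eval_count): done_reason is "length" when generation was cut off by the num_predict cap, and "stop" when the model ended on its own.

```python
# Sketch: tell apart "hit the num_predict cap" from "model finished on its own".
# Assumes a recent ollama Python client; model name is a placeholder.
import ollama

response = ollama.chat(
    model="llama3.2:1b",  # placeholder
    messages=[{"role": "user", "content": "Write a long report about PostgreSQL."}],
    options={"num_ctx": 16384, "num_predict": 4000},
    stream=False,
)

if response.done_reason == "length":
    print(f"num_predict cap reached after {response.eval_count} tokens")
else:
    print(f"model stopped on its own after {response.eval_count} tokens "
          f"(done_reason={response.done_reason!r})")
```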


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> num_predict is a limit, not a goal. If you want the model to generate more tokens, it needs instruction to do so. All that num_predict does is stop the generation when the number of tokens reaches the limit, it doesn't encourage the model to generate that many tokens.

Of course. Here is the prompt that I use along with the context:

```
Please write a comprehensive report of at least 16000 words about PostgreSQL and its application.
Please elaborate on key points in extreme detail.
```

@rick-github commented on GitHub (Aug 13, 2025):

Apart from the length, does the generated response meet the requirements?


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

Yes. The actual content is correct.


@rick-github commented on GitHub (Aug 13, 2025):

Then it seems the model ran out of things to say. If you give it an explicit list of points that require further discussion, then it may generate more tokens. Or you could take the generated text and feed it back in with the instruction "Expand on this".
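
A rough sketch of that feed-back-and-expand loop with the Python client (placeholder model name, illustrative only):

```python
# Rough sketch of the "expand on this" approach: take whatever the model produced
# and feed it back with an instruction to elaborate. Placeholder model name.
import ollama

MODEL = "llama3.2:1b-instruct-q4_K_M"
OPTIONS = {"num_ctx": 16384, "num_predict": 4000}

messages = [{"role": "user", "content": "Write a comprehensive report about PostgreSQL."}]
draft = ollama.chat(model=MODEL, messages=messages, options=OPTIONS, stream=False)

# Feed the draft back and ask the model to go deeper on it.
messages += [
    {"role": "assistant", "content": draft.message.content},
    {"role": "user", "content": "Expand on this. Elaborate on each key point in much more detail."},
]
expanded = ollama.chat(model=MODEL, messages=messages, options=OPTIONS, stream=False)
print(expanded.eval_count, "tokens in the expanded pass")
```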


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

Also, I am in the process of testing with num_predict: -1 and I am still hitting the ~1000 token cut-off.


@rick-github commented on GitHub (Aug 13, 2025):

num_predict: -1 means no limit; when the buffer fills up, it will be shifted to make room for more tokens, losing the tokens at the head of the buffer.


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> Then it seems the model ran out of things to say. If you give it an explicit list of points that require further discussion, then it may generate more tokens. Or you could take the generated text and feed it back in with the instruction "Expand on this".

Ah! I see what you are getting at. I have tried this with different topics and hit the same wall.

However, it is not impossible that the model happened to say all that it needed.

I will change it back to num_predict: 4000, then I will do the "Expand on this" thing and see what it does.


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> > Then it seems the model ran out of things to say. If you give it an explicit list of points that require further discussion, then it may generate more tokens. Or you could take the generated text and feed it back in with the instruction "Expand on this".
>
> Ah! I see what you are getting at. I have tried this with different topics and hit the same wall.
>
> However, it is not impossible that the model happened to say all that it needed.
>
> I will change it back to num_predict: 4000, then I will do the "Expand on this" thing and see what it does.

@rick-github , I tried the "Expand on this: ..." approach and it still returned an approximately 1000 token response.


@rick-github commented on GitHub (Aug 13, 2025):

llama3.2:1b-instruct-q4_K_M is a tiny model, try something larger that is better at writing.


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> llama3.2:1b-instruct-q4_K_M is a tiny model, try something larger that is better at writing.

I just tried llama3.2:3b-instruct-q4_K_M using the "Expand on this: ..." prompt, and the token count of the results was 1016 tokens.


@rick-github commented on GitHub (Aug 13, 2025):

llama3.2:3b-instruct-q4_K_M is a tiny model, try something larger that is better at writing.


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> llama3.2:3b-instruct-q4_K_M is a tiny model, try something larger that is better at writing.

OK. I am right now waiting for the results from llama3.1:8b-instruct-q4_K_M.


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> > llama3.2:3b-instruct-q4_K_M is a tiny model, try something larger that is better at writing.
>
> OK. I am right now waiting for the results from llama3.1:8b-instruct-q4_K_M.

Unfortunately, this time it produced fewer than 1000 tokens. That is basically in line with the roughly 1000-token results, still not reaching 4000 tokens.


@rick-github commented on GitHub (Aug 13, 2025):

```
$ ollama run --verbose qwen3 'Please write a comprehensive report of at least 16000 words about PostgreSQL and its application. Please elaborate on key points in extreme detail.'
...
eval count:           10606 token(s)
...
```

@FieldMouse-AI commented on GitHub (Aug 13, 2025):

So, could it be the case that llama models have some kind of hard limit around 1000 tokens?

I specifically remember them going into full-on "fill the context" mode, which is why I started prompting with word limits.

Let me try qwen3 and see what happens.


@rick-github commented on GitHub (Aug 13, 2025):

No, the models you are choosing are not good at writing. Try something larger that is better at writing.


@FieldMouse-AI commented on GitHub (Aug 13, 2025):

> No, the models you are choosing are not good at writing. Try something larger that is better at writing.

I just tried qwen3:8b-q4_K_M and it only returned 1508 tokens.

All my code does is deliver the payload to the Ollama API.

I did go back to check my code to see how I handle num_predict: I only set it to 4000 in the Modelfile, so I do not programmatically change it.
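
For reference, the relevant Modelfile lines are roughly of this shape (a hypothetical sketch; the base model here is a placeholder, not necessarily the one I build from):

```
# Hypothetical Modelfile sketch; the base model below is a placeholder.
FROM llama3.2:1b-instruct-q4_K_M
PARAMETER num_ctx 16384
PARAMETER num_predict 4000
```

If I remember the CLI correctly, running ollama show --modelfile with the model name should confirm whether these parameters actually ended up in the built model.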

Could you give me a bit to go and search/read the llama documentation carefully?

That the llama3.2 (small) models and llama3.1:8b (larger) model both stopped generating around 1000 tokens is worrying.

While I distinctly remember getting long results, such that I had to use limiting statements in my prompts, I need to be sure that I did not overlook something in their documentation.


@FieldMouse-AI commented on GitHub (Aug 14, 2025):

> > No, the models you are choosing are not good at writing. Try something larger that is better at writing.
>
> I just tried qwen3:8b-q4_K_M and it only returned 1508 tokens.
>
> All my code does is deliver the payload to the Ollama API.
>
> I did go back to check my code to see how I handle num_predict: I only set it to 4000 in the Modelfile, so I do not programmatically change it.
>
> Could you give me a bit to go and search/read the llama documentation carefully?
>
> That the llama3.2 (small) models and llama3.1:8b (larger) model both stopped generating around 1000 tokens is worrying.
>
> While I distinctly remember getting long results, such that I had to use limiting statements in my prompts, I need to be sure that I did not overlook something in their documentation.

Hello again, @rick-github ,

I reviewed the documentation and found no mention of num_predict or of any restrictions on response length for the llama3.1 and llama3.2 model series.

Given that qwen3, run through the same API call, likewise ignored the num_predict: 4000 value and returned only about 1000 tokens, this would suggest that the problem is not in the model or even in my code, but likely in how the API itself handles num_predict.

What do you think? 🤔

(For the record: I use the Ollama chat() method.)


@rick-github commented on GitHub (Aug 14, 2025):

> Given that qwen3, run through the same API call, likewise ignored the num_predict: 4000 value and returned only about 1000 tokens, this would suggest that the problem is not in the model or even in my code, but likely in how the API itself handles num_predict.

num_predict is not being ignored: it's a limit, not a goal. It's not affecting the output because the output hasn't reached 4000 tokens. This is because the combination of prompt and model is not generating enough tokens to reach the limit established by num_predict.

For experimentation, the following Python script can be used:

```python
#!/usr/bin/env python3

import ollama
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", help="Set model to use", default="llama3.2:1b")
parser.add_argument("-p", "--prompt", help="Prompt", default="Please write a comprehensive report of at least 16000 words about PostgreSQL and its application. Please elaborate on key points in extreme detail.")
args = parser.parse_args()

response = ollama.chat(
    model=args.model,
    messages=[
      {"role":"user","content":args.prompt}
    ],
    options={
      "num_predict":4000,
      "num_ctx":16384,
    },
    stream=False,
)
print(response.model_dump_json())
```

Using your prompt and original model choice, we see that the model is conservative in creating tokens:

```console
$ for i in {1..5} ; do ./11892.py | jq .eval_count ; done
1139
1237
1170
1387
1504
```

A different model generates a different number of tokens, but because it is still small and not good at writing, it doesn't reach the 4000 token limit:

```console
$ for i in {1..5} ; do ./11892.py --model llama3.2:3b | jq .eval_count ; done
1701
2339
1794
1551
2084
```

It's not until bigger models that are better at writing are used that the limit is reached:

```console
$ for i in {1..5} ; do ./11892.py --model qwen3:4b-instruct | jq .eval_count ; done
4000
656
4000
4000
4000
```

But even then, the model doesn't always create the required content, generating only 656 tokens in one test. The content is not great either:

{"role":"assistant","content":"I'm sorry, but I can't provide a report of 16,000 words on PostgreSQL and its applications.\n\nWhile I can certainly generate a detailed, comprehensive, and highly informative report

Finally, a mid-sized model creates the required amount of relevant content:

```console
$ for i in {1..5} ; do ./11892.py --model qwen3:30b-a3b-instruct-2507-q4_K_M | jq .eval_count ; done
4000
4000
4000
4000
4000
```

```
{"role":"assistant","content":"**Comprehensive Report on PostgreSQL: Architecture, Features, Applications, and Best Practices (Over 16,000 Words)**\n\n---\n\n### **Executive Summary**\n\nPostgreSQL, often affectionately referred to
```

@FieldMouse-AI commented on GitHub (Aug 14, 2025):

Wow. So, based on what you're showing, this means that even llama3.1:8b and qwen3, and odds are others as well, will reach their natural end well before the 4000 tokens while still providing reasonable information.

I will have to get some kind of surveys or reports about the performance of models at this level.
Would you know of some good leads for this kind of information?

Thanks for going so far on this. 🤗


@rick-github commented on GitHub (Aug 14, 2025):

> Wow. So, based on what you're showing, this means that even llama3.1:8b and qwen3, and odds are others as well, will reach their natural end well before the 4000 tokens while still providing reasonable information.

The specific prompt used in the test requires a model to draw on its internal knowledge to render output. Small models obviously have a disadvantage here.

> I will have to get some kind of surveys or reports about the performance of models at this level. Would you know of some good leads for this kind of information?

https://eqbench.com/creative_writing.html ranks models based on creative writing, which is more or less what you are asking the model to do. There are lots of these sorts of analyses on the web; search Google for "LLM creative writing". These will be dominated by the commercial models, but scrolling down the list will show some open models.


@FieldMouse-AI commented on GitHub (Aug 14, 2025):

Wait a second: but I fill the context with about 9000 tokens of material. 🤔
I am not depending on the model for its knowledge, but instead for its ability to synthesize information using the context.

Technically, even the 8b should have had enough to work with.


@FieldMouse-AI commented on GitHub (Aug 14, 2025):

> https://eqbench.com/creative_writing.html ranks models based on creative writing, which is more or less what you are asking the model to do. There are lots of these sorts of analyses on the web; search Google for "LLM creative writing". These will be dominated by the commercial models, but scrolling down the list will show some open models.

BTW: Great links! Thanks!


@rick-github commented on GitHub (Aug 14, 2025):

But it's not good at creative writing. Getting a good response from a model requires a number of factors: training, embedded knowledge, context, prompting, model skill. You need to pay attention to all of these, not just throw a bunch of text at a weak model and hope for the best.


@FieldMouse-AI commented on GitHub (Aug 14, 2025):

> But it's not good at creative writing. Getting a good response from a model requires a number of factors: training, embedded knowledge, context, prompting, model skill. You need to pay attention to all of these, not just throw a bunch of text at a weak model and hope for the best.

Ah, hence the existence of such specialized benchmark/rating systems as https://eqbench.com/creative_writing.html

I get it now.

Thanks! 🤗


@pdevine commented on GitHub (Aug 14, 2025):

Going to close this as answered (thanks @rick-github !)

Reference: github-starred/ollama#7895