[GH-ISSUE #2501] Simple tasks fail #1461

Closed
opened 2026-04-12 11:21:43 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @dtp555-1212 on GitHub (Feb 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2501

Simple tasks seem to be beyond what any of the open-source models (at least, all that I have tried) can accomplish. I can tease the results out of ‘Bing Copilot’, but so far these types of tasks seem to elude the open-source models loaded into Ollama.

Can you tell me if I am doing something wrong, or a better prompt, or which model has the best chance of doing these right, or if indeed the released models can’t handle this type of thing?

  1. task one … generate a list of 10 sentences that have exactly 5 words each.
    I have never seen it correctly generate 10 sentences in a row that each have exactly 5 words. It can sometimes count the words in a single sentence correctly when asked how it came to its conclusion, but it is often wrong. It also can’t definitively tell whether something is one word or two (e.g. “the cat”). It seems to improve after being told that a word never contains a space, but then quickly forgets that principle.

  2. task two … generate a list of 10 sentences that end with a verb followed by a plural noun.
    It can sometimes do a list of sentences that end with a verb, OR it can sometimes do a list of sentences that end with a plural noun, but I have never seen it correctly generate a list of sentences that satisfies both criteria.
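Both tasks are mechanically checkable, which makes it easy to see how often a model actually satisfies them. Below is a minimal sketch of such validators. The five-word check follows the poster's own rule (a word never contains a space, so splitting on whitespace counts words); the plural-noun check is only a crude ends-in-"s" heuristic, since a real check for "verb followed by a plural noun" would need a part-of-speech tagger. All function names here are illustrative, not from any library.

```python
import re

def word_count(sentence: str) -> int:
    """Count whitespace-separated tokens; per the task, a word never contains a space."""
    return len(sentence.split())

def is_five_words(sentence: str) -> bool:
    """Task one: exactly 5 words."""
    return word_count(sentence) == 5

def ends_in_plural_noun_naive(sentence: str) -> bool:
    """Crude stand-in for task two: only checks that the final word *looks* plural
    (ends in 's' but not 'ss'). Verifying the verb + plural-noun pattern properly
    requires POS tagging (e.g. with NLTK or spaCy)."""
    last = re.sub(r"[^\w]", "", sentence.split()[-1])
    return last.endswith("s") and not last.endswith("ss")
```

With checkers like these you can score a model's output objectively instead of eyeballing it, which also makes the failure rate across models easy to compare.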

I would love to hear any suggestions that would help with these types of tasks. Since ‘Bing Copilot’ can be coerced into doing this, and I have heard that the open-source models perform very well, I am hoping there is a simple explanation for these failures.

Thanks in advance.

P.S. I have tried giving pre-prompts saying things like ‘You are an expert linguist. You know parts of speech, and you know how to count the words in a sentence. Assume a word never has a space in it. …’ I have also tried asking it to go step by step and double-check results, but none of this seems to have a positive effect.
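Since prompting alone is unreliable for counting tasks, one common workaround is to validate the model's output in code and re-ask until enough sentences pass. The sketch below does this against Ollama's local REST endpoint (`POST /api/generate`, the default server on port 11434); the system prompt, helper names, and retry policy are illustrative assumptions, not part of Ollama itself.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

SYSTEM = ("You are an expert linguist. A word never contains a space. "
          "Every sentence you output must have exactly 5 words.")

def ollama_generate(prompt: str, model: str = "llama2", system: str = SYSTEM) -> str:
    """Call a local Ollama server (assumes one is running on the default port)."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "system": system, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def five_word_lines(text: str) -> list:
    """Strip numbered-list prefixes and keep only lines with exactly 5 words."""
    lines = [re.sub(r"^\s*\d+[.)]\s*", "", ln) for ln in text.splitlines()]
    return [ln for ln in lines if len(ln.split()) == 5]

def generate_until_valid(generate, prompt: str, want: int = 10, max_tries: int = 5) -> list:
    """Retry generation, accumulating distinct sentences that pass the word count."""
    good = []
    for _ in range(max_tries):
        for line in five_word_lines(generate(prompt)):
            if line not in good:
                good.append(line)
        if len(good) >= want:
            return good[:want]
    return good  # may be shorter than `want` if the model keeps failing
```

Usage would be `generate_until_valid(ollama_generate, "Generate 10 sentences of exactly 5 words each.")`. Taking the generator as a parameter keeps the validation loop testable without a running server.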

GiteaMirror added the question label 2026-04-12 11:21:43 -05:00

@easp commented on GitHub (Feb 14, 2024):

What models have you tried?


@dtp555-1212 commented on GitHub (Feb 14, 2024):

llama2 (all sizes and chat variants), mistral-openorca, orca2:13b (and tinyllama)


@jmorganca commented on GitHub (May 10, 2024):

I would try Llama 3!

```
% ollama run llama3
>>> generate a list of 10 sentences that have exactly 5 words each.
Here are 10 sentences, each with exactly 5 words:

1. The sun is very bright.
2. I love playing tennis games.
3. Dogs are really good friends.
4. She ate a sandwich lunch.
5. He is a happy person.
6. Cats sleep all day long.
7. We went to beach trip.
8. My dog is very lazy.
9. They ate pizza dinner tonight.
10. The book is very old.
```
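As a quick sanity check on the transcript above, every line really does contain exactly five whitespace-separated words, using the same split-on-spaces rule the original poster proposed:

```python
import re

transcript = """\
1. The sun is very bright.
2. I love playing tennis games.
3. Dogs are really good friends.
4. She ate a sandwich lunch.
5. He is a happy person.
6. Cats sleep all day long.
7. We went to beach trip.
8. My dog is very lazy.
9. They ate pizza dinner tonight.
10. The book is very old."""

# Strip the "N. " list prefix from each line, then count whitespace-separated words.
counts = [len(re.sub(r"^\d+\.\s*", "", line).split())
          for line in transcript.splitlines()]
print(counts)  # prints [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Note that the count constraint is satisfied even where the grammar slips (e.g. "She ate a sandwich lunch.", "We went to beach trip."), so a word-count check alone does not guarantee well-formed sentences.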
Reference: github-starred/ollama#1461