[GH-ISSUE #15289] ollama models failing a simple question #9782

Open
opened 2026-04-12 22:39:53 -05:00 by GiteaMirror · 4 comments

Originally created by @jrvladr on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15289

I run ollama under Windows 11 with the following models:
ministral-3, phi4-mini, granite3.1-moe, mistral, llama3, qwen3-vl, alibayram/medgemma.
I gave each of them a simple question:
I need to wash my car; the car wash is 50 meters from my house. What is the best way to get there—on foot or by car?
Every one of them answered that it's best to go on foot.
To be fair, when I clarified ("I told you that I need to wash a car. If I go on foot, what will I wash?"), every one of them realized the slip-up.
Nevertheless, how can we trust any of them?


@rick-github commented on GitHub (Apr 3, 2026):

You are just now realizing that LLMs are unreliable?


@3DPJamie commented on GitHub (Apr 3, 2026):

Quantisation has an effect as well. For the same question, Q8 can give the correct answer while Q4 cannot: the lower-bit versions lose logic. Try higher-precision quantised versions or go with Qwen3.5 models; they are really good at answering those questions. If you do some research, you can see that big models like ChatGPT fell into the same trap.
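
One way to check the quantisation claim on your own hardware is to send the same prompt to two quantisations of one model through Ollama's local HTTP API and compare the answers. The following is only a minimal sketch under some assumptions: the server is listening on the default `http://localhost:11434`, the helper names (`build_payload`, `ask`, `verdict`, `compare`) are hypothetical, and `verdict` is a crude keyword check on the first line of the answer, not a real evaluator.

```python
import json
import urllib.request

# The question from this issue ("\u2014" is the dash used in the original prompt).
PROMPT = (
    "I need to wash my car; the car wash is 50 meters from my house. "
    "What is the best way to get there\u2014on foot or by car?"
)


def build_payload(model, prompt):
    # "stream": False asks the server for a single JSON object
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def ask(model, prompt, url="http://localhost:11434/api/generate"):
    # Assumes a local Ollama server on its default port.
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def verdict(answer):
    # Hypothetical heuristic: look only at the first non-empty line,
    # where models usually state their recommendation.
    lines = [line for line in answer.splitlines() if line.strip()]
    if not lines:
        return "unclear"
    first = lines[0].lower()
    drives = "drive" in first or "by car" in first
    walks = "walk" in first or "on foot" in first
    if drives and not walks:
        return "drive"
    if walks and not drives:
        return "walk"
    return "unclear"


def compare(models, prompt=PROMPT, ask_fn=ask):
    # Ask every model the same question and classify each answer.
    return {model: verdict(ask_fn(model, prompt)) for model in models}
```

For example, `compare(["llama3:8b-instruct-q4_0", "llama3:8b-instruct-q8_0"])` would return a dictionary mapping each tag to `"drive"`, `"walk"`, or `"unclear"`. The tags here are placeholders; run `ollama list` and substitute whichever quantisations you have actually pulled. A single run per model proves little, since answers vary between runs, so repeating the comparison several times gives a fairer picture.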


@fcorneli commented on GitHub (Apr 3, 2026):

ollama run qwen3.5:122b "I need to wash my car; the car wash is 50 meters from my house. What is the best way to get there—on foot or by car?"

You should drive your car.

Here is the reasoning:

  1. The Goal: Your objective is to wash the car.
  2. The Requirement: For the car to be washed at a car wash, the car itself must be at the car wash.
  3. The Consequence: If you go on foot, the car stays at your house. You would arrive at the car wash, realize your car isn't there, have to walk back, start the car, and drive it 50 meters.

Verdict: Drive the car. Even though it is only 50 meters away (which is very close), it is the only way to get the car to the facility. Just drive slowly and safely, as it is a short distance!

Not only correct, but even funny.


@jrvladr commented on GitHub (Apr 3, 2026):

> Quantisation has an effect as well. For the same question, Q8 can give the correct answer while Q4 cannot: the lower-bit versions lose logic. Try higher-precision quantised versions or go with Qwen3.5 models; they are really good at answering those questions. If you do some research, you can see that big models like ChatGPT fell into the same trap.

You are right, I work with small models due to poor hardware. Nevertheless, seeing ChatGPT on that list is disappointing.


Reference: github-starred/ollama#9782