[GH-ISSUE #12966] Model runs forever on Windows even with minimal model smollm:135m #8595

Closed
opened 2026-04-12 21:19:36 -05:00 by GiteaMirror · 2 comments

Originally created by @natgate on GitHub (Nov 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12966

Description

Summary:
On Windows 10 (Intel i7-7500U, 16 GB RAM), running the smallest model smollm:135m never produces a response.
Generation hangs indefinitely; no tokens are returned.
This happens both in the Ollama desktop app and via the PowerShell API call below.


System Information

  • OS: Windows 10 22H2 (64-bit)
  • CPU: Intel Core i7-7500U @ 2.70 GHz (2 cores / 4 threads)
  • RAM: 16 GB
  • GPU: NVIDIA 940MX (unsupported, so CPU-only mode)
  • Ollama version: 0.12.9
  • Model: smollm:135m

Reproduction Steps

  1. Start Ollama (CPU-only mode).
  2. Run the following PowerShell command (a bounded-timeout variant is sketched after this list):

     ```powershell
     (Invoke-WebRequest -Method POST -Body '{"model":"smollm:135m", "prompt":"hi", "stream": false}' -Uri http://localhost:11434/api/generate).Content | ConvertFrom-Json
     ```

  3. Observe that the process never returns any text; it hangs indefinitely.
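
Because the original command sets no client-side timeout, Invoke-WebRequest blocks for as long as the server stays silent. A bounded variant makes the hang observable as a timeout error instead; this is a minimal sketch, the 30-second limit is an arbitrary choice, and `-TimeoutSec` is a standard Invoke-WebRequest parameter:

```powershell
# Same request, but with a client-side timeout so the hang surfaces as an
# error instead of blocking forever. The 30 s limit is arbitrary.
$body = '{"model":"smollm:135m", "prompt":"hi", "stream": false}'
try {
    (Invoke-WebRequest -Method POST -Body $body `
        -Uri http://localhost:11434/api/generate -TimeoutSec 30).Content |
        ConvertFrom-Json
} catch {
    Write-Host "No response within 30 s: $($_.Exception.Message)"
}
```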

Expected Behavior

The model should return a short reply such as "Hello!" within a few seconds.
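
For reference, a successful non-streaming call to /api/generate returns a single JSON object whose `response` field carries the generated text (and whose `done` field reports completion). A minimal sketch of reading it, reusing the `$body` string from the snippet above:

```powershell
# With "stream": false the API returns one JSON object; the generated text
# is in its "response" field.
$result = (Invoke-WebRequest -Method POST -Body $body `
    -Uri http://localhost:11434/api/generate).Content | ConvertFrom-Json
$result.response   # expected: a short greeting such as "Hello!"
```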


Actual Behavior

The command runs forever; no output is produced.


Relevant Log Snippet

```
time=2025-11-05T11:29:15.294+01:00 level=DEBUG source=runner.go:471 msg="bootstrap discovery took" duration=117.8962ms OLLAMA_LIBRARY_PATH="[D:\programs\lib\ollama D:\programs\lib\ollama\rocm]" extra_envs=map[]
time=2025-11-05T11:29:15.295+01:00 level=DEBUG source=runner.go:120 msg="evluating which if any devices to filter out" initial_count=0
time=2025-11-05T11:29:15.296+01:00 level=DEBUG source=runner.go:41 msg="GPU bootstrap discovery took" duration=509.5988ms
time=2025-11-05T11:29:15.300+01:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="15.9 GiB" available="9.0 GiB"
time=2025-11-05T11:29:15.300+01:00 level=INFO source=routes.go:1618 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
time=2025-11-05T11:29:27.454+01:00 level=DEBUG source=runner.go:267 msg="refreshing free memory"
time=2025-11-05T11:29:27.454+01:00 level=DEBUG source=runner.go:41 msg="overall device VRAM discovery took" duration=0s
```
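
Debug-level lines like these are not emitted by default; they can be enabled with the OLLAMA_DEBUG environment variable (documented by Ollama) before starting the server. A minimal sketch for reproducing this log level:

```powershell
# Enable debug logging, then run the server in the foreground so the log
# lines appear on the console. OLLAMA_DEBUG=1 is a documented Ollama setting.
$env:OLLAMA_DEBUG = "1"
ollama serve
```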

Notes

  • Model files are stored on D:\Ollama\models
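
The non-default model path suggests the OLLAMA_MODELS environment variable (Ollama's documented setting for relocating model storage) is in use; it can be checked from the same PowerShell session:

```powershell
# Print the configured model directory; empty output means the default
# location under the user profile is in use.
$env:OLLAMA_MODELS
```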

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.12.9

GiteaMirror added the bug label 2026-04-12 21:19:36 -05:00

@rick-github commented on GitHub (Nov 5, 2025):

https://github.com/ollama/ollama/issues/12699


@pdevine commented on GitHub (Nov 5, 2025):

Going to close this as a dupe, but we can re-open if it's not solved in 0.12.10.
