[GH-ISSUE #1994] Ollama requests hangs after about 20 requests and needs to be restarted #26909

Closed
opened 2026-04-22 03:38:41 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @Shajan on GitHub (Jan 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1994

Originally assigned to: @BruceMacD on GitHub.

Requests hang after about 20 requests.
Ollama version: 0.1.20, on Linux with a T4 GPU as well as on a Mac M2.
All subsequent `api/generate` requests hang, for all models. The only way to recover is to restart Ollama: `sudo systemctl restart ollama`.

Repro

import requests

def query(session):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": "llama2:7b",
        "prompt": "Why is the sky blue?",
        "stream": False,
    }

    # Reuse the shared session; hangs about every 20 requests
    with session.post(url, json=data) as response:
        if response.ok:
            return response.text
        else:
            print(response)
            return None

def main():
    total = 0
    errors = 0

    with requests.Session() as session:
        for _ in range(100):
            total += 1
            r = query(session)
            if r is None:
                errors += 1
            success_rate = 100*((total - errors)/total)
            print(f"{total=} {errors=} {success_rate=:.2f}")

if __name__ == "__main__":
    main()
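Until a fixed release is available, the client can at least avoid blocking forever on a hung request. The sketch below (not part of the original report) uses the standard library's `urllib` with an explicit timeout, so a stalled `api/generate` call is counted as an error instead of hanging the loop; the URL and payload mirror the repro above.

```python
import json
import socket
import urllib.error
import urllib.request

def query_with_timeout(url, payload, timeout=60):
    """POST JSON and return the response body, or None on timeout/error.

    A hung request raises after `timeout` seconds instead of
    blocking the calling loop indefinitely.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, socket.timeout):
        # Covers both connection-level timeouts and read timeouts
        return None
```

With `requests`, the equivalent guard is passing `timeout=` to `session.post` and catching `requests.exceptions.Timeout`.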

Author
Owner

@Shajan commented on GitHub (Jan 14, 2024):

Issue #1910 appears to be related. This issue appears to be different (unrelated to format='json').

Author
Owner

@Shajan commented on GitHub (Jan 15, 2024):

Same issue on versions `0.1.18` and `0.1.19` (tested on Linux).
Works fine on version `0.1.13` (tested for 1000 requests on Linux).

Author
Owner

@johanngerberding commented on GitHub (Jan 16, 2024):

Have the same issue with a GGUF Mistral model on an RTX 6000 Quadro GPU on Linux after 20-30 requests. Tested `0.1.13` and `0.1.20`.

Author
Owner

@BruceMacD commented on GitHub (Jan 16, 2024):

Thanks for the script in the report, I've reproduced this and found what is causing the issue. Working on getting to the root cause now.

Author
Owner

@BruceMacD commented on GitHub (Jan 16, 2024):

We have a mitigation in for the next release by disabling prompt-caching: #2018

I'll follow up on why prompt-caching causes this in #2023

Thanks to everyone for the reports.

Reference: github-starred/ollama#26909