[GH-ISSUE #4310] Updating from v0.1.33 to v0.1.34 with a Modelfile-built model (llama3:70b-instruct-q4_0, ~23k SYSTEM data) yields >100s processing and a 504 response #2687

Closed
opened 2026-04-12 13:00:50 -05:00 by GiteaMirror · 1 comment

Originally created by @aiboogie on GitHub (May 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4310

What is the issue?

I was using ollama v0.1.33 with a model built from a Modelfile based on llama3:70b-instruct-q4_0 with ~23k tokens of SYSTEM data.
No prompt took more than 20s (except the initial model-load request) on an M1 Ultra with 128 GB RAM and 48 GPU cores.
I updated today to v0.1.34. Using the same generated model, every request now takes a huge amount of processing time; when it reaches 100s, execution is cut off with a 504 response code.
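
For context, such a model is built by baking the system prompt into a Modelfile and running `ollama create`. A minimal sketch, assuming a hypothetical model name and placeholder SYSTEM contents (the actual SYSTEM block here is ~23k tokens):

```
# Modelfile (sketch; the real SYSTEM block is ~23k tokens of data)
FROM llama3:70b-instruct-q4_0
SYSTEM """
...large SYSTEM data goes here...
"""
```

```sh
# Build the derived model from the Modelfile above (model name is hypothetical)
ollama create llama3-23k-system -f Modelfile
```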

Downgrading back to v0.1.33 solves the problem, so it must be something in v0.1.34.
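
To measure the per-request latency directly against the local API, independent of whatever client or proxy returned the 504, one can time a single non-streaming generate call. A sketch, reusing the hypothetical model name from above:

```sh
# Time one non-streaming generation against the default local endpoint
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3-23k-system", "prompt": "Hello", "stream": false}'
```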

Please advise,
Thanks!

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.1.34

GiteaMirror added the bug label 2026-04-12 13:00:50 -05:00

@jmorganca commented on GitHub (May 11, 2024):

Hi @aiboogie would it be possible to try again with 0.1.35? It's possible an inference subprocess from Ollama was hanging around – upgrading to 0.1.35 and restarting should clear any of those. Let me know if you're still seeing this afterwards 😊
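
For anyone checking this on their own machine, one quick way to look for a lingering inference subprocess on macOS (a sketch; the subprocess name varies across Ollama versions, so this matches anything containing "ollama"):

```sh
# List the full command lines of any running Ollama processes,
# including the spawned inference runner
pgrep -fl ollama

# If a stale runner from the old version is still alive, quit the
# menu bar app, kill any stragglers, and relaunch Ollama
pkill -f ollama
```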


Reference: github-starred/ollama#2687