[GH-ISSUE #1616] How to skip animation? #47410

Closed
opened 2026-04-28 03:45:01 -05:00 by GiteaMirror · 4 comments

Originally created by @kokizzu on GitHub (Dec 19, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1616

For example, when I run:

```
ollama run mistral
>>> some prompt

... very slow letter by letter output ... <-- how to make this faster?
```


@igorschlum commented on GitHub (Dec 19, 2023):

Hi @kokizzu, it's like that when there isn't enough available memory. If you are on a Mac with 8 GB of RAM, try restarting and launching only Ollama and Terminal. It could be faster, but 8 GB is not enough for Mistral.


@pdevine commented on GitHub (Dec 19, 2023):

@kokizzu what are the specs for your system? If you have a limited amount of GPU memory this would be expected behaviour.


@kokizzu commented on GitHub (Dec 20, 2023):

32-core Ryzen 9, 128 GB RAM '__') but no NVIDIA GPU, just an old AMD RX 6600 XT


@pdevine commented on GitHub (Dec 20, 2023):

Unfortunately you are running on the CPU and it's just not very fast. The AMD support is *just* about to go in, but I'm not sure if that card is supported in ROCm 6. You can try it w/ PR #1146 or wait until that gets merged (hopefully later today or tomorrow).

If you are using the API (either `/api/generate` or `/api/chat`) you can set `stream=false`, which will return everything as one response instead of token by token, but there isn't a way to do that in the REPL (i.e. the CLI).
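For illustration, here is a minimal Python sketch of a non-streaming request against a local Ollama server. The endpoint, default port, and `stream` field are from the Ollama API; the model name and prompt are placeholders:

```python
import json
import urllib.request

# Request body for Ollama's /api/generate endpoint.
# "stream": False asks the server to return the whole completion
# as a single JSON object instead of token-by-token chunks.
payload = json.dumps({
    "model": "mistral",
    "prompt": "Why is the sky blue?",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default port
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment to actually send the request (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

With `"stream": False`, the `response` field of the returned JSON contains the full completion at once, rather than arriving as a sequence of partial objects.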

Barring that, you can either try renting an instance in the cloud (like on Fly.io or from Paperspace), or upgrade to a faster system. I'm going to go ahead and close the issue, but feel free to re-open it (or better, just ask on the Discord) if you want to follow up.


Reference: github-starred/ollama#47410