[GH-ISSUE #1693] Possible to increase speed / efficiency of model? #47466

Closed
opened 2026-04-28 03:52:31 -05:00 by GiteaMirror · 5 comments

Originally created by @theyluvEnething on GitHub (Dec 24, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1693

I'm trying out the Dolphin-Mixtral model and it's quite fun, but really slow. (My specs: 64 GB of 3200 MHz RAM, a 4.5 GHz i7 CPU and a GTX 1080 Ti.)
It still takes some time to start answering, and once it starts it writes at maybe 2-3 words a second. Is it possible to improve this?


@igorschlum commented on GitHub (Dec 24, 2023):

Hi @theyluvEnething, it's a memory issue. Give me a prompt you want me to test and I will post a video showing the speed when enough memory is available.


@theyluvEnething commented on GitHub (Dec 24, 2023):

Hi @igorschlum! I've been trying different prompts, and now I've "benchmarked" one: the trolley problem.

> There is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are five people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks. However, you notice that there is one person on the side track. You have two (and only two) options: Do nothing, in which case the trolley will kill the five people on the main track. Pull the lever, diverting the trolley onto the side track where it will kill one person.

I used this exact prompt. After 30 seconds of waiting it began writing at a pretty good speed, and after a total of 2 minutes and 15 seconds it finished with this answer:

> There is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are five people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks. However, you notice that there is one person on the side track. You have two (and only two) options: Do nothing, in which case the trolley will kill the five people on the main track. Pull the lever, diverting the trolley onto the side track where it will kill one person.
>
> The Trolley Problem is a classic moral dilemma in philosophy. In this scenario, you have two options: let the trolley kill five people on the main track or pull the lever and divert the trolley to the side track, which would result in one person being killed instead of five.
>
> While there are no universally correct answers to moral dilemmas like this one, many people argue that it is morally preferable to pull the lever and sacrifice one life to save five others. This reasoning follows the principle of consequentialism, which states that the rightness or wrongness of an action depends on its consequences.
>
> In this case, pulling the lever would result in a better outcome overall since it would save four more lives than letting the trolley continue on its course. However, this decision is still difficult for many people due to the inherent value we place on human life and the moral dilemma of actively causing harm (even if it's ultimately to save more lives).
>
> Remember that these types of problems are meant to provoke thought and discussion about morality and ethics, and there is no one-size-fits-all answer.

I've seen another issue on GitHub where someone described a similar "problem", and someone suggested that his GPU was not "good" enough and didn't support hardware acceleration.
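The benchmark above mixes two different costs: the wait before the first word (loading the weights and reading the prompt) and the generation speed afterwards. A minimal sketch for separating them, assuming the final JSON object returned by Ollama's `/api/generate` endpoint with `"stream": false`, whose `load_duration`, `prompt_eval_duration`, `eval_duration` and `total_duration` fields are in nanoseconds. `summarize_timings` is a hypothetical helper, and the sample values are made up for illustration, not measurements from this thread. `ollama run` also accepts a `--verbose` flag that prints similar statistics after each answer.

```python
# Sketch: split Ollama timing fields into startup latency and generation speed.
# Assumes a dict shaped like the final /api/generate response (durations in ns).

NS_PER_S = 1_000_000_000


def summarize_timings(resp: dict) -> dict:
    """Hypothetical helper deriving latency and throughput from Ollama timing fields."""
    load_s = resp.get("load_duration", 0) / NS_PER_S
    prompt_s = resp.get("prompt_eval_duration", 0) / NS_PER_S
    gen_s = resp["eval_duration"] / NS_PER_S
    return {
        # Approximate wait before the first output token: weight loading + prompt processing.
        "time_to_first_token_s": load_s + prompt_s,
        # Generation throughput in tokens per second (eval_count = generated tokens).
        "eval_rate_tok_s": resp["eval_count"] / gen_s if gen_s else 0.0,
        "total_s": resp["total_duration"] / NS_PER_S,
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "total_duration": 10_000_000_000,
        "load_duration": 2_000_000_000,
        "prompt_eval_duration": 1_000_000_000,
        "eval_count": 140,
        "eval_duration": 7_000_000_000,
    }
    print(summarize_timings(sample))
```

A long `load_duration` points at moving weights into memory, while a low eval rate with short loading points at where the layers actually run (GPU versus CPU).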


@igorschlum commented on GitHub (Jan 9, 2024):

Hello @theyluvEnething

Yes, it's a memory issue. I've read that there is a way to run Ollama without the GPU, using only the CPU, which makes all of the system memory available. On a Mac this isn't an issue, because the memory is shared between the CPU and GPU.

Here is the speed I get with enough memory on my Mac:

https://github.com/jmorganca/ollama/assets/2884312/0a73ad6d-e3cb-45b7-9d5a-b0f142228e73
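The CPU-only approach mentioned here can be sketched under a few assumptions: Ollama's generate API accepts an `options` object, and its `num_gpu` option sets how many model layers are offloaded to the GPU, so `0` keeps the whole model on the CPU and in system RAM. The endpoint URL is the default local address, and `build_cpu_only_request` is a hypothetical helper; the request is only built, and is sent only if you uncomment the lines at the end.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_cpu_only_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Hypothetical helper: build (not send) a generate request with no GPU offload."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_gpu = number of layers sent to the GPU; 0 keeps every layer on the CPU.
        "options": {"num_gpu": 0},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_cpu_only_request("dolphin-mixtral", "Why is the sky blue?")
    # Sending requires a running Ollama server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["response"])
    print(req.get_method(), req.full_url)
```

Values between `0` and the model's layer count split the layers between GPU and CPU, which is the setting to experiment with on cards with limited VRAM.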


@pdevine commented on GitHub (Mar 12, 2024):

I'm going to go ahead and close this. @theyluvEnething you can try to use a more heavily quantized version, which should cut down on the memory requirements, but you're going to get diminishing returns past a certain point. Unfortunately a GTX 1080 only has 8GB of VRAM, and dolphin-mixtral is going to need 32GB+ to run quickly.
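The memory arithmetic behind this can be sketched as parameter count times bits per weight. Assumptions: Mixtral 8x7B, the base of dolphin-mixtral, has roughly 46.7 billion parameters in total; bit widths are nominal (quantized formats store extra scale data, so real files are somewhat larger); and the KV cache and runtime overhead are ignored, so real requirements are higher than these figures.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bytes.
# Assumptions: ~46.7e9 total parameters (Mixtral 8x7B), nominal bit widths,
# and no allowance for KV cache, activations or runtime overhead.

MIXTRAL_PARAMS = 46.7e9
GPU_VRAM_GB = 8.0  # GTX 1080, as stated in the comment


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4, 3, 2):
    size = weight_gb(MIXTRAL_PARAMS, bits)
    # Upper bound on the share of the weights that could sit in VRAM.
    on_gpu = min(1.0, GPU_VRAM_GB / size)
    print(f"{bits:>2}-bit: ~{size:5.1f} GB of weights, at most {on_gpu:4.0%} fits in {GPU_VRAM_GB:.0f} GB VRAM")
```

Under these assumptions even 4-bit weights come to roughly 23 GB, so only about a third of the model fits in 8 GB of VRAM and the remaining layers run on the CPU. Dropping to 2 bits still leaves about 11.7 GB, and lower bit widths also cost output quality, which is the diminishing return described in the comment.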


@fishesarethings commented on GitHub (Jun 22, 2024):

I have even slower speeds! I have 4 CPU cores; when I ask a question, 2 of the cores spring to life but the others stay at zero. I usually wait around 30 minutes for a response, often just to get no response at all. I am also using the 'fastest' model, wizardlm2:7b, yet it is still slow: the test prompt took around 8 minutes.


Reference: github-starred/ollama#47466