[GH-ISSUE #1012] yarn-mistral doesn't work on MacOS #493

Closed
opened 2026-04-12 10:10:18 -05:00 by GiteaMirror · 9 comments

Originally created by @iddar on GitHub (Nov 6, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1012

I run `ollama run yarn-mistral:7b-128k` or `ollama run yarn-mistral`; in both cases the CLI keeps loading indefinitely.

[screenshot]

In this example I waited over 8 minutes.


@xyproto commented on GitHub (Nov 6, 2023):

Could the system be out of memory?


@iddar commented on GitHub (Nov 6, 2023):

Maybe, but every other model works fine for me.


@mxyng commented on GitHub (Nov 6, 2023):

yarn-mistral has a much larger context and a much larger memory footprint than most other models. You may be running out of memory from this alone. What are your system specs?


@easp commented on GitHub (Nov 6, 2023):

@iddar What model of Mac? How much RAM?

On my 32GB M1 Max I see similar behavior to what you describe. I checked `~/.ollama/logs/server` and it looks like the 128k version is slightly too big; ollama just seems to end up in a zombie state rather than returning an error. The 64k version loads, but doesn't work very well. Not sure if it's because it isn't trained for chat/instruct, or because of some patches to llama.cpp's YaRN support that haven't made it downstream to Ollama yet.


@iddar commented on GitHub (Nov 6, 2023):

I have 16 GB of RAM on a MacBook Pro M1.


@igorschlum commented on GitHub (Nov 7, 2023):

I have a 32 GB MacBook Pro M1 Pro and it seems to work well.

```
(base) igor@macigor-2 ~ % ollama run yarn-mistral
pulling manifest
pulling 0e8703041ff2... 2% |█ | (106 MB/4.1 GB, 1.4 MB/s) [1m15s:46m25s]Error: max retries exceeded
(base) igor@macigor-2 ~ % ollama run yarn-mistral
pulling manifest
pulling 0e8703041ff2... 26% |██████████████████████ | (1.1/4.1 GB, 1.5 MB/s) [13m43s:33m6s]Error: max retries exceeded
(base) igor@macigor-2 ~ % ollama run yarn-mistral
pulling manifest
pulling 0e8703041ff2... 28% |████████████████████████ | (1.2/4.1 GB, 1.2 MB/s) [1m2s:41m33s]Error: max retries exceeded
(base) igor@macigor-2 ~ % ollama run yarn-mistral
pulling manifest
pulling 0e8703041ff2... 30% |█████████████████████████ | (1.3/4.1 GB, 1.5 MB/s) [1m5s:32m29s]Error: max retries exceeded
(base) igor@macigor-2 ~ % ollama run yarn-mistral
pulling manifest
pulling 0e8703041ff2... 63% |█████████████████████████████████████████████████████ | (2.6/4.1 GB, 1.4 MB/s) [19m5s:17m47s]Error: max retries exceeded
(base) igor@macigor-2 ~ % ollama run yarn-mistral
pulling manifest
pulling 0e8703041ff2... 100% |████████████████████████████████████████████████████████████████████████████████████████| (4.1/4.1 GB, 3.0 MB/s)
pulling e9d3a814cdd6... 100% |█████████████████████████████████████████████████████████████████████████████████████████████████| (17/17 B, 9 B/s)
pulling 98e4579df414... 100% |██████████████████████████████████████████████████████████████████████████████████████████████| (307/307 B, 150 B/s)
verifying sha256 digest
writing manifest
removing any unused layers
success

>>> hello, what is yarn-mistral difference over other LLM?
You should look at this post which explains why mistral was created.
....
```


@easp commented on GitHub (Nov 7, 2023):

@iddar yarn-mistral is the 64k context version of the model.

When I run it on my system I see this in `~/.ollama/logs/server`:

```
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 21845.34 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 4254.13 MB
llama_new_context_with_model: max tensor size =   102.54 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3918.58 MB, ( 3919.08 / 21845.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =  8192.02 MB, (12111.09 / 21845.34)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =  4248.02 MB, (**16359.11** / 21845.34)
```

Total memory required is 16.4GB, so it's not going to work on a 16GB Mac.
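The 8192 MB `kv` buffer in that log is consistent with a back-of-the-envelope estimate. Here is a rough sketch, assuming Mistral-7B's published shape (32 layers, 8 KV heads of head dimension 128, fp16 cache, 64k-token context) — these architecture numbers are my assumption, not something the log states:

```shell
# KV-cache size estimate for a 64k context, assuming Mistral-7B's
# architecture: 32 layers, 8 KV heads, head dim 128, fp16 (2 bytes/value).
n_layers=32; n_kv_heads=8; head_dim=128; bytes_per_val=2; n_ctx=65536
# Each token stores both K and V (factor 2) in every layer.
per_token=$(( 2 * n_layers * n_kv_heads * head_dim * bytes_per_val ))
echo "$(( per_token * n_ctx / 1024 / 1024 )) MiB"
# -> 8192 MiB, matching the 'kv' buffer in the log above
```

Doubling the context to 128k would roughly double that buffer, which fits the observation that the 128k tag tips a 32 GB machine over the edge.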

@igorschlum yarn-mistral works fine on my 32GB machine for simple prompts, but behaves strangely and inconsistently when I ask it to summarize larger amounts of text.

For example, one time I fed it a 21 KB text file: `ollama run yarn-mistral --verbose "$(cat iliad-book1-21k.txt)" "Please summarize the proceeding text"`. It churned away for ~5 minutes, spitting out more of the Iliad (or something resembling it) that I'd fed it, which eventually degenerated into newlines, random characters, and fragments of words/sentences toward the end. The statistics reported were:

```
total duration:       5m14.288074583s
load duration:        11.532973417s
prompt eval count:    6571 token(s)
prompt eval duration: 22.774589s
prompt eval rate:     288.52 tokens/s
eval count:           4865 token(s)
eval duration:        4m38.797253s
eval rate:            17.45 tokens/s
```
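As a sanity check on those numbers, the reported eval rate is simply eval count divided by eval duration; a quick awk one-liner reproduces the 17.45 tokens/s figure:

```shell
# eval rate = eval count / eval duration (4m38.797253s = 278.797253 s)
awk 'BEGIN { printf "%.2f tokens/s\n", 4865 / (4 * 60 + 38.797253) }'
# -> 17.45 tokens/s
```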

I repeated the same thing a couple minutes later (before the ollama-runner was automatically terminated), and this was the whole result:

```
% ollama run yarn-mistral --verbose "$(cat iliad-book1-21k.txt)" "Please summarize the proceeding text"
 for a friend who does not have access to it.

total duration:       2.412436792s
load duration:        5.121ms
prompt eval count:    1 token(s)
eval count:           12 token(s)
eval duration:        2.316456s
eval rate:            5.18 tokens/s
```

I may well be trying to get yarn-mistral to do something it's not able to do, and I know there is some randomness involved in generating text, but I'd expect the prompt eval count to be the same when the input is the same.


@iddar commented on GitHub (Nov 10, 2023):

Thank you very much for the clarification


@igorschlum commented on GitHub (Nov 23, 2023):

Hi @easp,

I've found a solution to the issue you mentioned.

The correct syntax is to omit the quotes around the prompt and keep them only around the file contents. For example, instead of using:

```
% ollama run yarn-mistral --verbose "$(cat iliad-book1-21k.txt)" Please summarize the proceeding text
 for a friend who does not have access to it.
```

In my case I gave the full path of the file:

```
% ollama run llama2 --verbose "$(cat /Users/igor/song.txt)" Please translate in spanish the proceeding text
```

I hope this helps!
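For what it's worth, the difference between those invocations comes down to ordinary shell word-splitting: quoting `$(cat file)` passes the whole file as a single argument, while leaving it unquoted splits it into many. A small hypothetical demo (using a throwaway `/tmp/demo.txt`, not a file from this thread) makes that visible:

```shell
# Illustrative only: how quoting a command substitution changes what
# the shell actually passes to the program (here, a counting function).
printf 'line one\nline two\n' > /tmp/demo.txt

count_args() { echo "$#"; }

count_args "$(cat /tmp/demo.txt)"   # quoted: the file is ONE argument -> prints 1
count_args $(cat /tmp/demo.txt)     # unquoted: whitespace-split       -> prints 4
```

So the quoted form is what keeps a multi-line file intact as a single prompt argument; whether that also explains the model's inconsistent summaries is a separate question.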


Reference: github-starred/ollama#493