[GH-ISSUE #2960] [Win11] mistral 7B performance down between 0.1.28 and 0.1.27 #63857

Closed
opened 2026-05-03 15:14:12 -05:00 by GiteaMirror · 14 comments
Owner

Originally created by @stevengans on GitHub (Mar 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2960

Originally assigned to: @dhiltgen on GitHub.

Using:
curl http://localhost:11434/api/chat -d '{ "model": "mistral", "messages": [ { "role": "user", "content": "why is the sky blue?" } ] }'

Hardware:
Gpu: Nvidia RTX A5000
Cpu: Intel i5-12600K
Mem: 64GB
Os: Windows 11 21H2

From what I've observed, calls are about 4x slower on 0.1.28 than on 0.1.27.
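To put a number on a report like this, the same request can be timed from a script. The sketch below is a minimal example, assuming the server returns the `eval_count` and `eval_duration` fields (the latter in nanoseconds) on a non-streaming `/api/chat` reply; the function names are hypothetical and not part of the original report.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # same endpoint as the curl call above


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput, given a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


def chat_once(prompt: str, model: str = "mistral") -> dict:
    """Send one non-streaming chat request and return the parsed JSON reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a chunk stream
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)


# Usage (needs a running server, so it is left commented out):
# reply = chat_once("why is the sky blue?")
# print(tokens_per_second(reply["eval_count"], reply["eval_duration"]))
```

Running this against both versions with the same prompt gives directly comparable tokens-per-second figures, which is easier to reason about than wall-clock time for a whole loop.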

GiteaMirror added the bug label 2026-05-03 15:14:12 -05:00

@jmorganca commented on GitHub (Mar 6, 2024):

Hi @stevengans, would it be possible to share the server logs? That will tell us whether Ollama is using your GPU properly. You can find them by clicking on the Ollama taskbar item -> View Logs -> server.log. Thanks so much!


@stevengans commented on GitHub (Mar 6, 2024):

Tested on the same loop:
[server 0.1.27.log](https://github.com/ollama/ollama/files/14515371/server.0.1.27.log)
[server 0.1.28.log](https://github.com/ollama/ollama/files/14515372/server.0.1.28.log)

server 0.1.27.log total time: 3.21m
server 0.1.28.log total time: 4.23m
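As a quick sanity check on the two totals, the ratio can be worked out directly. The sketch below is only arithmetic on the reported figures; it is unclear whether "3.21m" means 3.21 decimal minutes or 3 min 21 s, so both readings are computed as labeled assumptions.

```python
def slowdown(old_seconds: float, new_seconds: float) -> float:
    """How many times longer the new run took compared with the old one."""
    return new_seconds / old_seconds


# Assumption 1: the totals are decimal minutes (3.21 min and 4.23 min).
decimal_reading = slowdown(3.21 * 60, 4.23 * 60)

# Assumption 2: the totals are minutes and seconds (3 min 21 s and 4 min 23 s).
clock_reading = slowdown(3 * 60 + 21, 4 * 60 + 23)

print(f"decimal minutes: {decimal_reading:.2f}x, minutes:seconds: {clock_reading:.2f}x")
```

Under either reading the 0.1.28 loop took roughly 1.3 times as long as the 0.1.27 loop, which is a measurable slowdown but well short of the 4x figure in the original report.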

@stevengans commented on GitHub (Mar 6, 2024):

I'll add that 0.1.27's logs are much nicer than 0.1.28...


@dhiltgen commented on GitHub (Mar 6, 2024):

I just tried to repro on win 11 with a 4090 and perf seems relatively consistent for me.

0.1.27

PS C:\Users\danie> ollama run mistral
>>> /set verbose
Set 'verbose' mode.
>>> why is the sky blue?
 The color of the sky appears blue due to a phenomenon called Rayleigh scattering. As sunlight passes through
Earth's atmosphere, it interacts with different gases and particles present in the air. The short-wavelength blue
light gets scattered more easily than other colors because molecules in the atmosphere are smaller than the
wavelength of blue light. As a result, when we look up at the sky, we primarily see the scattered blue light,
making it appear blue to our eyes. However, during sunrise or sunset, the sky can display various shades of red,
pink, orange, and purple due to the presence of other gases like ozone and scattering at longer wavelengths.

total duration:       1.4990532s
load duration:        502µs
prompt eval count:    15 token(s)
prompt eval duration: 174.283ms
prompt eval rate:     86.07 tokens/s
eval count:           143 token(s)
eval duration:        1.323756s
eval rate:            108.03 tokens/s
>>> /bye

0.1.28:

PS C:\Users\danie> ollama run mistral
>>> /set verbose
Set 'verbose' mode.
>>> why is the sky blue?
 The color of the sky appears blue due to a natural phenomenon called Rayleigh scattering. As sunlight reaches
Earth's atmosphere, it interacts with gases and particles in the air, causing the shorter wavelengths of light
(blue and violet) to be scattered more effectively than longer wavelengths (yellow, orange, and red). Our eyes are
more sensitive to blue light, and we see the sky as blue during a clear day. However, it may appear other colors
depending on the presence or absence of other pollutants or particles in the atmosphere.

total duration:       1.3887087s
load duration:        518.3µs
prompt eval count:    15 token(s)
prompt eval duration: 224.925ms
prompt eval rate:     66.69 tokens/s
eval count:           117 token(s)
eval duration:        1.162431s
eval rate:            100.65 tokens/s
>>> /bye

With my setup, the model fully loads into the GPU, whereas on the RTX A5000 it's only able to load 26/33 layers to GPU, so perhaps there's a perf regression when models are split between CPU and GPU?

@stevengans can you try another model that fully fits in your GPU to see if you see the same perf hit, or if things are roughly the same? I'll try some other models on my side as well to see if I can repro with a different approach.
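For comparing the two verbose summaries above, the eval rate can be recomputed from the printed count and duration. This is a minimal sketch, assuming the durations use Go-style unit suffixes (`s`, `ms`, `µs`) as they appear in the output; `parse_duration` and `eval_rate` are hypothetical helper names.

```python
import re

# Seconds per unit for the duration strings printed in verbose mode.
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3,
    "\u00b5s": 1e-6,  # micro sign, as it appears in the output above
    "\u03bcs": 1e-6,  # Greek mu, a common look-alike
    "us": 1e-6, "ns": 1e-9,
}
# Longest units first so "ms" is not read as "m" followed by a stray "s".
_UNITS = sorted(_UNIT_SECONDS, key=len, reverse=True)
_PART = re.compile(r"(\d+(?:\.\d+)?)(" + "|".join(_UNITS) + ")")


def parse_duration(text: str) -> float:
    """Convert a string such as '1.323756s', '174.283ms' or '1m2.5s' to seconds."""
    parts = _PART.findall(text.strip())
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def eval_rate(eval_count: int, eval_duration: str) -> float:
    """Tokens generated per second."""
    return eval_count / parse_duration(eval_duration)


rate_27 = eval_rate(143, "1.323756s")  # figures from the 0.1.27 run above
rate_28 = eval_rate(117, "1.162431s")  # figures from the 0.1.28 run above
change = (rate_28 - rate_27) / rate_27 * 100
print(f"0.1.27: {rate_27:.2f} tok/s, 0.1.28: {rate_28:.2f} tok/s ({change:+.1f}%)")
```

On these two single runs the eval rate is about 7% lower on 0.1.28. With different response lengths and only one sample per version, that is within what run-to-run noise could plausibly explain, so it does not by itself confirm a regression.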


@dhiltgen commented on GitHub (Mar 6, 2024):

I just tried a larger model on my GPU: mixtral -> 27/33 layers to GPU

Perf actually got slightly better (rounding error though)

0.1.27

>>> why is the sky blue?
 The sky appears blue because of a process called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it
is made up of different colors, which are essentially different wavelengths of light. Shorter wavelengths (like
violet and blue) are scattered in all directions more than other colors like red, orange, and yellow, which have
longer wavelengths.

Even though violet light is scattered more than blue light, the sky appears blue rather than violet because our
eyes are more sensitive to blue light and because sunlight reaches us with less violet light to begin with.
Additionally, some of the violet light gets absorbed by the ozone layer in the atmosphere. As a result, the sky we
observe is predominantly blue.

total duration:       8.9477387s
load duration:        504.6µs
prompt eval count:    16 token(s)
prompt eval duration: 581.987ms
prompt eval rate:     27.49 tokens/s
eval count:           154 token(s)
eval duration:        8.361723s
eval rate:            18.42 tokens/s
>>>

0.1.28

>>> why is the sky blue?
 The sky appears blue because of a process called Rayleigh scattering. As sunlight passes through the Earth's
atmosphere, it interacts with molecules and particles in the air such as nitrogen and oxygen. These particles
scatter the light in all directions. Blue light has a shorter wavelength and gets scattered more easily than other
colors, such as red or yellow, which have longer wavelengths. As a result, when we look up at the sky, we
predominantly see the blue light that has been scattered, making the sky appear blue.

total duration:       6.3294156s
load duration:        1.0415ms
prompt eval count:    16 token(s)
prompt eval duration: 605.577ms
prompt eval rate:     26.42 tokens/s
eval count:           111 token(s)
eval duration:        5.72223s
eval rate:            19.40 tokens/s
>>>

@stevengans commented on GitHub (Mar 7, 2024):

mixtral:8x7b-instruct-v0.1-q2_K fits fully on my GPU. I will run the tests with this model on both 0.1.28 and 0.1.27.


@stevengans commented on GitHub (Mar 7, 2024):

@dhiltgen
Ran the test for 0.1.28 (will now run the test for 0.1.27) but hit another issue below:
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 41.02 MiB
llm_load_tensors: CUDA0 buffer size = 14877.55 MiB
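Whether a model is fully on the GPU can be read off the `offloaded N/M layers to GPU` line shown above. The following is a small sketch that extracts those two numbers from server log text; the helper names are hypothetical and the pattern only assumes the wording of the line quoted in this comment.

```python
import re

_OFFLOAD = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def gpu_offload(log_text: str):
    """Return (offloaded, total) from the last matching log line, or None."""
    matches = _OFFLOAD.findall(log_text)
    if not matches:
        return None
    offloaded, total = matches[-1]  # the most recent model load wins
    return int(offloaded), int(total)


def fully_offloaded(log_text: str) -> bool:
    """True only when every layer was placed on the GPU."""
    result = gpu_offload(log_text)
    return result is not None and result[0] == result[1]


sample = "llm_load_tensors: offloaded 33/33 layers to GPU"
print(gpu_offload(sample), fully_offloaded(sample))
```

A 26/33 result, as in the earlier mistral run on this card, would return `False`, which separates the "split between CPU and GPU" case from the fully offloaded one when comparing versions.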

It goes from working to hanging:

{"function":"print_timings","level":"INFO","line":278,"msg":"generation eval time = 3584.46 ms / 162 runs ( 22.13 ms per token, 45.20 tokens per second)","n_decoded":162,"n_tokens_second":45.19511736502422,"slot_id":0,"t_token":22.126283950617285,"t_token_generation":3584.458,"task_id":199424,"tid":"3744","timestamp":1709815679}
{"function":"print_timings","level":"INFO","line":287,"msg":" total time = 4034.65 ms","slot_id":0,"t_prompt_processing":450.195,"t_token_generation":3584.458,"t_total":4034.6530000000002,"task_id":199424,"tid":"3744","timestamp":1709815679}
{"function":"update_slots","level":"INFO","line":1635,"msg":"slot released","n_cache_tokens":1285,"n_ctx":2048,"n_past":1284,"n_system_tokens":0,"slot_id":0,"task_id":199424,"tid":"3744","timestamp":1709815679,"truncated":false}
[GIN] 2024/03/07 - 07:47:59 | 200 | 4.0380037s | 127.0.0.1 | POST "/api/chat"
{"function":"launch_slot_with_data","level":"INFO","line":826,"msg":"slot is processing task","slot_id":0,"task_id":199589,"tid":"3744","timestamp":1709815679}
{"function":"update_slots","level":"INFO","line":1801,"msg":"slot progression","n_past":1052,"n_prompt_tokens_processed":77,"slot_id":0,"task_id":199589,"tid":"3744","timestamp":1709815679}
{"function":"update_slots","level":"INFO","line":1825,"msg":"kv cache rm [p0, end)","p0":1052,"slot_id":0,"task_id":199589,"tid":"3744","timestamp":1709815679}
{"function":"print_timings","level":"INFO","line":264,"msg":"prompt eval time = 470.96 ms / 77 tokens ( 6.12 ms per token, 163.50 tokens per second)","n_prompt_tokens_processed":77,"n_tokens_second":163.49653259950995,"slot_id":0,"t_prompt_processing":470.958,"t_token":6.116337662337663,"task_id":199589,"tid":"3744","timestamp":1709815690}
{"function":"print_timings","level":"INFO","line":278,"msg":"generation eval time = 10514.48 ms / 474 runs ( 22.18 ms per token, 45.08 tokens per second)","n_decoded":474,"n_tokens_second":45.0806844389181,"slot_id":0,"t_token":22.18244936708861,"t_token_generation":10514.481,"task_id":199589,"tid":"3744","timestamp":1709815690}
{"function":"print_timings","level":"INFO","line":287,"msg":" total time = 10985.44 ms","slot_id":0,"t_prompt_processing":470.958,"t_token_generation":10514.481,"t_total":10985.439,"task_id":199589,"tid":"3744","timestamp":1709815690}
{"function":"update_slots","level":"INFO","line":1635,"msg":"slot released","n_cache_tokens":1603,"n_ctx":2048,"n_past":1602,"n_system_tokens":0,"slot_id":0,"task_id":199589,"tid":"3744","timestamp":1709815690,"truncated":false}
[GIN] 2024/03/07 - 07:48:10 | 200 | 10.9893584s | 127.0.0.1 | POST "/api/chat"
{"function":"launch_slot_with_data","level":"INFO","line":826,"msg":"slot is processing task","slot_id":0,"task_id":200066,"tid":"3744","timestamp":1709815690}
{"function":"update_slots","level":"INFO","line":1801,"msg":"slot progression","n_past":1052,"n_prompt_tokens_processed":70,"slot_id":0,"task_id":200066,"tid":"3744","timestamp":1709815690}
{"function":"update_slots","level":"INFO","line":1825,"msg":"kv cache rm [p0, end)","p0":1052,"slot_id":0,"task_id":200066,"tid":"3744","timestamp":1709815690}
{"function":"update_slots","level":"INFO","line":1598,"msg":"slot context shift","n_cache_tokens":2048,"n_ctx":2048,"n_discard":1023,"n_keep":1,"n_left":2046,"n_past":2047,"n_system_tokens":0,"slot_id":0,"task_id":200066,"tid":"3744","timestamp":1709815712}
{"function":"update_slots","level":"INFO","line":1598,"msg":"slot context shift","n_cache_tokens":2048,"n_ctx":2048,"n_discard":1023,"n_keep":1,"n_left":2046,"n_past":2047,"n_system_tokens":0,"slot_id":0,"task_id":200066,"tid":"3744","timestamp":1709815738}
{"function":"update_slots","level":"INFO","line":1598,"msg":"slot context shift","n_cache_tokens":2048,"n_ctx":2048,"n_discard":1023,"n_keep":1,"n_left":2046,"n_past":2047,"n_system_tokens":0,"slot_id":0,"task_id":200066,"tid":"3744","timestamp":1709815763}
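The tail of this log shows the same task logging "slot context shift" repeatedly without ever reaching "slot released". In each shift the logged values satisfy n_left = n_past - n_keep (2047 - 1 = 2046) and n_discard = n_left / 2 (1023), and the shifts arrive about 25 seconds apart, which is roughly the time needed to generate another 1023 tokens at the ~45 tokens/s seen earlier. That appears consistent with generation that never stops rather than a frozen process. A minimal sketch for spotting this pattern in a server log, with hypothetical helper names and an arbitrary threshold:

```python
import json
from collections import Counter


def context_shift_counts(log_lines):
    """Count 'slot context shift' events per task_id, skipping non-JSON lines."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # e.g. the [GIN] access-log lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or wrapped line
        task_id = event.get("task_id")
        if event.get("msg") == "slot context shift" and task_id is not None:
            counts[task_id] += 1
    return counts


def runaway_tasks(log_lines, threshold=3):
    """Task ids with at least `threshold` context shifts (threshold is a guess)."""
    counts = context_shift_counts(log_lines)
    return sorted(task for task, n in counts.items() if n >= threshold)


# Usage against a log file (path is an assumption):
# with open("server.log", encoding="utf-8") as handle:
#     print(runaway_tasks(handle))
```

A single context shift is normal for a long conversation; it is the repetition within one task that marks the suspicious case, which is why the check is per `task_id`.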


@Pat02 commented on GitHub (Mar 7, 2024):

I am getting something very similar. When I run 'ollama run mistral' to test, it works well for a few minutes, but then it hangs and I never get the ollama prompt '>>>' back. The chat endpoint also hangs.

Hardware:
Gpu: Nvidia 4090
Cpu: Intel i9-14900K
Mem: 128GB
Os: Ubuntu Server 22.04.4 LTS


@stevengans commented on GitHub (Mar 7, 2024):

@Pat02 I'm testing on RTX A5000. Can you run the test to see if this is the case on 0.1.27?


@Pat02 commented on GitHub (Mar 7, 2024):

@stevengans is this what you mean by "run the test"?

Also, I'm not sure how to see which version I'm running. I installed the latest version using the curl command this morning to make sure my problem was relevant.

/$ ollama run mistral
>>> /set verbose
Set 'verbose' mode.
>>> why is the sky blue?
The color of the sky appears blue due to a process called scattering of light by the atmosphere. When the sun emits light, it contains all colors of the visible light spectrum in different intensities. However, as this
light travels through the Earth's atmosphere, it interacts with molecules such as nitrogen and oxygen, which scatter shorter wavelengths (blue and violet) more than longer wavelengths (red, green, and yellow).

As a result, when we look up at the sky, we primarily see blue light because it is scattered in all directions and reaches our eyes from every direction. However, during sunrise or sunset, the sky can appear red, pink,
orange, or purple due to the different angles of the sun's rays interacting with the atmosphere and the presence of other pollutants or particles that scatter different wavelengths more effectively.

total duration: 1.309652804s
load duration: 1.025736ms
prompt eval duration: 17.495ms
prompt eval rate: 0.00 tokens/s
eval count: 184 token(s)
eval duration: 1.289725s
eval rate: 142.67 tokens/s
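On the question of which version is running: `ollama --version` prints it on the command line. The sketch below shows a scripted alternative, assuming the server exposes a `/api/version` endpoint that returns a JSON object with a `version` field; the helper names are hypothetical, and pre-release suffixes are simply dropped.

```python
import json
import urllib.request


def parse_version(text: str) -> tuple:
    """Turn '0.1.28' or a line ending in a version number into (0, 1, 28)."""
    token = text.strip().split()[-1]  # keep only the last word
    token = token.split("-")[0]       # drop any pre-release suffix such as '-rc1'
    return tuple(int(part) for part in token.split("."))


def server_version(base_url: str = "http://localhost:11434") -> str:
    """Ask a running server for its version (endpoint availability is assumed)."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as response:
        return json.load(response)["version"]


# Usage (needs a running server, so it is left commented out):
# if parse_version(server_version()) >= parse_version("0.1.28"):
#     print("running 0.1.28 or newer")
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as "0.1.9" sorting after "0.1.28".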


@batteryshark commented on GitHub (Mar 8, 2024):

I'm having a similar issue via `ollama serve` where it blows out my context: after a few separate generate calls, it starts logging

{"function":"update_slots","level":"INFO","line":1598,"msg":"slot context shift","n_cache_tokens":8192,"n_ctx":8192,"n_discard":4095,"n_keep":1,"n_left":8190,"n_past":8191,"n_system_tokens":0,"slot_id":0,"task_id":4194,"tid":"13692","timestamp":1709864519}

and then either returns without understanding the context I gave it (which should fit fine in the context window) or doesn't return at all and just logs this forever. I could understand it being a settings issue, but it always works fine on the first run after I restart serve. Only after the second or third run do I get this loop, or a response that doesn't reference my prompt.
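One client-side way to keep a request from generating indefinitely is to cap the number of generated tokens. The sketch below builds a `/api/generate` payload using the `num_ctx` and `num_predict` options; this is a possible workaround offered as an assumption, not a fix discussed in this thread, and the function name and default values are hypothetical.

```python
def build_generate_payload(model: str, prompt: str,
                           num_ctx: int = 8192, num_predict: int = 512) -> dict:
    """Request body for /api/generate with an explicit cap on generated tokens."""
    if num_predict >= num_ctx:
        raise ValueError("num_predict should leave room for the prompt in num_ctx")
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,          # context window size, 8192 as in the log above
            "num_predict": num_predict,  # stop after this many generated tokens
        },
    }


payload = build_generate_payload("mistral", "why is the sky blue?")
print(payload["options"])
```

With a cap in place a runaway generation ends after `num_predict` tokens instead of cycling through context shifts, although it does not address why the model loses the prompt in the first place.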


@stevengans commented on GitHub (Mar 8, 2024):

@jmorganca There are a lot of interesting points here. I think a new bug, a hang, has been encountered by @batteryshark, @Pat02, and myself and is reproducible, whereas @dhiltgen hasn't been able to replicate the initial bug I opened, and neither have I. My guess is that I may have installed it badly. Would you be OK with closing this issue so we can open a new one for the hang above? Or would you prefer to keep this open?


@dhiltgen commented on GitHub (Mar 21, 2024):

On the general topic of hanging, once #3218 is merged, we should be able to detect and recover from these failures.


@dhiltgen commented on GitHub (Apr 12, 2024):

I'm going to close this one based on the original issue.

Release 0.1.32 will switch us over to subprocessing, which will enable us to detect and recover from hangs in the underlying llama.cpp code.
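The general idea behind running the inference code in a subprocess is that a parent process can time out a stuck child, kill it, and start a fresh one. The following is a generic sketch of that pattern in Python, not Ollama's actual implementation; the function names, timeout, and retry count are all assumptions for illustration.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd in a child process and kill it if it exceeds timeout_s.

    Returns (finished, returncode); returncode is None when the child was killed.
    """
    process = subprocess.Popen(cmd)
    try:
        return True, process.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        process.kill()  # a hung child can be stopped without taking down the parent
        process.wait()  # reap the child so it does not linger as a zombie
        return False, None


def run_with_retries(cmd, timeout_s, attempts=3):
    """Retry a command that hangs; returns its exit code, or None if every try hung."""
    for _ in range(attempts):
        finished, code = run_with_watchdog(cmd, timeout_s)
        if finished:
            return code
    return None


# A stand-in for a hung worker: a child that sleeps far longer than the timeout.
HUNG_WORKER = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_watchdog(HUNG_WORKER, timeout_s=0.5))
```

When the code that hangs runs inside the same process as the server, there is no safe way to interrupt it, which is why moving it into a child process makes detection and recovery possible.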


Reference: github-starred/ollama#63857