[GH-ISSUE #3917] I have noticed something extremely strange about what ollama does with Phi-3 models. #2429

Closed
opened 2026-04-12 12:44:34 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @phalexo on GitHub (Apr 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3917

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

(Pythogora) developer@ai:~/PROJECTS/gpt-pilot/pilot$ ~/ollama/ollama list
NAME                                      ID            SIZE    MODIFIED
Meta-Llama-3-70B-Instruct-.Q5_K_M:latest  746bce3a52ed  49 GB   2 days ago
hermes-2-Pro-Mistral-7B.Q8_0:latest       86624d435749  7.7 GB  2 weeks ago
llama3:latest                             71a106a91016  4.7 GB  5 days ago
mistral-7b-instruct-v0.2.Q6_K:latest      37b7edd947a2  5.9 GB  4 months ago
mixtral-8x7b-instruct-v0.1.Q6_K:latest    611ec22ab3e7  38 GB   4 weeks ago
notus-7b-v1.Q6_K:latest                   f04807d7e58e  5.9 GB  4 months ago
phi-3-mini-128k-instruct.Q6_K:latest      3a035ccf60bd  3.1 GB  44 minutes ago
phi-3-mini-4k-instruct.16b:latest         4eb8627d3836  7.6 GB  24 hours ago
phind-codellama-34b-v2.Q6_K:latest        e7f0d1897af2  27 GB   4 weeks ago
stable-code-instruct-3b-Q8_0:latest       390e72938bcf  3.0 GB  4 weeks ago

One phi-3 model is 3.1 GB and the other is 7.6 GB. Ollama appears to load multiple copies into the GPUs. If I run an interactive session, only one appears to respond, but with gpt-pilot they all seem to start talking at the same time, giving me gibberish.
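For what it's worth, the size gap between the two phi-3 entries is consistent with quantization alone. Assuming phi-3-mini's roughly 3.8B parameters (an assumption; the listing doesn't state a parameter count), a back-of-the-envelope sketch:

```python
PARAMS = 3.8e9  # approximate phi-3-mini parameter count (assumption)

def gguf_size_gb(bits_per_weight: float) -> float:
    # Rough file size: parameters x bits per weight, ignoring metadata overhead.
    return PARAMS * bits_per_weight / 8 / 1e9

print(round(gguf_size_gb(6.56), 1))  # Q6_K at ~6.56 bits/weight -> ~3.1 GB
print(round(gguf_size_gb(16), 1))    # fp16 at 16 bits/weight    -> ~7.6 GB
```

So the 3.1 GB file is the Q6_K quantization and the 7.6 GB file matches an unquantized fp16 export of the same model, not a different copy.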

Thu Apr 25 14:06:02 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:04:00.0 Off |                  N/A |
| 22%   14C    P8    17W / 275W |  11927MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:05:00.0 Off |                  N/A |
| 22%   16C    P8    19W / 275W |  11136MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:08:00.0 Off |                  N/A |
| 22%   15C    P8    18W / 275W |  11136MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:09:00.0 Off |                  N/A |
| 22%   14C    P8    19W / 275W |  10421MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:85:00.0 Off |                  N/A |
|  0%   22C    P8    12W / 177W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3651168      C   ...a_v11/ollama_llama_server    11922MiB |
|    1   N/A  N/A   3651168      C   ...a_v11/ollama_llama_server    11131MiB |
|    2   N/A  N/A   3651168      C   ...a_v11/ollama_llama_server    11131MiB |
|    3   N/A  N/A   3651168      C   ...a_v11/ollama_llama_server    10416MiB |
+-----------------------------------------------------------------------------+
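Note that the process table above shows a single PID (3651168) on GPUs 0-3, which points to one ollama_llama_server splitting a model's layers across four GPUs rather than four independent copies. A quick sanity check over the numbers reported above:

```python
# Per-process rows from the nvidia-smi output: (gpu_index, pid, used_memory_mib)
rows = [
    (0, 3651168, 11922),
    (1, 3651168, 11131),
    (2, 3651168, 11131),
    (3, 3651168, 10416),
]

pids = {pid for _, pid, _ in rows}
total_mib = sum(mem for _, _, mem in rows)

print(len(pids))   # 1 distinct process -> one server, layers sharded across GPUs
print(total_mib)   # 44600 MiB (~43.6 GiB) total residency across the four GPUs
```

That total is far larger than either phi-3 file, so the memory likely also includes another model (or KV cache) still resident from an earlier request.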


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

Built from GitHub source this morning.

GiteaMirror added the bug label 2026-04-12 12:44:34 -05:00

@davidearlyoung commented on GitHub (Apr 27, 2024):

I think that this might be related to the interesting development that can be seen at: https://github.com/ollama/ollama/pull/3418#issuecomment-2080159140
Near the bottom of PR #3418 you can see that dhiltgen has merged it into main.


@phalexo commented on GitHub (Apr 27, 2024):

This does sound interesting. If this is what is happening, then there is a problem. I tried using phi3 models within an agent framework and got gibberish output that looked like multiple models all "talking" at the same time, stumbling over each other. They would need some kind of multiplexer to control this behavior.



@davidearlyoung commented on GitHub (Apr 27, 2024):

That is a really good point to reiterate.


@dhiltgen commented on GitHub (May 1, 2024):

Initial support for phi3 was added in 0.1.32, and conversion should be working in 0.1.33. Please give the latest RC a try and let us know if you're still having problems.
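For anyone checking a local build against those release numbers, a dotted-version comparison is all that's needed (a small illustrative helper, not part of ollama itself):

```python
def at_least(installed: str, required: str) -> bool:
    # Compare dotted version strings numerically, e.g. "0.1.33" >= "0.1.32".
    return tuple(map(int, installed.split("."))) >= tuple(map(int, required.split(".")))

print(at_least("0.1.33", "0.1.32"))  # True  -> has phi3 conversion support
print(at_least("0.1.31", "0.1.32"))  # False -> predates initial phi3 support
```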

https://github.com/ollama/ollama/releases


Reference: github-starred/ollama#2429