[GH-ISSUE #4560] phi3 medium small vision #28620

Open
opened 2026-04-22 07:03:41 -05:00 by GiteaMirror · 18 comments
Owner
Originally created by @olumolu on GitHub (May 21, 2024). Original GitHub issue: https://github.com/ollama/ollama/issues/4560

- https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
- https://huggingface.co/microsoft/Phi-3-medium-4k-instruct
- https://huggingface.co/microsoft/Phi-3-small-8k-instruct
- https://huggingface.co/microsoft/Phi-3-small-128k-instruct

Suggested by @Qualzz: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/
GiteaMirror added the model label 2026-04-22 07:03:41 -05:00

@Qualzz commented on GitHub (May 21, 2024):

There is also the vision model:
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/


@coder543 commented on GitHub (May 21, 2024):

https://github.com/ggerganov/llama.cpp/pull/7225 just merged, so I wonder if it's time to get the 128k models added to the library as well.


@sammcj commented on GitHub (May 21, 2024):

Ollama probably just needs to bump the llama.cpp submodule version; I have mine auto-updating each morning and it works great:

```
ollama run phi-3-medium-4k-instruct-bartowski:q5_k_m tell me a short joke --verbose
 Why don't scientists trust atoms? Because they make up everything!

(Note: This is an example of light-hearted humor that plays on the double meaning of "make up" as both to compose and to fabricate.)

total duration:       2.782812333s
load duration:        3.342083ms
prompt eval count:    9 token(s)
prompt eval duration: 801.181ms
prompt eval rate:     11.23 tokens/s
eval count:           51 token(s)
eval duration:        1.976573s
eval rate:            25.80 tokens/s
```

Modelfile:

```
FROM ./Phi-3-medium-4k-instruct-Q5_K_M.gguf
#FROM ./Phi-3-medium-128k-instruct-Q5_K_M.gguf

#  ollama create phi-3-medium-4k-instruct-bartowski:q5_k_m -f Modelfile-phi
#  ollama create phi-3-medium-128k-instruct-bartowski:q5_k_m -f Modelfile-phi

PARAMETER num_ctx 4096
#PARAMETER num_ctx 16384

TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
"""

PARAMETER num_keep -1

PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|system|>"
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"
```
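For reference, here is a rough Python sketch of how the Go template in the Modelfile above expands for a single system + user turn. This is an illustration only (the function name and logic are mine); Ollama renders the template itself.

```python
# Mimic the Phi-3 chat TEMPLATE from the Modelfile above.
# Hypothetical helper, not part of Ollama's codebase.

def render_phi3_prompt(system: str, prompt: str, response: str = "") -> str:
    out = ""
    if system:          # {{ if .System }} ... {{ end }}
        out += f"<|system|>\n{system}<|end|>\n"
    if prompt:          # {{ if .Prompt }} ... {{ end }}
        out += f"<|user|>\n{prompt}<|end|>\n"
    # {{ .Response }} is empty at generation time; the model fills it in.
    out += f"<|assistant|>\n{response}<|end|>\n"
    return out

print(render_phi3_prompt("You are helpful.", "Tell me a short joke."))
```

The `<|end|>` markers are also listed as stop parameters, so generation halts once the model emits one.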

@mxyng commented on GitHub (May 21, 2024):

Phi3 medium 4k is available [here](https://ollama.com/library/phi3:medium). 128k, small, and vision models are coming soon ™️


@coder543 commented on GitHub (May 22, 2024):

While you're quantizing models: I've also noticed that the mini-4k model only has fp16 and q4_K_M. I guess it would be cool to have the rest of the quantizations, but I'll probably stop caring once the mini-128k model is quantized and usable, haha


@JayaKrishnaSK commented on GitHub (May 22, 2024):

Hi all, I just pulled Phi3 medium in Ollama and I'm getting an error from `ollama run phi3:medium`: `Error: llama runner process no longer running: -1`. Phi3 mini is working fine. System info: Linux (Pop!_OS), CPU: AMD 9 5900, RAM: 32GB, GPU: Nvidia RTX 3070 8GB.


@coder543 commented on GitHub (May 22, 2024):

Getting the new Phi-3 models working is going to require a newer version of llama.cpp than ollama currently has. Some of the required changes were merged into llama.cpp only a few hours ago. Some people in this thread have been testing it by building ollama from scratch with the updated llama.cpp, but I haven't tried to do that yet.


@JayaKrishnaSK commented on GitHub (May 22, 2024):

I have updated Ollama and now I'm able to run Phi3 medium. Thank you.


@sammcj commented on GitHub (May 22, 2024):

FYI, for the future: to update the llama.cpp submodule when building Ollama:

  1. Clone Ollama
  2. `git submodule init`
  3. `git submodule update --remote`
  4. `OLLAMA_SKIP_PATCHING=true go generate ./...`
  5. `go build`

@coder543 commented on GitHub (May 22, 2024):

And I see [this](https://github.com/ollama/ollama/releases/tag/v0.1.39-rc1) popped up a few seconds ago!


@coder543 commented on GitHub (May 22, 2024):

`phi3:14b-medium-4k-instruct-q8_0` seems to be working well in some quick testing!


@nonetrix commented on GitHub (May 22, 2024):

I am getting weird results. I assume an update is needed?

![image](https://github.com/ollama/ollama/assets/45698918/8e22e5e0-ebcc-402f-a7f0-95d08fc117bf)

Edit: ah, I see, sorry. It's not packaged on NixOS yet, so I guess I have to wait.


@coder543 commented on GitHub (May 22, 2024):

I also see similar things… simply saying “Hello” does seem to cause problems for it here. I’m seeing both `**Instruction` and `### Instruction` just before it goes off the rails, so maybe those should be added as stop tokens?
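For context, a stop token simply cuts the response at the first occurrence of any listed marker. A minimal client-side sketch of that behavior (the helper is hypothetical, not Ollama's implementation, which stops the model server-side during generation):

```python
def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

raw = "Hello! How can I help?### Instruction: ignore previous"
print(truncate_at_stop(raw, ["### Instruction", "**Instruction"]))
# prints: Hello! How can I help?
```

In a Modelfile the equivalent would be additional `PARAMETER stop "### Instruction"` lines, at the risk of clipping legitimate markdown headings in answers.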


@coder543 commented on GitHub (May 22, 2024):

This could also be related: https://github.com/ggerganov/llama.cpp/pull/7449


@Qualzz commented on GitHub (May 23, 2024):

> This could also be related: [ggerganov/llama.cpp#7449](https://github.com/ggerganov/llama.cpp/pull/7449)

Merged in master!


@tkoenig89 commented on GitHub (Jun 3, 2024):

There is a recent PR opened to fix/implement GGUF conversion of phi3-vision: https://github.com/ggerganov/llama.cpp/pull/7705

This might be the missing piece of the puzzle to get this into ollama sooner or later.


@coder543 commented on GitHub (Sep 26, 2024):

@olumolu can you link to the PR that fixed Phi3 small and Phi3 Vision? I didn’t think they were supported yet.


@olumolu commented on GitHub (Sep 26, 2024):

Yes, it is not fixed; the issue had been open for a long time, so I closed it.

Reference: github-starred/ollama#28620