[GH-ISSUE #10143] Llama 4 support #32416

Closed
opened 2026-04-22 13:38:47 -05:00 by GiteaMirror · 61 comments

Originally created by @UmutAlihan on GitHub (Apr 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10143

I know it has only been a couple of hours since the Llama 4 model family was released. However, I believe it is good practice to ping the repo about when its support in Ollama will be available 😄

Looking forward to running inference with this new very-long-context, multimodal, mixture-of-experts model family on Ollama.

official release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

cheers

GiteaMirror added the model label 2026-04-22 13:38:47 -05:00

@AlbertoSinigaglia commented on GitHub (Apr 5, 2025):

> The former (Llama 4 Scout) fits on a single H100 GPU (with Int4 quantization)

I get that models are increasingly exploiting mixture-of-experts, but:

```
109B parameters, 4-bit quantization ≈ 55 GB
```

And this is ignoring the VRAM used for the KV cache, which for the 10M context length is going to be giant...
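For anyone sanity-checking those numbers, here is a rough back-of-the-envelope sketch. Per token, the KV cache costs 2 × layers × kv_heads × head_dim × bytes-per-element; the layer/head/dtype values below are illustrative assumptions, not confirmed Llama 4 Scout dimensions.

```console
# rough arithmetic only; 48 layers / 8 KV heads / head_dim 128 / fp16 cache
# are illustrative assumptions, not confirmed Llama 4 Scout dimensions
$ echo $((109 * 10**9 / 2 / 10**9))    # 109B params at 4 bits/param, in GB
54
$ echo $((2 * 48 * 8 * 128 * 2))       # KV-cache bytes per token (K + V, fp16)
196608
$ echo $((196608 * 10**6 / 10**9))     # KV cache for a 1M-token context, in GB
196
```

Under these assumptions, a 1M-token cache alone lands in the ~200 GB range mentioned later in this thread, on top of the ~55 GB of 4-bit weights.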


@coder543 commented on GitHub (Apr 5, 2025):

> which for the 10M context length is going to be giant

I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.


@sasank-desaraju commented on GitHub (Apr 5, 2025):

Work ongoing at #10141


@blinkysc commented on GitHub (Apr 6, 2025):

> > which for the 10M context length is going to be giant
> >
> > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.

They say it fits on an H100, so < 80 GB. The AMD 395 with 128 GB and Macs with 128 GB are probably going to be fine.


@JeffTax commented on GitHub (Apr 6, 2025):

Looking forward to this 😄


@sanjibnarzary commented on GitHub (Apr 6, 2025):

I need it to fit in a single V100 16GB GPU.


@lxyeternal commented on GitHub (Apr 6, 2025):

I need the Llama4.


@marcussacana commented on GitHub (Apr 6, 2025):

Is there any hope of this model being pruned?


@AlbertoSinigaglia commented on GitHub (Apr 6, 2025):

> > > which for the 10M context length is going to be giant
> > >
> > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> >
> > They say it fits on an H100, so < 80 GB. The AMD 395 with 128 GB and Macs with 128 GB are probably going to be fine.

Maybe memory-wise, but I'm not sure about inference speed. Also, in my experience a 1M context length usually requires ~200 GB of memory for the KV cache... so...


@puzanov commented on GitHub (Apr 6, 2025):

How heavily will the Llama 4 Scout model need to be quantized?


@jimccadm commented on GitHub (Apr 6, 2025):

> > > which for the 10M context length is going to be giant
> > >
> > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> >
> > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine

I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.


@jano403 commented on GitHub (Apr 6, 2025):

@puzanov @coder543 @sanjibnarzary @marcussacana @UmutAlihan @AlbertoSinigaglia @lxyeternal @sasank-desaraju @blinkysc @jimccadm @JeffTax @dhiltgen @rick-github @bmizerany @drnic @anaisbetts @sqs @lstep @herval @mattt @slouffka @danielpunkass @andygill @vincentkoc @yuiseki @neomantra @gbaptista @enricoros @9876691 @prusnak Whoever's working on this. Please add tools support, please. Thank you very much 🙂

saAAAAAAAAAAAAAAAAAaaar DO NOT REDEEM


@Jabher commented on GitHub (Apr 6, 2025):

> > > which for the 10M context length is going to be giant
> > >
> > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> >
> > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine

128gb mbp owner here, can't wait to try


@pavankay commented on GitHub (Apr 6, 2025):

Has Llama 4 been released yet?


@pavankay commented on GitHub (Apr 6, 2025):

On Ollama


@jpapenfuss commented on GitHub (Apr 7, 2025):

> > > > which for the 10M context length is going to be giant
> > > >
> > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > >
> > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> >
> > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.

It's not going to fit in 128 gigabits, no matter how hard it's quantified.


@jimccadm commented on GitHub (Apr 7, 2025):

Agreed. I made a couple of attempts with relaxed rules in LM Studio; no dice, it doesn't fit.


@oreaba commented on GitHub (Apr 7, 2025):

Looking forward to it!


@ghmer commented on GitHub (Apr 7, 2025):

It should be noted that the license does not permit usage of llama4 by Europeans. When offering those models, don’t forget to add a big warning message 😣


@croqaz commented on GitHub (Apr 7, 2025):

> Has Llama 4 been released yet?
> On Ollama

Why are you asking? Did you pay the devs to release it in a few hours, over the weekend?


@PawelSzpyt commented on GitHub (Apr 7, 2025):

> It should be noted that the license does not permit usage of llama4 by Europeans. When offering those models, don’t forget to add a big warning message 😣

From Meta's use policy:

> "This restriction does not apply to end users of a product or service that incorporates any such multimodal models."

Perhaps you are an end user of a product (like a free product called Ollama) that incorporates a Llama model, in which case you can use it. Not legal advice, though.


@Kwisss commented on GitHub (Apr 8, 2025):

> > > > > which for the 10M context length is going to be giant
> > > > >
> > > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > > >
> > > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> > >
> > > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.
> >
> > It's not going to fit in 128 gigabits, no matter how hard it's quantified.

Is it too late to take that bet?


@colout commented on GitHub (Apr 8, 2025):

> > > > > > which for the 10M context length is going to be giant
> > > > > >
> > > > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > > > >
> > > > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> > > >
> > > > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.
> > >
> > > It's not going to fit in 128 gigabits, no matter how hard it's quantified.
> >
> > Is it too late to take that bet?

*Quantifying* the model won't make it fit in 128 Giga*bits*.

However, you can *quantize* the model to make it fit in 128 Giga*bytes* of memory.

In all seriousness, I have an 8845HS mini PC with 96 GB of RAM (dual-channel 5600 MHz) that runs the `qwen2.5:14b-instruct-q4_K_M` model at a reasonable enough speed for CPU-only inference (about 5-7 tok/s at <8k context).

I'd be happy to test once this comes out. In the meantime, I'd love to see a non-bnb `q4_K_M` in general that I can try, even if it's just with the Python transformers library, to get a baseline while I wait for Ollama.

Edit: In case anyone's interested, I got Unsloth's `q4_K_M` running through bleeding-edge llama.cpp: around 2.3 tok/s with an empty context window.
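For reference, a CPU-only run of that kind with llama.cpp looks roughly like the sketch below. The GGUF path is a placeholder (point `-m` at the first shard of whichever Llama 4 Scout quant you downloaded), and the context size and thread count are just example values.

```console
# hypothetical paths/values for illustration; -c sets the context size, -t the CPU threads
$ ./llama-cli \
    -m /models/llama-4-scout-q4_k_m-00001-of-00002.gguf \
    -c 8192 -t 16 \
    -p "Summarize the Llama 4 release in two sentences."
```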


@igorschlum commented on GitHub (Apr 9, 2025):

I have a 192 GB Mac Studio ready to test Llama 4 with Ollama and share results.


@Luap2003 commented on GitHub (Apr 9, 2025):

I have a server with two H100 GPUs, and I'm really interested in testing it, especially since the blog post mentioned it should fit on just one.


@gileneusz commented on GitHub (Apr 9, 2025):

I have a rack with 64 B200s and can't wait to test it soon!


@pakoito commented on GitHub (Apr 9, 2025):

I have a White Citroën 2CV and it contributes as much to this conversation as your posts.


@igorschlum commented on GitHub (Apr 10, 2025):

I had a blue one and an orange buggy. I regret them both.


@dineshkumartp7 commented on GitHub (Apr 10, 2025):

> > > > which for the 10M context length is going to be giant
> > > >
> > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > >
> > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> >
> > 128gb mbp owner here, can't wait to try

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:38:00.0 Off |                    0 |
| N/A   36C    P0             48W /  300W |     188MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:A8:00.0 Off |                    0 |
| N/A   31C    P0             44W /  300W |       4MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off |   00000000:B8:00.0 Off |                    0 |
| N/A   37C    P0             43W /  300W |       4MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     1391600      G   /usr/libexec/Xorg                             108MiB |
|    0   N/A  N/A     1391624      G   /usr/bin/gnome-shell                           17MiB |
|    0   N/A  N/A     4043059    C+G   missioncenter                                  30MiB |
+-----------------------------------------------------------------------------------------+
```

Can't wait to try :)


@AlbertoSinigaglia commented on GitHub (Apr 10, 2025):

This is getting out of hand


@aravhawk commented on GitHub (Apr 10, 2025):

> > > > which for the 10M context length is going to be giant
> > > >
> > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > >
> > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> >
> > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.

Try out `ollama run aravhawk/llama4`; I got it on there at 4-bit quant with a 4096-token context window. It's ~65GB.
Also try experimenting with the context in the Modelfile; that 128GB can easily fit it (as long as the GPU can handle it).
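A minimal sketch of that Modelfile tweak, assuming an Ollama build that can actually load the llama4 architecture (which, at this point in the thread, is still in progress); the model tag is the community upload named above and the `num_ctx` value is just an example:

```console
# create a variant with a larger context window; adjust num_ctx to taste
$ cat > Modelfile <<'EOF'
FROM aravhawk/llama4
PARAMETER num_ctx 16384
EOF
$ ollama create llama4-scout-16k -f Modelfile
$ ollama run llama4-scout-16k
```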


@FlippingBinary commented on GitHub (Apr 10, 2025):

> Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).

Looks like ingu627/llama4-scout-q4 was published 7 hours earlier with exactly the same hash.


@aravhawk commented on GitHub (Apr 11, 2025):

> > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
>
> Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.

Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation.
Additionally, I've added Maverick (aravhawk/llama4:400b) if anyone has the VRAM for it.


@mistrjirka commented on GitHub (Apr 12, 2025):

> > > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
> >
> > Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.
>
> Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation. Additionally, I've added Maverick (`aravhawk/llama4:400b`) if anyone has the VRAM for it.

I updated my Ollama through the script to the latest version, but it errors out with `llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'`.

Is the version that can support it not released yet?


@AlbertoSinigaglia commented on GitHub (Apr 12, 2025):

> > > > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
> > >
> > > Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.
> >
> > Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation. Additionally, I've added Maverick (`aravhawk/llama4:400b`) if anyone has the VRAM for it.
>
> I updated my Ollama through the script to the latest version, but it errors out with `unknown model architecture: 'llama4'`.
>
> Is the version that can support it not released yet?

To be fair, I don't see any llama4 available at https://ollama.com/search?o=newest ...


@aravhawk commented on GitHub (Apr 12, 2025):

> > > > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
> > >
> > > Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.
> >
> > Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation. Additionally, I've added Maverick (`aravhawk/llama4:400b`) if anyone has the VRAM for it.
>
> I updated my Ollama through the script to the latest version, but it errors out with `unknown model architecture: 'llama4'`.
>
> Is the version that can support it not released yet?

Yes, it seems so. I just tried reinstalling `ollama` and running `ollama run aravhawk/llama4` on a GH200 machine, and received the same error. Looks like Ollama might not support the new arch.


@rick-github commented on GitHub (Apr 12, 2025):

llama4 support is in progress: #10141


@mistrjirka commented on GitHub (Apr 12, 2025):

> https://ollama.com/search?o=newest i don't see any llama4 available to be fair...

Well, it is there; I can see it in the search results: https://ollama.com/search?q=llama4


@ZV-Liu commented on GitHub (Apr 14, 2025):

Regarding https://github.com/ollama/ollama/pull/10141: how long will it take to support Llama 4? I have tested and recompiled Ollama on this branch. It can support Llama 4, but I can only get CPU-backend inference?


@batot1 commented on GitHub (Apr 14, 2025):

Error: unable to load model:

```
it@ai:~$ ollama run aravhawk/llama4
pulling manifest
pulling 7701f347644c... 100% ▕██████████████████████████████▏  65 GB
pulling 8ab4849b038c... 100% ▕██████████████████████████████▏  254 B
pulling 75357d685f23... 100% ▕██████████████████████████████▏   28 B
pulling f75ae6c54a38... 100% ▕██████████████████████████████▏  129 B
pulling 4408b47d76b7... 100% ▕██████████████████████████████▏  491 B
verifying sha256 digest
writing manifest
success
Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-7701f347644c67a57adacf853a353133e7c58daaa0fba18ec1623d490a0447b6
it@ai:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           188Gi       4.6Gi       1.1Gi        44Mi       184Gi       183Gi
Swap:          976Mi       4.0Mi       972Mi
it@ai:~$ ollama run aravhawk/llama4
Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-7701f347644c67a57adacf853a353133e7c58daaa0fba18ec1623d490a0447b6
it@ai:~$ ollama -v
ollama version is 0.6.5
it@ai:~$ lspci |grep -i VGA
01:00.0 VGA compatible controller: NVIDIA Corporation AD102 [GeForce RTX 4090] (rev a1)
```

Any idea what is wrong?
All other models in Ollama work properly; only this model is not working.


@rick-github commented on GitHub (Apr 14, 2025):

> Any idea what is wrong?

https://github.com/ollama/ollama/issues/10143#issuecomment-2798941503


@aravhawk commented on GitHub (Apr 16, 2025):

> Error: unable to load model: `/usr/share/ollama/.ollama/models/blobs/sha256-7701f347644c67a57adacf853a353133e7c58daaa0fba18ec1623d490a0447b6`
>
> [full log quoted above]
>
> Any idea what is wrong? All other models in Ollama work properly; only this model is not working.

Architectural issues, unfortunately 😔


@aravhawk commented on GitHub (Apr 16, 2025):

> #10141 How long will it take to support llama4? I have tested and recompiled ollama on this branch. It can support llama4, but I can only use backend and CPU reasoning?

I think you can recompile llama.cpp with CUDA support, but don't quote me on it.


@lee-b commented on GitHub (Apr 21, 2025):

Llama 4 (even Scout) is a great model: very fast, and much more useful answers than most of the previous models I've tried. I'm running it on llama.cpp at the moment though, which lacks vision support. It would be great if Ollama implemented this with vision.


@ips972 commented on GitHub (Apr 24, 2025):

Hi, any update on when Ollama will support Llama 4? FP16 or Q1-Q8 quants, etc., with all functions: chat, vision, and the huge max token size?


@Notbici commented on GitHub (Apr 25, 2025):

Any workarounds for getting Llama 4 working on Ollama?


@mistrjirka commented on GitHub (Apr 25, 2025):

> Any workarounds for getting Llama 4 working on Ollama?

You can compile the llama4 branch yourself; it is currently an open pull request (#10141) and seems to be in the code-review phase. A rough sketch of that is below.
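Roughly, the self-build looks like the following sketch, assuming a working Go toolchain. This produces a CPU-only binary; GPU backends need the extra steps in the repo's development docs, and the exact commands may differ by version.

```console
# fetch the open PR branch and build ollama from source (CPU-only sketch)
$ git clone https://github.com/ollama/ollama.git && cd ollama
$ git fetch origin pull/10141/head:llama4 && git checkout llama4
$ go build .
$ ./ollama serve &
$ ./ollama run ingu627/llama4-scout-q4
```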


@rsmirnov90 commented on GitHub (Apr 26, 2025):

> > Any workarounds for getting Llama 4 working on Ollama?
>
> You can compile the llama4 branch yourself; it is currently an open pull request (#10141) and seems to be in the code-review phase.

I think it just disappeared... Or at least I don't see it in the branch list anymore (and I know it was there because I was checking it almost daily up until now).


@igorschlum commented on GitHub (Apr 26, 2025):

@rsmirnov90 there is a new version of Ollama that supports Llama 4; it may still evolve, but it's there and you can try it:
https://github.com/ollama/ollama/releases/tag/v0.6.7-rc0


@ips972 commented on GitHub (Apr 27, 2025):

Tried the new Ollama with Llama 4; it works fine, but it still lacks the performance of vLLM. I hope that some day Ollama gets to that performance level. It's a much easier platform to manage than any other, especially in multi-user sessions.


@thorewi commented on GitHub (Apr 29, 2025):

Hello, is this implementation really multimodal with image processing (or maybe I'm using the wrong model)? I'm getting a negative answer (see attached picture)... I'm using Ollama 0.6.7-rc0 and tried these models: https://ollama.com/ingu627/llama4-scout-q4 and https://ollama.com/aravhawk/llama4. Thank you for your help.

[attached screenshot]


@igorschlum commented on GitHub (Apr 29, 2025):

@thorewi I think Llama 4 can process an image in the sense of describing it (as llama3.3 can), but it cannot modify an image.


@thorewi commented on GitHub (Apr 29, 2025):

@igorschlum Yes, that's exactly what I need, but I always get something like this: [attached picture] — basically no response. So the question is whether it’s working for anyone or not...

[attached screenshot]

@rick-github commented on GitHub (Apr 29, 2025):

`aravhawk/llama4` doesn't support images:

```console
$ curl -s localhost:11434/api/show -d '{"model":"aravhawk/llama4"}' | jq .capabilities
[
  "completion"
]
```

The lack of response may be due to something else. The Ollama [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.


@rick-github commented on GitHub (May 1, 2025):

https://ollama.com/library/llama4


@aravhawk commented on GitHub (May 1, 2025):

*The models I've uploaded are not multimodal. I've updated the description to reflect that (I originally copied it directly from Meta)


@thepwagner commented on GitHub (May 3, 2025):

Is there a bug in the template currently?

On `v0.6.7`, using `llama4:17b-scout-16e-instruct-q4_K_M` (`b62dea0de67c`).
When calling with tools, I'm getting:

```
time=2025-05-03T08:07:05.993-04:00 level=ERROR source=routes.go:1520 msg="chat prompt error" error="template: :6:10: executing \"\" at <.Tools>: can't evaluate field Tools in type api.Tools"
```

Using the `with .Tools` on L3 makes me think the range on L6 should just be over `.`, but I can't find the source anywhere to submit a PR:

```
{{- if .System }}<|header_start|>system<|header_end|>

{{- with .Tools }}Environment: ipython
You have access to the following functions. To call a function, please respond with JSON for a function call. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{- range .Tools }}{{ . }}{{ "\n\n" }}
```
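For what it's worth, the error is consistent with that diagnosis: inside `{{- with .Tools }}` the dot is already rebound to the tools slice, so `range .Tools` fails. A sketch of the likely shape of the fix, not the actual template that later shipped (the `{{- end }}` lines close the hypothetical completed block):

```
{{- with .Tools }}Environment: ipython
{{/* inside "with", the dot is the tools slice, so range over "." */}}
{{- range . }}{{ . }}{{ "\n\n" }}
{{- end }}
{{- end }}
```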

@olumolu commented on GitHub (May 4, 2025):

Close this as support already merged.


@addypy commented on GitHub (May 5, 2025):

I'm getting the same issue @thepwagner mentioned. Tested with Ollama (Docker) on both v0.6.8 and v0.6.7.

```
ModelHTTPError: status_code: 500, model_name: llama4, body: {'message': 'template: :6:10: executing "" at <.Tools>: can't evaluate field Tools in type api.Tools', 'type': 'api_error', 'param': None, 'code': None}
```

Any updates on this?


@sherlock666 commented on GitHub (May 5, 2025):

Same issue as @thepwagner and @addypy. I would like to use the tools function.

With the same code, llama3.2 can use tools correctly, while llama4:scout returns the same error:

```
template: :6:10: executing "" at <.Tools>: can't evaluate field Tools in type api.Tools
```


@mxyng commented on GitHub (May 5, 2025):

Tool calling is fixed for Llama 4. Please re-pull the model.
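For anyone following along, re-pulling looks like the sketch below; since Ollama stores models as content-addressed layers, it should only fetch the changed template/manifest layers, not the full weights. The tag is an example using the library model linked above; substitute whichever llama4 tag you originally pulled.

```console
# re-pull to pick up the updated chat template, then run as usual
$ ollama pull llama4:scout
$ ollama run llama4:scout
```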


Reference: github-starred/ollama#32416