[GH-ISSUE #10143] Llama 4 support #32416

Closed
opened 2026-04-22 13:38:47 -05:00 by GiteaMirror · 61 comments

Originally created by @UmutAlihan on GitHub (Apr 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10143

I know it has only been a couple of hours since the Llama 4 model family was released. However, I believe it is good practice to ping the repo about when its support in Ollama will be available 😄

Looking forward to running inference with this new very-long-context, multimodal, mixture-of-experts model family on Ollama.

official release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

cheers

GiteaMirror added the model label 2026-04-22 13:38:47 -05:00

@AlbertoSinigaglia commented on GitHub (Apr 5, 2025):

> The former (Llama 4 Scout) fits on a single H100 GPU (with Int4 quantization)

I get that models are increasingly exploiting mixture-of-experts, but:

```
109B parameters, 4-bit quantization ≈ 55 GB
```

And this is ignoring the VRAM used for the KV cache, which for the 10M context length is going to be giant...
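For anyone sanity-checking those numbers, here is a rough back-of-the-envelope sketch. Per token, the KV cache costs 2 × layers × kv_heads × head_dim × bytes-per-element; the layer/head/dtype values below are illustrative assumptions, not confirmed Llama 4 Scout dimensions.

```console
# rough arithmetic only; 48 layers / 8 KV heads / head_dim 128 / fp16 cache
# are illustrative assumptions, not confirmed Llama 4 Scout dimensions
$ echo $((109 * 10**9 / 2 / 10**9))    # 109B params at 4 bits/param, in GB
54
$ echo $((2 * 48 * 8 * 128 * 2))       # KV-cache bytes per token (K + V, fp16)
196608
$ echo $((196608 * 10**6 / 10**9))     # KV cache for a 1M-token context, in GB
196
```

Under these assumptions, a 1M-token cache alone lands in the ~200 GB range mentioned later in this thread, on top of the ~55 GB of 4-bit weights.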


@coder543 commented on GitHub (Apr 5, 2025):

> which for the 10M context length is going to be giant

I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.


@sasank-desaraju commented on GitHub (Apr 5, 2025):

Work ongoing at #10141


@blinkysc commented on GitHub (Apr 6, 2025):

> > which for the 10M context length is going to be giant
> >
> > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.

They say it fits on an H100, so < 80 GB. The AMD 395 with 128 GB and Macs with 128 GB are probably going to be fine.


@JeffTax commented on GitHub (Apr 6, 2025):

Looking forward to this 😄


@sanjibnarzary commented on GitHub (Apr 6, 2025):

I need it to fit in a single V100 16GB GPU.


@lxyeternal commented on GitHub (Apr 6, 2025):

I need the Llama4.


@marcussacana commented on GitHub (Apr 6, 2025):

Is there any hope of this model being pruned?


@AlbertoSinigaglia commented on GitHub (Apr 6, 2025):

> > > which for the 10M context length is going to be giant
> > >
> > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> >
> > They say it fits on an H100, so < 80 GB. The AMD 395 with 128 GB and Macs with 128 GB are probably going to be fine.

Maybe memory-wise, but I'm not sure about inference speed. Also, in my experience a 1M context length usually requires ~200 GB of memory for the KV cache... so...


@puzanov commented on GitHub (Apr 6, 2025):

How heavily will the Llama 4 Scout model need to be quantized?


@jimccadm commented on GitHub (Apr 6, 2025):

> > > which for the 10M context length is going to be giant
> > >
> > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> >
> > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine

I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.


@jano403 commented on GitHub (Apr 6, 2025):

@puzanov @coder543 @sanjibnarzary @marcussacana @UmutAlihan @AlbertoSinigaglia @lxyeternal @sasank-desaraju @blinkysc @jimccadm @JeffTax @dhiltgen @rick-github @bmizerany @drnic @anaisbetts @sqs @lstep @herval @mattt @slouffka @danielpunkass @andygill @vincentkoc @yuiseki @neomantra @gbaptista @enricoros @9876691 @prusnak Whoever's working on this. Please add tools support, please. Thank you very much 🙂

saAAAAAAAAAAAAAAAAAaaar DO NOT REDEEM


@Jabher commented on GitHub (Apr 6, 2025):

> > > which for the 10M context length is going to be giant
> > >
> > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> >
> > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine

128gb mbp owner here, can't wait to try


@pavankay commented on GitHub (Apr 6, 2025):

Has Llama 4 been released yet?


@pavankay commented on GitHub (Apr 6, 2025):

On Ollama


@jpapenfuss commented on GitHub (Apr 7, 2025):

> > > > which for the 10M context length is going to be giant
> > > >
> > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > >
> > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> >
> > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.

It's not going to fit in 128 gigabits, no matter how hard it's quantified.


@jimccadm commented on GitHub (Apr 7, 2025):

Agreed. I made a couple of attempts with relaxed rules in LM Studio; no dice, it doesn't fit.


@oreaba commented on GitHub (Apr 7, 2025):

Looking forward to it!


@ghmer commented on GitHub (Apr 7, 2025):

It should be noted that the license does not permit usage of llama4 by Europeans. When offering those models, don’t forget to add a big warning message 😣


@croqaz commented on GitHub (Apr 7, 2025):

> Has Llama 4 been released yet?
> On Ollama

Why are you asking? Did you pay the devs to release it in a few hours, over the weekend?


@PawelSzpyt commented on GitHub (Apr 7, 2025):

> It should be noted that the license does not permit usage of llama4 by Europeans. When offering those models, don’t forget to add a big warning message 😣

From Meta's use policy:

> "This restriction does not apply to end users of a product or service that incorporates any such multimodal models."

Perhaps you are an end user of a product (like a free product called Ollama) that incorporates a Llama model, in which case you can use it. Not legal advice, though.


@Kwisss commented on GitHub (Apr 8, 2025):

> > > > > which for the 10M context length is going to be giant
> > > > >
> > > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > > >
> > > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> > >
> > > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.
> >
> > It's not going to fit in 128 gigabits, no matter how hard it's quantified.

Is it too late to take that bet?


@colout commented on GitHub (Apr 8, 2025):

> > > > > > which for the 10M context length is going to be giant
> > > > > >
> > > > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > > > >
> > > > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> > > >
> > > > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.
> > >
> > > It's not going to fit in 128 gigabits, no matter how hard it's quantified.
> >
> > Is it too late to take that bet?

*Quantifying* the model won't make it fit in 128 Giga*bits*.

However, you can *quantize* the model to make it fit in 128 Giga*bytes* of memory.

In all seriousness, I have an 8845HS mini PC with 96 GB of RAM (dual-channel 5600 MHz) that runs the `qwen2.5:14b-instruct-q4_K_M` model at a reasonable enough speed for CPU-only inference (about 5-7 tok/s at <8k context).

I'd be happy to test once this comes out. In the meantime, I'd love to see a non-bnb `q4_K_M` in general that I can try, even if it's just with the Python transformers library, to get a baseline while I wait for Ollama.

Edit: In case anyone's interested, I got Unsloth's `q4_K_M` running through bleeding-edge llama.cpp: around 2.3 tok/s with an empty context window.
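For reference, a CPU-only run of that kind with llama.cpp looks roughly like the sketch below. The GGUF path is a placeholder (point `-m` at the first shard of whichever Llama 4 Scout quant you downloaded), and the context size and thread count are just example values.

```console
# hypothetical paths/values for illustration; -c sets the context size, -t the CPU threads
$ ./llama-cli \
    -m /models/llama-4-scout-q4_k_m-00001-of-00002.gguf \
    -c 8192 -t 16 \
    -p "Summarize the Llama 4 release in two sentences."
```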


@igorschlum commented on GitHub (Apr 9, 2025):

I have a 192 GB Mac Studio ready to test Llama 4 with Ollama and share results.


@Luap2003 commented on GitHub (Apr 9, 2025):

I have a server with two H100 GPUs, and I'm really interested in testing it, especially since the blog post mentioned it should fit on just one.


@gileneusz commented on GitHub (Apr 9, 2025):

I have a rack with 64 B200s and can't wait to test it soon!


@pakoito commented on GitHub (Apr 9, 2025):

I have a White Citroën 2CV and it contributes as much to this conversation as your posts.


@igorschlum commented on GitHub (Apr 10, 2025):

I had a blue one and an orange buggy. I regret them both.


@dineshkumartp7 commented on GitHub (Apr 10, 2025):

> > > > which for the 10M context length is going to be giant
> > > >
> > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > >
> > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> >
> > 128gb mbp owner here, can't wait to try

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:38:00.0 Off |                    0 |
| N/A   36C    P0             48W /  300W |     188MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:A8:00.0 Off |                    0 |
| N/A   31C    P0             44W /  300W |       4MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off |   00000000:B8:00.0 Off |                    0 |
| N/A   37C    P0             43W /  300W |       4MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     1391600      G   /usr/libexec/Xorg                             108MiB |
|    0   N/A  N/A     1391624      G   /usr/bin/gnome-shell                           17MiB |
|    0   N/A  N/A     4043059    C+G   missioncenter                                  30MiB |
+-----------------------------------------------------------------------------------------+
```

Can't wait to try :)


@AlbertoSinigaglia commented on GitHub (Apr 10, 2025):

This is getting out of hand


@aravhawk commented on GitHub (Apr 10, 2025):

> > > > which for the 10M context length is going to be giant
> > > >
> > > > I don't think they said anything about fitting onto a single GPU with 10M context. Very, very few use cases right now are going to involve a 10M context window.
> > >
> > > They say it fits on H100 so < 80g. The AMD 395 with 128gb and Mac's with 128gb probably gonna be fine
> >
> > I'll be testing it on a 128Gb Macbook Pro with max cores across the board as soon as it lands on the model list.

Try out `ollama run aravhawk/llama4`; I got it on there at 4-bit quant with a 4096-token context window. It's ~65GB.
Also try experimenting with the context in the Modelfile; that 128GB can easily fit it (as long as the GPU can handle it).
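A minimal sketch of that Modelfile tweak, assuming an Ollama build that can actually load the llama4 architecture (which, at this point in the thread, is still in progress); the model tag is the community upload named above and the `num_ctx` value is just an example:

```console
# create a variant with a larger context window; adjust num_ctx to taste
$ cat > Modelfile <<'EOF'
FROM aravhawk/llama4
PARAMETER num_ctx 16384
EOF
$ ollama create llama4-scout-16k -f Modelfile
$ ollama run llama4-scout-16k
```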


@FlippingBinary commented on GitHub (Apr 10, 2025):

> Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).

Looks like ingu627/llama4-scout-q4 was published 7 hours earlier with exactly the same hash.


@aravhawk commented on GitHub (Apr 11, 2025):

> > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
>
> Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.

Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation.
Additionally, I've added Maverick (aravhawk/llama4:400b) if anyone has the VRAM for it.


@mistrjirka commented on GitHub (Apr 12, 2025):

> > > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
> >
> > Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.
>
> Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation. Additionally, I've added Maverick (`aravhawk/llama4:400b`) if anyone has the VRAM for it.

I updated my Ollama through the script to the latest version, but it errors out with `llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'`.

Is the version that can support it not released yet?


@AlbertoSinigaglia commented on GitHub (Apr 12, 2025):

> > > > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
> > >
> > > Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.
> >
> > Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation. Additionally, I've added Maverick (`aravhawk/llama4:400b`) if anyone has the VRAM for it.
>
> I updated my Ollama through the script to the latest version, but it errors out with `unknown model architecture: 'llama4'`.
>
> Is the version that can support it not released yet?

To be fair, I don't see any llama4 available at https://ollama.com/search?o=newest ...


@aravhawk commented on GitHub (Apr 12, 2025):

> > > > Try out `ollama run aravhawk/llama4`, I got it on there @ 4-bit quant and a 4096 token context window. It's ~65GB. Also try experimenting with the context in the MODELFILE, that 128GB can easily fit it (as long as the GPU can handle it).
> > >
> > > Looks like `ingu627/llama4-scout-q4` was published 7 hours earlier with exactly the same hash.
> >
> > Hey, apologies for the initial duplicate. I've since reuploaded the model. This current version uses the sharded GGUFs from Unsloth, which I configured for Ollama. My goal was just to share a useful setup, not claim original creation. Additionally, I've added Maverick (`aravhawk/llama4:400b`) if anyone has the VRAM for it.
>
> I updated my Ollama through the script to the latest version, but it errors out with `unknown model architecture: 'llama4'`.
>
> Is the version that can support it not released yet?

Yes, it seems so. I just tried reinstalling `ollama` and running `ollama run aravhawk/llama4` on a GH200 machine, and received the same error. Looks like Ollama might not support the new arch.


@rick-github commented on GitHub (Apr 12, 2025):

llama4 support is in progress: #10141


@mistrjirka commented on GitHub (Apr 12, 2025):

> https://ollama.com/search?o=newest i don't see any llama4 available to be fair...

Well, it is there; I can see it in the search results: https://ollama.com/search?q=llama4


@ZV-Liu commented on GitHub (Apr 14, 2025):

Regarding https://github.com/ollama/ollama/pull/10141: how long will it take to support Llama 4? I have tested and recompiled Ollama on this branch. It can support Llama 4, but I can only get CPU-backend inference?


@batot1 commented on GitHub (Apr 14, 2025):

Error: unable to load model:

```
it@ai:~$ ollama run aravhawk/llama4
pulling manifest
pulling 7701f347644c... 100% ▕██████████████████████████████▏  65 GB
pulling 8ab4849b038c... 100% ▕██████████████████████████████▏  254 B
pulling 75357d685f23... 100% ▕██████████████████████████████▏   28 B
pulling f75ae6c54a38... 100% ▕██████████████████████████████▏  129 B
pulling 4408b47d76b7... 100% ▕██████████████████████████████▏  491 B
verifying sha256 digest
writing manifest
success
Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-7701f347644c67a57adacf853a353133e7c58daaa0fba18ec1623d490a0447b6
it@ai:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           188Gi       4.6Gi       1.1Gi        44Mi       184Gi       183Gi
Swap:          976Mi       4.0Mi       972Mi
it@ai:~$ ollama run aravhawk/llama4
Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-7701f347644c67a57adacf853a353133e7c58daaa0fba18ec1623d490a0447b6
it@ai:~$ ollama -v
ollama version is 0.6.5
it@ai:~$ lspci |grep -i VGA
01:00.0 VGA compatible controller: NVIDIA Corporation AD102 [GeForce RTX 4090] (rev a1)
```

Any idea what is wrong?
All other models in Ollama work properly; only this model is not working.


@rick-github commented on GitHub (Apr 14, 2025):

> Any idea what is wrong?

https://github.com/ollama/ollama/issues/10143#issuecomment-2798941503


@aravhawk commented on GitHub (Apr 16, 2025):

> Error: unable to load model: `/usr/share/ollama/.ollama/models/blobs/sha256-7701f347644c67a57adacf853a353133e7c58daaa0fba18ec1623d490a0447b6`
>
> [full log quoted above]
>
> Any idea what is wrong? All other models in Ollama work properly; only this model is not working.

Architectural issues, unfortunately 😔


@aravhawk commented on GitHub (Apr 16, 2025):

> #10141 How long will it take to support llama4? I have tested and recompiled ollama on this branch. It can support llama4, but I can only use backend and CPU reasoning?

I think you can recompile llama.cpp with CUDA support, but don't quote me on it.


@lee-b commented on GitHub (Apr 21, 2025):

Llama 4 (even Scout) is a great model: very fast, and much more useful answers than most of the previous models I've tried. I'm running it on llama.cpp at the moment though, which lacks vision support. It would be great if Ollama implemented this with vision.


@ips972 commented on GitHub (Apr 24, 2025):

Hi, any update on when Ollama will support Llama 4? FP16 or Q1-Q8 quants, etc., with all functions: chat, vision, and the huge max token size?


@Notbici commented on GitHub (Apr 25, 2025):

Any workarounds for getting Llama 4 working on Ollama?


@mistrjirka commented on GitHub (Apr 25, 2025):

> Any workarounds for getting Llama 4 working on Ollama?

You can compile the llama4 branch yourself; it is currently an open pull request (#10141) and seems to be in the code-review phase. A rough sketch of that is below.
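Roughly, the self-build looks like the following sketch, assuming a working Go toolchain. This produces a CPU-only binary; GPU backends need the extra steps in the repo's development docs, and the exact commands may differ by version.

```console
# fetch the open PR branch and build ollama from source (CPU-only sketch)
$ git clone https://github.com/ollama/ollama.git && cd ollama
$ git fetch origin pull/10141/head:llama4 && git checkout llama4
$ go build .
$ ./ollama serve &
$ ./ollama run ingu627/llama4-scout-q4
```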


@rsmirnov90 commented on GitHub (Apr 26, 2025):

> > Any workarounds for getting Llama 4 working on Ollama?
>
> You can compile the llama4 branch yourself; it is currently an open pull request (#10141) and seems to be in the code-review phase.

I think it just disappeared... Or at least I don't see it in the branch list anymore (and I know it was there because I was checking it almost daily up until now).


@igorschlum commented on GitHub (Apr 26, 2025):

@rsmirnov90 there is a new version of Ollama that supports Llama 4; it may still evolve, but it's there and you can try it:
https://github.com/ollama/ollama/releases/tag/v0.6.7-rc0


@ips972 commented on GitHub (Apr 27, 2025):

Tried the new Ollama with Llama 4; it works fine, but it still lacks the performance of vLLM. I hope that some day Ollama gets to that performance level. It's a much easier platform to manage than any other, especially in multi-user sessions.


@thorewi commented on GitHub (Apr 29, 2025):

Hello, is this implementation really multimodal with image processing (or maybe I'm using the wrong model)? I'm getting a negative answer (see attached picture)... I'm using Ollama 0.6.7-rc0 and tried these models: https://ollama.com/ingu627/llama4-scout-q4 and https://ollama.com/aravhawk/llama4. Thank you for your help.

[attached screenshot]


@igorschlum commented on GitHub (Apr 29, 2025):

@thorewi I think Llama 4 can process an image in the sense of describing it (as llama3.3 can), but it cannot modify an image.


@thorewi commented on GitHub (Apr 29, 2025):

@igorschlum Yes, that's exactly what I need, but I always get something like this: [attached picture] — basically no response. So the question is whether it’s working for anyone or not...

[attached screenshot]

@rick-github commented on GitHub (Apr 29, 2025):

`aravhawk/llama4` doesn't support images:

```console
$ curl -s localhost:11434/api/show -d '{"model":"aravhawk/llama4"}' | jq .capabilities
[
  "completion"
]
```

The lack of response may be due to something else. The Ollama [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.


@rick-github commented on GitHub (May 1, 2025):

https://ollama.com/library/llama4


@aravhawk commented on GitHub (May 1, 2025):

*The models I've uploaded are not multimodal. I've updated the description to reflect that (I originally copied it directly from Meta)


@thepwagner commented on GitHub (May 3, 2025):

Is there a bug in the template currently?

On `v0.6.7`, using `llama4:17b-scout-16e-instruct-q4_K_M` (`b62dea0de67c`).
When calling with tools, I'm getting:

```
time=2025-05-03T08:07:05.993-04:00 level=ERROR source=routes.go:1520 msg="chat prompt error" error="template: :6:10: executing \"\" at <.Tools>: can't evaluate field Tools in type api.Tools"
```

Using the `with .Tools` on L3 makes me think the range on L6 should just be over `.`, but I can't find the source anywhere to submit a PR:

```
{{- if .System }}<|header_start|>system<|header_end|>

{{- with .Tools }}Environment: ipython
You have access to the following functions. To call a function, please respond with JSON for a function call. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{- range .Tools }}{{ . }}{{ "\n\n" }}
```
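For what it's worth, the error is consistent with that diagnosis: inside `{{- with .Tools }}` the dot is already rebound to the tools slice, so `range .Tools` fails. A sketch of the likely shape of the fix, not the actual template that later shipped (the `{{- end }}` lines close the hypothetical completed block):

```
{{- with .Tools }}Environment: ipython
{{/* inside "with", the dot is the tools slice, so range over "." */}}
{{- range . }}{{ . }}{{ "\n\n" }}
{{- end }}
{{- end }}
```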

@olumolu commented on GitHub (May 4, 2025):

Close this as support already merged.


@addypy commented on GitHub (May 5, 2025):

I'm getting the same issue @thepwagner mentioned. Tested with Ollama (Docker) on both v0.6.8 and v0.6.7.

```
ModelHTTPError: status_code: 500, model_name: llama4, body: {'message': 'template: :6:10: executing "" at <.Tools>: can't evaluate field Tools in type api.Tools', 'type': 'api_error', 'param': None, 'code': None}
```

Any updates on this?


@sherlock666 commented on GitHub (May 5, 2025):

Same issue as @thepwagner and @addypy. I would like to use the tools function.

With the same code, llama3.2 can use tools correctly, while llama4:scout returns the same error:

```
template: :6:10: executing "" at <.Tools>: can't evaluate field Tools in type api.Tools
```


@mxyng commented on GitHub (May 5, 2025):

Tool calling is fixed for Llama 4. Please re-pull the model.
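For anyone following along, re-pulling looks like the sketch below; since Ollama stores models as content-addressed layers, it should only fetch the changed template/manifest layers, not the full weights. The tag is an example using the library model linked above; substitute whichever llama4 tag you originally pulled.

```console
# re-pull to pick up the updated chat template, then run as usual
$ ollama pull llama4:scout
$ ollama run llama4:scout
```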


Reference: github-starred/ollama#32416