[GH-ISSUE #786] Image generation models #46885

Open
opened 2026-04-28 01:33:13 -05:00 by GiteaMirror · 22 comments
Owner

Originally created by @SabareeshGC on GitHub (Oct 13, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/786

It would be great if support could be extended to text-to-image models.

GiteaMirror added the feature request label 2026-04-28 01:33:13 -05:00

@orkutmuratyilmaz commented on GitHub (Oct 18, 2023):

Hello @SabareeshGC,

There is [a tutorial](https://github.com/jmorganca/ollama/blob/main/docs/import.md) for importing models. Which model would you like to get supported by ollama? How about [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)?

Best,
Orkut


@SabareeshGC commented on GitHub (Oct 18, 2023):

SDXL is a great start. Looking at the tutorial, though, I'm not sure it supports text-to-image.



@hfabio commented on GitHub (Nov 9, 2023):

It would be awesome to use a stable diffusion model, like [this one](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0). Is it possible to use it in ollama?


@orkutmuratyilmaz commented on GitHub (Nov 9, 2023):

@hfabio, from the [importing tutorial](https://github.com/jmorganca/ollama/blob/main/docs/import.md):

> Ollama supports a set of model architectures, with support for more coming soon:
>
> - Llama & Mistral
> - Falcon & RW
> - GPT-NeoX
> - BigCode
>
> To view a model's architecture, check the `config.json` file in its HuggingFace repo. You should see an entry under `architectures` (e.g. `LlamaForCausalLM`).
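That `architectures` check is easy to script; here is a minimal sketch (the sample `config.json` content is illustrative, not taken from any particular repo):

```python
import json

def model_architectures(config_text: str) -> list:
    """Return the `architectures` entries from a HuggingFace config.json."""
    return json.loads(config_text).get("architectures", [])

# Illustrative config.json snippet, as found in many Llama-family repos.
sample = '{"model_type": "llama", "architectures": ["LlamaForCausalLM"]}'
print(model_architectures(sample))  # → ['LlamaForCausalLM']
```

A diffusion model's repo will not list one of the supported causal-LM architectures here, which is exactly why the import path fails for them.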

@BananaAcid commented on GitHub (Dec 16, 2023):

What does the list above mean? Does ollama not support image-generating models yet?


@teamsmiley commented on GitHub (Dec 20, 2023):

Has anyone successfully imported an SDXL model?


@easp commented on GitHub (Dec 20, 2023):

@teamsmiley no, because Ollama doesn't support any text-to-image models.


@sansmoraxz commented on GitHub (Dec 21, 2023):

There is also [leejet/stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) for ggml.


@tripleo1 commented on GitHub (Jan 23, 2024):

@easp (not singling you out)

> no, because Ollama doesn't support any text-to-image models.

Why?

It's going to take me a couple weeks to figure it out on my own, but could [somebody] provide search topics/clues to "onboard" myself quickly?


@easp commented on GitHub (Jan 23, 2024):

@tripleo There are multiple ways to answer that question. I'm not sure where to start. I guess I'll start by saying I don't have any inside line on what the Ollama developers are thinking.

With that out of the way: Ollama doesn't support any text-to-image models because no one has added support for them. The team's resources are limited. Even if someone comes along and says "I'll do all the work of adding text-to-image support", the effort would be a multiplier on the project's communication and coordination costs. Once added, it would likely reduce the progress and efficiency of other work on the project, and mitigating that would require increasing the up-front communication and coordination costs.

Ollama currently uses llama.cpp to do much of the work of actually supporting a range of large language models. This choice allowed the team to focus on delivering value in other ways. Llama.cpp's reasons for not supporting text-to-image models are probably similar.

There is plenty to do already in the area of LLMs. Focus is a virtue.


@YuanfengZhang commented on GitHub (Mar 17, 2024):

Check this [repo](https://github.com/Dublit-Development/ollama-api?tab=readme-ov-file).


@m4r1k commented on GitHub (Mar 29, 2024):

If Ollama can also run image generation models, it will become the next Docker.


@tarasis commented on GitHub (Apr 20, 2024):

Given Llama3 can do images, I'm certainly interested to try it.


@omerkarabacak commented on GitHub (Apr 21, 2024):

> Given Llama3 can do images, I'm certainly interested to try it.

It only does ASCII images


@nongmo677 commented on GitHub (Apr 26, 2024):

Whoever has succeeded, please let me know.


@geroldmeisinger commented on GitHub (Jun 4, 2024):

We would have to find a balanced middle ground between the minimalism that Ollama provides (a sophisticated and efficient model loader with a simple text prompt and a configurable API endpoint) and the features needed to make a txt2img application actually useful.

The most minimal version would need presets for CFG, steps, sampler, and scheduler values (or simple commands to set them) plus a way to enter the positive and negative prompt. But I assume it would soon grow out of hand: to do anything useful you need support for controlnets, loras, ipadapters etc., and you also want visual tooling for certain features (mask editors, area prompting, image previews etc.), which doesn't really fit the Ollama paradigm. Then someone has to define an opinionated way for these things to play together. That's what Automatic1111 did, with the downside that you are somewhat restricted and adopting new developments always takes very long. ComfyUI is more modular and adopts new technology faster, but you have to set up everything on your own. There is also still a lot of development in very fundamental features (see recent work on Align-Your-Steps schedulers, the Bosh3 ODE solver, the GITS sampler, the IPNMP sampler, high-CFG fixes, TensorRT support etc.) which needs to be implemented quickly; Automatic1111 lost a lot of its user base because its SDXL implementation took one week too long. Ollama would then be reduced to the efficient model loader for those UIs, which simply request the latent images, CLIP embeddings, and VAE encoding/decoding while the rest is done in the UI, and the REPL could provide a simple text prompt for testing plain txt2img.

Default workflow in ComfyUI:
![Screenshot from 2024-07-30 09-09-49](https://github.com/user-attachments/assets/080387a5-b9fa-40c7-9f2a-d14c3ee921da)

There are plenty of Stable Diffusion UIs and all of them are drowning in issues and features because of this:

  • Automatic1111: most popular, most fully-integrated environment
  • ComfyUI: very modular, usually has the newest features first, allows flexible workflows
  • Fooocus: restricted and opinionated, but focused on providing a simple UI that produces good output out of the box (its original author invented ControlNet and has a deep understanding of the inner workings of Stable Diffusion); probably the closest to DALL-E 3 (Fooocus runs a GPT-2 prompt-rewrite engine in the background)
  • SD.Next: fork and overhaul of Automatic1111 because everything took sooo long to implement
  • etc.
    (I think all of them provide API endpoints)

If you want to integrate it, HuggingFace diffusers has good integration and a lot of example code and documentation: https://huggingface.co/docs/diffusers. I'd recommend Stable Diffusion 1.5 because it has the lowest hardware requirements, and integration is simpler and will be faster. Later models have more specifics which require more attention to detail.
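For a sense of what that suggestion involves, a minimal diffusers text-to-image call might look like the sketch below. This is a sketch under assumptions, not Ollama code: the model id `runwayml/stable-diffusion-v1-5`, the parameter ranges in `generation_kwargs`, and the CUDA device are all illustrative.

```python
def generation_kwargs(steps: int = 25, cfg: float = 7.5) -> dict:
    """Sanity-check the two knobs nearly every Stable Diffusion UI exposes."""
    if not 1 <= steps <= 150:
        raise ValueError("steps out of range")
    if not 1.0 <= cfg <= 30.0:
        raise ValueError("guidance scale out of range")
    return {"num_inference_steps": steps, "guidance_scale": cfg}

def generate(prompt: str, out_path: str = "out.png") -> None:
    """Text-to-image via diffusers; needs `pip install diffusers transformers torch` and a GPU."""
    import torch  # heavy imports kept local so generation_kwargs stays importable without them
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(prompt, **generation_kwargs()).images[0]
    image.save(out_path)

# Example (GPU required): generate("a lighthouse at dusk")
```

Even this bare-minimum version already carries sampler/scheduler choices implicitly, which is the "grows out of hand" problem described above.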


@vlad-ivanov-name commented on GitHub (Jul 19, 2024):

I think one of the reasons ollama became popular is its consistently reliable out-of-the-box user experience. If I had to guess, many people don't necessarily want or need diffusion models specifically in ollama, but would like an app that works just as well as ollama does. My personal opinion is that apart from the development team's focus, this is also about the programming language projects are written in: Go at least has basic features like static checking, pinned dependencies, and self-contained binaries that often improve the experience of users consuming the end product. So I would understand if people tired of broken Python projects, with half-assed dependency management and runtime crashes, wanted a UX similar to ollama's instead.

I do also hope eventually to see a project similar in quality to ollama but for diffusion models.


@S0AndS0 commented on GitHub (Nov 25, 2024):

> Hello @SabareeshGC,
>
> There is [a tutorial](https://github.com/jmorganca/ollama/blob/main/docs/import.md) for importing models. Which model would you like to get supported by ollama? How about [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)?
>
> Best, Orkut

After finding a [GGUF](https://huggingface.co/gpustack/stable-diffusion-v1-4-GGUF/tree/main) format of Stable Diffusion, and doing a bit of reading the friendly manual, I'll confess I be a bit stuck on the following error:

```
Error: Post "http://127.0.0.1:11434/api/generate": EOF
```

## Steps to reproduce

> Note: one, of many, prerequisites be getting Git LFS (Large File Storage) set up, but such things are out of scope... for now.

### Clone the GGUF repo

```bash
mkdir -vp "${HOME}/git/face/gpustack"
pushd "${_}"

git clone https://huggingface.co/gpustack/stable-diffusion-v1-4-GGUF
popd
```

### Make a repo for Ollama configurations

```bash
mkdir -vp "${HOME}/git/hub/S0AndS0/"
pushd "${_}"

git init ollama-sdv14-gguf-q8
pushd "${_}"

tee Modelfile 1>/dev/null <<EOF
FROM ${HOME}/git/face/gpustack/stable-diffusion-v1-4-GGUF/stable-diffusion-v1-4-Q8_0.gguf
PARAMETER temperature 1
SYSTEM """
You make pretty images from text prompts provided
"""
EOF

popd && popd  # pop both pushed directories (`popd 2` is not a valid invocation)
```

### Tell Ollama about the Modelfile

```bash
ollama create sdx1 -f "${HOME}/git/hub/S0AndS0/ollama-sdv14-gguf-q8/Modelfile"
```

### Restart the service

```bash
sudo systemctl restart ollama.service
```

### Attempt to run

```bash
ollama run sdx1
#> Error: Post "http://127.0.0.1:11434/api/generate": EOF
```

## Notes and thoughts

Skimming other replies in this here Issue it seems there be _some_ other bits of stuff that need setup before any can hope for text-to-image models to work, but I figured I'd share where I got stuck in case anyone else wants to pick up the metaphorical baton from here.

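For anyone debugging that `EOF`, the same request can be sent to the HTTP API directly, which sometimes surfaces a clearer error than the CLI. A minimal sketch, assuming the model name `sdx1` and default endpoint from the steps above, and a running Ollama server for `probe` to talk to:

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default endpoint

def build_generate_payload(model: str, prompt: str) -> bytes:
    """Encode a non-streaming request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def probe(model: str, prompt: str) -> str:
    """Send one generate request; return the response body or the error text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, ConnectionError) as exc:
        # An abrupt disconnect here usually means the runner process died
        # while trying to load the (unsupported) model.
        return f"request failed: {exc}"

# Example (needs a running Ollama server):
#   print(probe("sdx1", "a watercolor fox"))
```

Checking `journalctl -u ollama.service` after such a probe is likely to show why the runner exited.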

@reneleonhardt commented on GitHub (May 24, 2025):

The 0.7.0 release says:

> Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models

Is this issue still unresolved?


@CleyFaye commented on GitHub (Jun 4, 2025):

> The 0.7.0 release says:
>
> > Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models
>
> Is this issue still unresolved?

Multimodal in this case likely refers to accepting multiple kinds of input, as opposed to producing image output.


@VictorVow commented on GitHub (Jan 22, 2026):

[This is now available on macOS.](https://github.com/ollama/ollama/releases/tag/v0.14.3)


@aole commented on GitHub (Mar 10, 2026):

Any update on Windows support?

Reference: github-starred/ollama#46885