[GH-ISSUE #8047] Can add Stable Diffusion 3.5 model? #67202

Closed
opened 2026-05-04 09:36:48 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @dmmhk on GitHub (Dec 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8047

I cannot find the Stable Diffusion 3.5 model on the Ollama platform. Does Ollama not support Stable Diffusion 3.5? Could a Stable Diffusion 3.5 model be added?
The goal is to implement text and image generation using an API + JSON + Base64 + prompt.

GiteaMirror added the model label 2026-05-04 09:36:48 -05:00

@Salpingopharyngeus commented on GitHub (Dec 12, 2024):

The short answer is no; you shouldn't expect to, as Ollama is dedicated to Large **Language** Models.

The longer answer is that, in general, Stable Diffusion and large language models require different architectures. While some LLaVA models (and more recently Llama models) can reason about images, identifying an image or its contents is inherently different from generating one, and Ollama isn't designed or equipped for that.

As a real-life example: if you give me a picture of a flower and ask me to write a poem about it or describe it, with sufficient training or knowledge I could probably do that. But that doesn't necessarily mean I could then draw the flower, or visualize (let alone draw) that flower given only a text description. To successfully execute the drawing task, I'd need training of some sort.

For Stable Diffusion, you should be looking into something like AUTOMATIC1111, Forge, ComfyUI, or SwarmUI (to name a few of the more popular projects).
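The API + JSON + Base64 workflow the question asks about is exactly what these projects expose. As a rough sketch (assuming AUTOMATIC1111's web UI is running locally with `--api` enabled, so its `/sdapi/v1/txt2img` endpoint is available; the port and field names below follow its defaults, but check your own setup):

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt: str, steps: int = 20,
                          width: int = 512, height: int = 512) -> dict:
    """Build the JSON body for a txt2img request."""
    return {"prompt": prompt, "steps": steps, "width": width, "height": height}


def decode_image(b64_png: str) -> bytes:
    """The API returns each generated image as a base64-encoded PNG string."""
    return base64.b64decode(b64_png)


def generate_image(prompt: str, base_url: str = "http://127.0.0.1:7860") -> bytes:
    """POST a prompt to a local Stable Diffusion web UI and return PNG bytes.

    Assumes an AUTOMATIC1111-style server; not something Ollama provides.
    """
    req = urllib.request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(build_txt2img_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # result["images"] is a list of base64 PNG strings; take the first one.
    return decode_image(result["images"][0])
```

Usage would be along the lines of `open("flower.png", "wb").write(generate_image("a watercolor painting of a flower"))`. The point is that image generation lives behind a separate server and API, not behind Ollama's endpoints.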


@jmorganca commented on GitHub (Dec 23, 2024):

Merging with https://github.com/ollama/ollama/issues/786


Reference: github-starred/ollama#67202