[GH-ISSUE #12449] "qwen3-embedding:latest" does not support generate #8272

Closed
opened 2026-04-12 20:48:50 -05:00 by GiteaMirror · 3 comments

Originally created by @saad039 on GitHub (Sep 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12449

What is the issue?

I'm running ollama inside docker and I'm unable to run the latest Qwen3 embedding model on my machine.

Steps to reproduce

```bash
docker run --gpus all -it -v ollama:/root/.ollama ollama/ollama

# docker exec into the running container
ollama run qwen3-embedding:latest
```

I've even tried loading the model after running `ollama serve`. Same output there.

Specs

GPU: NVIDIA GeForce RTX 4060 (mobile)
CPU: Intel i7-14650HX
OS: Arch Linux
Kernel: 6.12.48-1-lts

Relevant log output

```shell
root@3d0500d308fe:/# ollama run qwen3-embedding:latest
Error: 400 Bad Request: "qwen3-embedding:latest" does not support generate
```

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.12.3

GiteaMirror added the bug label 2026-04-12 20:48:50 -05:00

@mxyng commented on GitHub (Sep 29, 2025):

yes qwen3-embedding is not a text generation model so it doesn't support generate. try qwen3:0.6b or qwen3:4b if you're vram constrained


@saad039 commented on GitHub (Sep 29, 2025):

It's an embedding model and that's exactly the purpose I want to use it for. It's listed as an available embedding model on ollama's [docs](https://ollama.com/library/qwen3-embedding/tags).

If I'm missing something, let me know.


@mxyng commented on GitHub (Sep 29, 2025):

you need to use the [embedding api](https://docs.ollama.com/api#generate-embeddings). there's no analog in the cli so you can't `ollama run` it.

try this

```shell
curl -s localhost:11434/api/embed -d '{"model":"qwen3-embedding","input":"embed me!"}'
```
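As an illustration going one step beyond the single curl call (not part of the original thread), here is a minimal Python sketch that POSTs to the same `/api/embed` endpoint and compares two returned vectors with cosine similarity. The endpoint path and the `embeddings` response field follow Ollama's documented embed API; the helper names, and the assumption that a server is running locally with `qwen3-embedding` already pulled, are illustrative only:

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # default Ollama port


def embed(texts, model="qwen3-embedding"):
    """POST a batch of strings to Ollama's embed endpoint, return the vectors.

    Assumes a running Ollama server with the model already pulled.
    """
    payload = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Usage (requires a live server):
#   vecs = embed(["embed me!", "embed me too"])
#   print(cosine(vecs[0], vecs[1]))
```

Cosine similarity is the usual way to consume these vectors for semantic search, which is what an embedding-only model like this is for.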
Reference: github-starred/ollama#8272