[GH-ISSUE #7008] /api/embed uses 512 token context window even though model was configured with 8192 #30201

Closed
opened 2026-04-22 09:43:37 -05:00 by GiteaMirror · 2 comments

Originally created by @khromov on GitHub (Sep 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7008

What is the issue?

I'm using Continue.dev and have configured the following to generate embeddings:

"embeddingsProvider": {
    "provider": "ollama",
    "model": "mxbai-embed-large:latest"
  },

When inspecting the model, we see that the context window is 8192:

ollama show --modelfile nomic-embed-text:latest | grep num_ctx
PARAMETER num_ctx 8192

However, Ollama only seems to use 512 tokens; while indexing we get:

[GIN] 2024/09/27 - 21:40:26 | 200 |  145.149375ms |       127.0.0.1 | POST     "/api/embed"
INFO [update_slots] input truncated | n_ctx=512 n_erase=258 n_keep=0 n_left=512 n_shift=256 tid="0x1ebf88f40" timestamp=1727466026

OS

macOS

GPU

Apple

CPU

No response

Ollama version

0.3.12

GiteaMirror added the bug label 2026-04-22 09:43:37 -05:00

@rick-github commented on GitHub (Sep 28, 2024):

Is it possible that Continue.dev is sending "options":{"num_ctx":512} in the request?

$ curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text:latest","input":"'"$(echo {1..1000})"'"}' 
$ docker compose logs ollama | grep update_slots.*released
ollama-1  | DEBUG [update_slots] slot released | n_cache_tokens=1557 n_ctx=8192 n_past=1557 n_system_tokens=0 slot_id=0 task_id=58 tid="140157435772928" timestamp=1727489100 truncated=false
$ curl -s localhost:11434/api/embed -d '{"model":"nomic-embed-text:latest","options":{"num_ctx":512},"input":"'"$(echo {1..1000})"'"}' 
ollama-1  | DEBUG [update_slots] slot released | n_cache_tokens=256 n_ctx=512 n_past=256 n_system_tokens=0 slot_id=0 task_id=11 tid="139978549817344" timestamp=1727489269 truncated=true

@pdevine commented on GitHub (Oct 1, 2024):

Hey @khromov, it looks like you're using mxbai-embed-large with Continue.dev, but nomic-embed-text with ollama show.

mxbai-embed-large has a 512 token context, while nomic-embed-text has an 8192 token context.

I'll go ahead and close the issue.
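The root cause was inspecting a different model than the one the client was configured with. A quick sanity check along those lines (a hypothetical helper; the two context sizes are the ones stated in this thread, not queried from Ollama):

```python
# Context windows as stated in this thread (assumed values, not queried live).
KNOWN_CONTEXTS = {
    "mxbai-embed-large:latest": 512,
    "nomic-embed-text:latest": 8192,
}

def explain_truncation(configured_model: str, inspected_model: str) -> str:
    """Flag the mistake from this issue: comparing the client's configured
    model against the model that was actually inspected with `ollama show`."""
    if configured_model != inspected_model:
        return (f"Mismatch: the client uses {configured_model!r} "
                f"(ctx={KNOWN_CONTEXTS[configured_model]}), but you inspected "
                f"{inspected_model!r} (ctx={KNOWN_CONTEXTS[inspected_model]}).")
    return "Models match; check whether the request sends options.num_ctx instead."

msg = explain_truncation("mxbai-embed-large:latest", "nomic-embed-text:latest")
print(msg)
```

In this case the helper would report the mismatch: the 512-token truncation came from mxbai-embed-large's own context limit, not from the nomic-embed-text configuration that was inspected.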

Reference: github-starred/ollama#30201