[GH-ISSUE #7741] num_ctx does not increase context length above 2048 #4943

Closed
opened 2026-04-12 16:00:14 -05:00 by GiteaMirror · 3 comments

Originally created by @JamesGuthrie on GitHub (Nov 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7741

What is the issue?

I'm trying to use embedding models which support a context length greater than 2048, e.g. nomic-embed-text, and snowflake-arctic-embed:137m, which both support up to 8192 tokens.

It appears as though I cannot get the context length to go above 2048 when I provide a context size through the num_ctx parameter.

To illustrate, I used the following test strings:

small = "the quick brown fox jumped over the lazy dog" # 9 tokens
tok1008 = small * 112 # 1008 tokens
tok2016 = tok1008 * 2 # 2016 tokens
tok4032 = tok2016 * 2 # 4032 tokens

I pulled the snowflake-arctic-embed:137m model:

ollama pull snowflake-arctic-embed:137m

Now I perform the following tests to verify that sending num_ctx works in general:

import ollama

# ✅ this succeeds: give 1008 tokens to a model with context 1024
ollama.embed(model="snowflake-arctic-embed:137m", input=tok1008, truncate=False, options = {"num_ctx": 1024})

# ✅ this fails, as expected: give 2016 tokens to a model with context 1024
ollama.embed(model="snowflake-arctic-embed:137m", input=tok2016, truncate=False, options = {"num_ctx": 1024})

# ✅ this succeeds: give 2016 tokens to a model with context 2048
ollama.embed(model="snowflake-arctic-embed:137m", input=tok2016, truncate=False, options = {"num_ctx": 2048})

Now I perform the following tests to show that num_ctx cannot be used to set the context greater than 2048:

# ❌ this fails: give 4032 tokens to a model with context 4096
ollama.embed(model="snowflake-arctic-embed:137m", input=tok4032, truncate=False, options = {"num_ctx": 4096})

Now I validate that the number of tokens is what I think it is:

# ✅ this succeeds: give 1008 tokens to a model with context 1008
ollama.embed(model="snowflake-arctic-embed:137m", input=tok1008, truncate=False, options = {"num_ctx": 1008})

# ✅ this fails, as expected: give 1008 tokens to a model with context 1007
ollama.embed(model="snowflake-arctic-embed:137m", input=tok1008, truncate=False, options = {"num_ctx": 1007})
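
For reference, the six calls in this report can be collected into one table-driven check. This is only a sketch: `CASES` and `run_cases` are hypothetical names, the expected outcomes are the ones reported here for Ollama 0.4.2, and any exception raised by the embed call is treated as a rejected input.

```python
SMALL = "the quick brown fox jumped over the lazy dog"  # 9 tokens, per the report
TOK1008 = SMALL * 112
TOK2016 = TOK1008 * 2
TOK4032 = TOK2016 * 2

# (label, input text, num_ctx, outcome observed in the report)
CASES = [
    ("1008 tokens, num_ctx=1024", TOK1008, 1024, True),
    ("2016 tokens, num_ctx=1024", TOK2016, 1024, False),
    ("2016 tokens, num_ctx=2048", TOK2016, 2048, True),
    ("4032 tokens, num_ctx=4096", TOK4032, 4096, False),
    ("1008 tokens, num_ctx=1008", TOK1008, 1008, True),
    ("1008 tokens, num_ctx=1007", TOK1008, 1007, False),
]


def run_cases(embed, model="snowflake-arctic-embed:137m"):
    """Call embed for every case; return (label, succeeded, reported) tuples."""
    results = []
    for label, text, num_ctx, reported in CASES:
        try:
            embed(model=model, input=text, truncate=False,
                  options={"num_ctx": num_ctx})
            succeeded = True
        except Exception:  # assumption: any error means the input was rejected
            succeeded = False
        results.append((label, succeeded, reported))
    return results
```

Passing `ollama.embed` as the `embed` argument runs the checks against a local server; a row where `succeeded` differs from `reported` indicates behaviour that differs from this report.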

Note: I showed examples using the python library, but I verified the same behaviour via the API (using curl).
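
The exact curl command is not shown in the report. As a rough equivalent, the request body for the `/api/embed` endpoint could be built as below. This is a sketch under assumptions: `build_embed_request` and `send` are hypothetical helper names, and the server is assumed to listen on the default `http://localhost:11434`.

```python
import json
import urllib.request


def build_embed_request(model, text, num_ctx,
                        base_url="http://localhost:11434"):
    """Build a POST request for /api/embed with truncation disabled."""
    payload = {
        "model": model,
        "input": text,
        "truncate": False,               # fail instead of silently truncating
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send the request; an HTTP error status raises urllib.error.HTTPError."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `send(build_embed_request("snowflake-arctic-embed:137m", tok4032, 4096))` would correspond to the failing case above and needs a running Ollama server.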

OS

Linux

GPU

No response

CPU

Intel

Ollama version

0.4.2

GiteaMirror added the bug label 2026-04-12 16:00:14 -05:00

@rick-github commented on GitHub (Nov 19, 2024):

Context length for snowflake-arctic-embed:137m is 2048 (https://ollama.com/library/snowflake-arctic-embed:137m/blobs/4c5716ded514). snowflake-arctic-embed-m-long from HF (https://huggingface.co/Snowflake/snowflake-arctic-embed-l#snowflake-arctic-embed-m-long) can be scaled to 8192 with RPE.

Context length for nomic-embed-text is 2048 (https://ollama.com/library/nomic-embed-text/blobs/970aa74c0a90). The model on HF (https://huggingface.co/nomic-ai/nomic-embed-text-v1) supports 8192, so it may be an import issue.
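
The context length mentioned in this comment can also be read from a local server rather than from the library page. The sketch below assumes that the `/api/show` response carries a `model_info` mapping with GGUF-style keys (`general.architecture` and `<architecture>.context_length`); the helper names are hypothetical, and the capping rule in `effective_context` is inferred from the behaviour reported in this thread, not taken from Ollama's source.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local server address


def trained_context_length(model_info):
    """Return the '<architecture>.context_length' value, or None if absent."""
    arch = model_info.get("general.architecture")
    if arch is not None and f"{arch}.context_length" in model_info:
        return model_info[f"{arch}.context_length"]
    # Fallback: accept any key that ends in ".context_length".
    for key, value in model_info.items():
        if key.endswith(".context_length"):
            return value
    return None


def fetch_model_info(model, base_url=OLLAMA_URL):
    """POST /api/show and return its 'model_info' mapping (needs a server)."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("model_info", {})


def effective_context(num_ctx, trained):
    """Assumed rule: a num_ctx above the trained length is capped to it."""
    return num_ctx if trained is None else min(num_ctx, trained)
```

Under that assumed rule, `effective_context(4096, 2048)` gives 2048, which would be consistent with the 4032-token input being rejected even though `num_ctx` was set to 4096.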

@JamesGuthrie commented on GitHub (Nov 20, 2024):

Thanks!


@Udayk02 commented on GitHub (Jan 21, 2025):

Hi @JamesGuthrie! Have you noticed that the default num_ctx for the nomic-embed-text model is set to 8192? Of course, the maximum context length is 2048, but I don't know why they set it to 8192 in the first place.

Reference: github-starred/ollama#4943