[GH-ISSUE #10149] OLLAMA_CONTEXT_LENGTH=4096 but OllamaEmbeddings still shows 8192 #32418

Closed
opened 2026-04-22 13:39:52 -05:00 by GiteaMirror · 4 comments

Originally created by @khteh on GitHub (Apr 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10149

What is the issue?

```python
import pytest
from langchain_ollama import OllamaEmbeddings
from src.config import config

pytest_plugins = ('pytest_asyncio',)

@pytest.mark.asyncio(loop_scope="function")
async def test_ollama_embeddings_vector_dimension():
    embeddings = OllamaEmbeddings(model="llama3.3", base_url=config.OLLAMA_URI, num_ctx=4096, num_gpu=1, temperature=0, top_k=10)
    result = await embeddings.aembed_documents(["Hello how are you doing"])
    dimension = len(result[0])  # this should output 4096
    print(f"dimension: {dimension}")
    # assert 4096 == dimension
```

Relevant log output

I keep getting `8192` in the print output.

```shell
root@ollama-0:/# ollama --version
ollama version is 0.6.2
root@ollama-0:/# export|grep OLLAMA_
declare -x OLLAMA_CONTEXT_LENGTH="4096"
declare -x OLLAMA_DEBUG="true"
declare -x OLLAMA_FLASH_ATTENTION="true"
declare -x OLLAMA_HOST="http://0.0.0.0:11434"
declare -x OLLAMA_MODELS="/models"
declare -x OLLAMA_SCHED_SPREAD="true"
```

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.6.2

GiteaMirror added the bug label 2026-04-22 13:39:53 -05:00

@rick-github commented on GitHub (Apr 7, 2025):

`OLLAMA_CONTEXT_LENGTH` is the size of the input context buffer. The length of the embedding output is determined by the model's `embedding length` value.

```console
$ ollama show llama3.3
  Model
    architecture        llama
    parameters          70.6B
    context length      131072
    embedding length    8192
    quantization        Q4_K_M
```
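One way to verify this at runtime is to embed a probe string and check the vector length directly. A minimal sketch, assuming a local Ollama server with `llama3.3` pulled and the `langchain_ollama` package installed:

```python
# Minimal sketch: determine a model's embedding dimension empirically.
# Assumes Ollama is reachable at the default http://localhost:11434.
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3.3", num_ctx=4096)
probe = embeddings.embed_query("dimension probe")

# Prints 8192 for llama3.3: num_ctx / OLLAMA_CONTEXT_LENGTH bound the input
# context, but the output vector length is fixed by the model architecture.
print(f"embedding dimension: {len(probe)}")
```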

@khteh commented on GitHub (Apr 7, 2025):

Why does it default to the magic number `8192`? I am not sure about other graph DBs, but it breaks Neo4j. How can I configure this value?


@rick-github commented on GitHub (Apr 7, 2025):

It's not a magic number and it's not configurable. It's an attribute of the model, the same as the number of parameters or the max context length. If you want embeddings with a shorter length, look for other [embedding models](https://ollama.com/search?c=embedding).
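As an illustration of that suggestion, a dedicated embedding model such as `nomic-embed-text` produces much shorter vectors. A minimal sketch, assuming the model has been pulled with `ollama pull nomic-embed-text`:

```python
# Sketch: swap in a purpose-built embedding model with a smaller dimension.
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vec = embeddings.embed_query("Hello how are you doing")
print(len(vec))  # nomic-embed-text emits 768-dimensional vectors
```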


@khteh commented on GitHub (Apr 8, 2025):

I have switched to https://ollama.com/library/nomic-embed-text but to no avail!

```
chromadb.errors.InvalidArgumentError: Collection expecting embedding with dimension of 8192, got 768
```

/reopen
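For what it's worth, this ChromaDB error points at stale data rather than Ollama: a Chroma collection fixes its dimensionality when the first embeddings are inserted, so a collection populated with 8192-dimensional llama3.3 vectors will reject 768-dimensional nomic-embed-text vectors. A minimal sketch of the usual remedy (the collection name `docs` and the storage path are hypothetical):

```python
# Sketch: drop and rebuild the collection after switching embedding models.
# A Chroma collection's dimension is locked by its first inserts, so the
# old 8192-dim collection must be recreated before 768-dim vectors fit.
import chromadb

client = chromadb.PersistentClient(path="./chroma")  # hypothetical path
client.delete_collection("docs")      # discards the 8192-dim vectors
collection = client.create_collection("docs")
# Re-ingest documents here using the nomic-embed-text embeddings.
```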

Reference: github-starred/ollama#32418