[GH-ISSUE #1707] [Bug] Calling embedding endpoint within short time will receive None #26725

Closed
opened 2026-04-22 03:11:54 -05:00 by GiteaMirror · 1 comment

Originally created by @samx81 on GitHub (Dec 25, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1707

I'm currently using `llama_index` to do document QA with an LLM, but I notice the embedding endpoint often outputs:

```json
{"embedding": null}
```

This happens when I use the `condense_question` chat mode, which calls the LLM to rephrase the question (and match context) before actually answering.
For example:

```python
llm = Ollama(model="dolphin2.2-mistral:7b-q4_K_M")
ollama_embedding = OllamaEmbedding(
    model_name="dolphin2.2-mistral:7b-q4_K_M",
    ollama_additional_kwargs={"mirostat": 0},
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=ollama_embedding)
chat_engine = index.as_chat_engine(service_context=service_context, chat_mode='condense_plus_context')

resp = chat_engine.chat('abc')
print(resp)
# Often happens when the LLM is called a second time
resp = chat_engine.chat('cde')
print(resp)
```
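
For reference, the same null result should be reproducible against the HTTP API directly, without `llama_index`. A minimal sketch; the base URL assumes a default local install, and the prompt strings are placeholders:

```python
import requests

BASE_URL = "http://localhost:11434"
MODEL = "dolphin2.2-mistral:7b-q4_K_M"

# A generate call, like the one the chat engine makes to condense the question.
requests.post(
    f"{BASE_URL}/api/generate",
    json={"model": MODEL, "prompt": "Rephrase: abc", "stream": False},
)

# An embeddings call right after; this is where {"embedding": null} shows up.
resp = requests.post(
    f"{BASE_URL}/api/embeddings",
    json={"model": MODEL, "prompt": "abc"},
)
print(resp.json())  # may print {'embedding': None}
```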

I guess it is probably because ollama or llama.cpp has to handle different kinds of requests at the same time? If I modify `llama_index` to check for `null` and retry, the above code works:

```python
# In llama_index.embeddings.OllamaEmbedding:
# retry until the server returns a non-null embedding.
while True:
    response = requests.post(
        url=f"{self.base_url}/api/embeddings",
        headers={"Content-Type": "application/json"},
        json=ollama_request_body,
    )
    response.encoding = "utf-8"

    if response.status_code != 200:
        optional_detail = response.json().get("error")
        raise ValueError(
            f"Ollama call failed with status code {response.status_code}."
            f" Details: {optional_detail}"
        )
    if response.json()["embedding"]:
        break
```
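
Looping forever could hang if the embedding never comes back, so a bounded retry with a short pause between attempts may be safer. A sketch of the same workaround; the retry count and delay are arbitrary choices, and `self.base_url` / `ollama_request_body` come from the surrounding `OllamaEmbedding` method as above:

```python
import time

import requests

# Bounded variant of the retry loop above.
for attempt in range(5):
    response = requests.post(
        url=f"{self.base_url}/api/embeddings",
        headers={"Content-Type": "application/json"},
        json=ollama_request_body,
    )
    response.raise_for_status()  # surface HTTP-level failures immediately
    if response.json().get("embedding"):
        break
    time.sleep(0.5)  # give the server a moment before retrying
else:
    raise ValueError("Ollama returned a null embedding after 5 attempts")
```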
GiteaMirror added the bug label 2026-04-22 03:11:54 -05:00

@jmorganca commented on GitHub (May 6, 2024):

Hi all, this should be fixed now!

Reference: github-starred/ollama#26725