[GH-ISSUE #14657] bge-m3 only returns NaN on bitcoin whitepaper, other docs #71551

Open
opened 2026-05-05 02:06:25 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @higgsaxhh on GitHub (Mar 6, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14657

What is the issue?

The bge-m3 model returns NaN values through the OpenAI-compatible embeddings API (/v1/embeddings) when processing certain text content, particularly technical documents.

This causes a 500 error with the message: failed to encode response: json: unsupported value: NaN
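The 500 itself comes from serialization: JSON has no representation for NaN, so strict encoders (including Go's encoding/json, which Ollama's server uses) refuse to emit it. A minimal Python illustration of the same failure mode, using `allow_nan=False` to mimic a strict encoder:

```python
import json

embedding = [0.12, float("nan"), -0.03]  # a NaN slipped into the vector

# Python's default is permissive and emits a non-standard "NaN" token:
print(json.dumps(embedding))  # '[0.12, NaN, -0.03]' — not valid strict JSON

# A strict encoder, like Go's encoding/json, rejects the value outright:
try:
    json.dumps(embedding, allow_nan=False)
except ValueError as e:
    print(f"encode failed: {e}")
```

So any single non-finite float in the model's output is enough to turn the whole response into the `json: unsupported value: NaN` error.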

Environment

  • Ollama Version: 0.17.6
  • OS: Windows 10 / MSYS_NT-10.0-26100
  • Model: bge-m3:latest (ID: 790764642607, Size: 1.2 GB)
  • GPU: NVIDIA GeForce RTX 2080 Ti

Steps to Reproduce

  1. Pull the bge-m3 model
ollama pull bge-m3
  2. Test with simple text
  curl -X POST http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "model": "bge-m3",
      "input": "This is a test sentence."
    }'

✅ Returns a valid 1024-dimensional embedding

  3. Test with technical document content
  curl -X POST http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "model": "bge-m3",
      "input": "Bitcoin: A Peer-to-Peer Electronic Cash System. Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work."
    }'
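The same repro can be scripted without curl. Below is a hedged Python sketch: the `embed` helper assumes a local Ollama at the default port and is only defined, not called (it needs a running server), while `non_finite_indices` is a plain client-side check for NaN/Inf in a returned vector:

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/embeddings"  # assumed default endpoint

def embed(text, model="bge-m3"):
    """Request an embedding from a locally running Ollama server.
    Requires `ollama serve` to be up; not called at import time."""
    body = json.dumps({"model": model, "input": text}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def non_finite_indices(vec):
    """Return positions of NaN/Inf values in an embedding vector."""
    return [i for i, v in enumerate(vec) if not math.isfinite(v)]
```

With a healthy model, `non_finite_indices(embed("This is a test sentence."))` should return an empty list; with the whitepaper abstract on an affected bge-m3 setup, the server is expected to 500 before the vector is ever returned.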

Expected Behavior

Should return a valid 1024-dimensional embedding array, similar to the simple text case.

Actual Behavior

  {
    "error": {
      "message": "failed to encode response: json: unsupported value: NaN",
      "type": "api_error",
      "param": null,
      "code": null
    }
  }

Additional Context

  • The same technical text works perfectly with nomic-embed-text:latest, which returns valid 768-dimensional embeddings without any NaN values
  • This issue occurs consistently with content from technical PDFs (e.g., Bitcoin whitepaper, research papers)
  • The issue appears to be specific to bge-m3 - other embedding models handle the same content without issues

Relevant log output

openai.InternalServerError: Error code: 500 - {'error': {'message': 'failed to encode response: json: unsupported value: NaN', 'type': 'api_error', 'param': None, 'code': None}}

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.6

GiteaMirror added the bug label 2026-05-05 02:06:25 -05:00

@telunyang commented on GitHub (Mar 9, 2026):

Have you tried the solution from https://github.com/ollama/ollama/issues/13572?

Setting OLLAMA_FLASH_ATTENTION=false solves the problem.
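Note that `OLLAMA_FLASH_ATTENTION` is a server-side variable: it must be set in the environment of the `ollama serve` process, not in the client. A minimal sketch of launching the server that way from Python (the `subprocess` call is commented out since it needs `ollama` on PATH):

```python
import os
import subprocess  # noqa: F401 — used by the commented launch line below

# Build the server environment with flash attention disabled
# (workaround referenced from ollama/ollama#13572).
env = {**os.environ, "OLLAMA_FLASH_ATTENTION": "false"}

# subprocess.Popen(["ollama", "serve"], env=env)  # requires ollama on PATH
```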


@mvanhorn commented on GitHub (Mar 9, 2026):

I've submitted a fix in #14739. Added NaN/Inf detection in the embedding response path (both runner-level handlers and the deprecated EmbeddingsHandler) to prevent the JSON serialization crash. The fix returns a clear error message instead of the cryptic json: unsupported value: NaN.


@higgsaxhh commented on GitHub (Mar 9, 2026):

> I've submitted a fix in #14739. Added NaN/Inf detection in the embedding response path (both runner-level handlers and the deprecated EmbeddingsHandler) to prevent the JSON serialization crash. The fix returns a clear error message instead of the cryptic json: unsupported value: NaN.

Thanks @mvanhorn. Does this solve the embedding problem or just more gracefully handle the error?


@mvanhorn commented on GitHub (Mar 9, 2026):

Good question - to be clear, #14739 handles the error gracefully rather than fixing the root cause of why bge-m3 produces NaN values. The underlying NaN generation is a model/compute issue (likely related to flash attention as @telunyang mentioned - disabling it with OLLAMA_FLASH_ATTENTION=false is the current workaround).

What #14739 does is prevent the server from crashing with a cryptic json: unsupported value: NaN 500 error, and instead returns a clear error message explaining what happened. That way you get actionable feedback instead of a stack trace.
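The actual guard in #14739 lives in Ollama's Go server, but the idea is language-agnostic: validate the vector before serialization and raise a descriptive error. A hypothetical Python sketch of that pattern (not the real implementation):

```python
import math

class EmbeddingError(ValueError):
    """Raised instead of letting JSON encoding crash on NaN/Inf."""

def validate_embedding(vec, model):
    """Reject embeddings containing non-finite values with a clear message.
    Sketch of the idea behind PR #14739; the real fix is in Go."""
    for i, v in enumerate(vec):
        if not math.isfinite(v):
            raise EmbeddingError(
                f"model {model!r} produced a non-finite value at index {i}; "
                "try OLLAMA_FLASH_ATTENTION=false as a workaround"
            )
    return vec
```

A caller then sees which model failed and where, rather than a generic serialization error after the fact.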


@neuwcodebox commented on GitHub (Apr 7, 2026):

This issue still occurs in the latest ollama version (v0.20.1) and was resolved by setting the OLLAMA_FLASH_ATTENTION environment variable to false.
