[GH-ISSUE #12787] Embedding workers crash with SIGTRAP under load (regression in v0.12.4+, on macOS) #70541

Closed
opened 2026-05-04 21:55:39 -05:00 by GiteaMirror · 3 comments

Originally created by @zeuslawyer on GitHub (Oct 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12787

What is the issue?

Description:

Starting with v0.12.4, Ollama's embedding generation (using nomic-embed-text) fails when processing multiple documents sequentially. The server spawns workers on random ports that crash with SIGTRAP: trace trap during llama_decode, causing EOF errors.

Environment:

  • Ollama versions tested:
    • v0.12.3 - Works correctly
    • v0.12.4 - Broken
    • v0.12.5 - Broken
    • v0.12.6 - Broken
  • OS: macOS (Darwin 25.0.0)
  • Model: nomic-embed-text
  • Client: Node.js using ollama npm package v0.5.16

Steps to Reproduce:

  1. Process 300+ documents sequentially
  2. Generate embeddings for 700-char text chunks (~3-5 chunks per document)
  3. Make embedding requests with 500ms delay between chunks and 1s delay between documents
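The steps above can be sketched as a small harness. This is a hypothetical reconstruction, not the reporter's actual script: `chunkText`, `runRepro`, and `EmbedFn` are illustrative names, and the embedding call is injected so the loop can be exercised without a running server. The 700-character chunk size and the 500 ms / 1 s delays are taken from the steps; everything else is an assumption.

```typescript
// Hypothetical repro harness; all names here are illustrative.
type EmbedFn = (text: string) => Promise<number[]>;

interface ReproFailure {
  doc: number;
  chunk: number;
  message: string;
}

interface ReproResult {
  ok: number;
  failures: ReproFailure[];
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Split a document into fixed-size chunks (700 characters, as in step 2).
function chunkText(text: string, size = 700): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Embed every chunk of every document strictly one at a time,
// pausing between chunks and between documents (step 3).
async function runRepro(
  docs: string[],
  embed: EmbedFn,
  chunkDelayMs = 500,
  docDelayMs = 1000,
): Promise<ReproResult> {
  const result: ReproResult = { ok: 0, failures: [] };
  let d = 0;
  for (const doc of docs) {
    let c = 0;
    for (const chunk of chunkText(doc)) {
      try {
        await embed(chunk);
        result.ok++;
      } catch (err) {
        // Record the failure and keep going so sporadic crashes are counted.
        const message = err instanceof Error ? err.message : String(err);
        result.failures.push({ doc: d, chunk: c, message });
      }
      await sleep(chunkDelayMs);
      c++;
    }
    await sleep(docDelayMs);
    d++;
  }
  return result;
}

// Wiring to the real client (assumes the `ollama` npm package from the report):
//   import { Ollama } from 'ollama';
//   const client = new Ollama({ host: 'http://127.0.0.1:11434' });
//   const embed: EmbedFn = async (text) =>
//     (await client.embeddings({ model: 'nomic-embed-text', prompt: text })).embedding;
```

Collecting failures instead of stopping at the first one makes it easier to see how often the crash recurs over a 300-document run.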

Expected Behavior:

Embeddings should be generated successfully for all chunks, as they were in v0.12.3.

Actual Behavior:

After processing a few documents, requests fail with:
ResponseError: do embedding request: Post "http://127.0.0.1:<random_port>/embedding": EOF

Random ports observed: 53702, 54350, 54572, 55056, 55176, 55582, 56639

The error message suggests that the Ollama server is delegating the request to a worker process on a random port, and that the worker crashes before responding.
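To tell this failure apart from other client errors, the port can be pulled out of the error text. The sketch below assumes the message keeps the exact shape quoted above; `runnerPortFromError` is a hypothetical helper, not part of the ollama package.

```typescript
// Matches the error text quoted in the report, for example:
//   do embedding request: Post "http://127.0.0.1:53702/embedding": EOF
const RUNNER_EOF = /Post "http:\/\/127\.0\.0\.1:(\d+)\/embedding": EOF/;

// Returns the worker port if the message looks like a worker EOF, else null.
function runnerPortFromError(message: string): number | null {
  const match = RUNNER_EOF.exec(message);
  return match ? Number(match[1]) : null;
}
```

A non-null result that differs from 11434 indicates the failure happened between the server and its worker, not between the client and the server.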

Server-side crash (from Ollama logs):
SIGTRAP: trace trap
PC=0x197ae6b1c m=9 sigcode=0
signal arrived during cgo execution

goroutine 7 [syscall]:
github.com/ollama/ollama/llama._Cfunc_llama_decode(...)
github.com/ollama/ollama/llama.(*Context).Decode(...)
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(...)
github.com/ollama/ollama/runner/llamarunner.(*Server).run(...)
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()

Client Configuration (confirmed correct):
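import { Ollama } from 'ollama';

// Requests go to the main Ollama server on its default port.
const client = new Ollama({ host: 'http://127.0.0.1:11434' });
// `text` is one 700-character chunk of a document.
const response = await client.embeddings({
  model: 'nomic-embed-text',
  prompt: text,
});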

Client sends requests to port 11434 correctly. The random port appears in Ollama's internal error message when delegating to workers.

Observations:

  • lsof -i :11434 shows Ollama listening correctly
  • lsof -i :<random_port> shows nothing (worker died immediately)
  • Single embedding requests work fine
  • Crash occurs sporadically after processing many chunks
  • Rate limiting (500ms-1000ms delays) doesn't prevent the crash
  • Setting OLLAMA_EMBEDDING_NUM_CTX=2048 didn't help

Workaround:

Downgrade to v0.12.3.
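Before applying the workaround, it may help to check which version is actually running. The sketch below only encodes the four versions listed under Environment and treats everything else as untested; `reportedStatus` is a hypothetical helper. Fetching the version string from the server's `GET /api/version` endpoint is left as a comment, on the assumption that the server is at the default address.

```typescript
type ReportedStatus = "works" | "broken" | "untested";

// Outcomes exactly as listed in the report; nothing is inferred for other versions.
const REPORTED = new Map<string, ReportedStatus>([
  ["0.12.3", "works"],
  ["0.12.4", "broken"],
  ["0.12.5", "broken"],
  ["0.12.6", "broken"],
]);

// Accepts "0.12.4" or "v0.12.4" and returns what the report says about it.
function reportedStatus(version: string): ReportedStatus {
  const normalized = version.trim().replace(/^v/, "");
  return REPORTED.get(normalized) ?? "untested";
}

// Possible wiring (assumes the default server address):
//   const res = await fetch('http://127.0.0.1:11434/api/version');
//   const { version } = await res.json();
//   console.log(version, reportedStatus(version));
```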

Additional Context:

This appears to be a regression in worker process management or parallel processing introduced in v0.12.4. The workers crash under sequential load with SIGTRAP during llama_decode.

Relevant log output


OS

macOS

GPU

No response

CPU

Apple

Ollama version

0.12.4 onwards

GiteaMirror added the bug label 2026-05-04 21:55:39 -05:00

@rogerdcarvalho commented on GitHub (Oct 27, 2025):

Experiencing the same


@rick-github commented on GitHub (Oct 27, 2025):

Post the full server log. SIGTRAP is part of the error recovery; the actual error will hopefully be earlier in the log.
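One way to surface the earlier error is to print the lines that precede the first SIGTRAP entry. This is a minimal sketch that takes the log contents as a string; `linesBeforeSigtrap` and the default of 50 context lines are assumptions, and reading the log file itself is left to the caller.

```typescript
// Returns up to `contextLines` lines that precede the first line containing
// "SIGTRAP", or an empty array if the log has no such line.
function linesBeforeSigtrap(log: string, contextLines = 50): string[] {
  const lines = log.split(/\r?\n/);
  const idx = lines.findIndex((line) => line.includes("SIGTRAP"));
  if (idx === -1) return [];
  return lines.slice(Math.max(0, idx - contextLines), idx);
}
```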


@Tobiasmidskards commented on GitHub (Oct 27, 2025):

Related to https://github.com/ollama/ollama/issues/12585

Reference: github-starred/ollama#70541