[GH-ISSUE #10071] ollama run dimavz/whisper-tiny Error: Post "http://127.0.0.1:11434/api/generate": EOF #32360

Closed
opened 2026-04-22 13:33:50 -05:00 by GiteaMirror · 5 comments

Originally created by @githuailoveyou on GitHub (Apr 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10071

What is the issue?

Help me.

Command executed:

```shell
ollama run dimavz/whisper-tiny
pulling manifest
pulling d76121b83ea6... 100% ██████████████████████████████████████████████████████████████████████████████▏ 44 MB
pulling faaa9dfb3bac... 100% ██████████████████████████████████████████████████████████████████████████████▏ 272 B
verifying sha256 digest
writing manifest
success
Error: Post "http://127.0.0.1:11434/api/generate": EOF
```

I don't know where the log is or why this is happening.
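For reference, the Ollama troubleshooting docs give the default macOS log location; a minimal way to view it, assuming a standard install:

```shell
# Default server log path on macOS per the Ollama troubleshooting docs;
# the path may differ for non-standard installs.
cat ~/.ollama/logs/server.log
```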

Relevant log output


OS

macOS

GPU

Intel

CPU

Intel

Ollama version

0.6.2

GiteaMirror added the bug label 2026-04-22 13:33:50 -05:00

@rick-github commented on GitHub (Apr 1, 2025):

ollama doesn't currently support audio models.
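A quick way to see what a model reports itself as, without trying to run it, is `ollama show`; a sketch, assuming a recent client (exact output fields vary by version):

```shell
# Print the model's metadata (architecture, parameters, quantization, etc.)
# without loading it; a non-LLM architecture is the tell-tale sign.
ollama show dimavz/whisper-tiny
```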


@githuailoveyou commented on GitHub (Apr 1, 2025):

┭┮﹏┭┮


@ScottA38 commented on GitHub (Sep 30, 2025):

@rick-github I am just wading into Ollama now, but I don't understand how this can be the case. If this model can't run via Ollama, why is it listed on the models page of the website (with many pulls)? How are these people running the model?

I am not questioning that you are correct; I am just asking for a more detailed explanation to help a newcomer understand.

Running `llama serve`, I get the following error, corresponding to the error OP listed above:

```log
time=2025-09-30T17:59:39.730+01:00 level=WARN source=memory.go:129 msg="model missing blk.0 layer size"
panic: interface conversion: interface {} is nil, not *ggml.array[string]

goroutine 12 [running]:
github.com/ollama/ollama/fs/ggml.GGML.GraphSize({{0x1054c9cb0, 0x140000f7270}, {0x1054c9c60, 0x14000189008}, 0x29ffb40}, 0x2000, 0x200, 0x2, {0x0, 0x0})
	github.com/ollama/ollama/fs/ggml/ggml.go:481 +0x11dc
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x2000, 0x200, 0xffffffffffffffff, ...}, ...}, ...)
	github.com/ollama/ollama/llm/memory.go:142 +0x5d0
github.com/ollama/ollama/llm.PredictServerFit({0x14000133b58?, 0x104675bfc?, 0x20?}, 0x1400048b230, {0x14000133898?, 0x104676158?, 0x14000133a50?}, {0x0, 0x0, 0x0}, ...)
	github.com/ollama/ollama/llm/memory.go:23 +0xb0
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0x14000395110, 0x1400048b230, {0x14000728000?, 0x1?, 0x1?}, 0x140003abcb8)
	github.com/ollama/ollama/server/sched.go:787 +0x510
github.com/ollama/ollama/server.(*Scheduler).processPending(0x14000111380, {0x1054cdd40, 0x140006a05a0})
	github.com/ollama/ollama/server/sched.go:229 +0xdd8
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
	github.com/ollama/ollama/server/sched.go:110 +0x28
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
	github.com/ollama/ollama/server/sched.go:109 +0xc0
```

@rick-github commented on GitHub (Sep 30, 2025):

Somebody who wanted to experiment with audio pulled the model from HF or somewhere else, converted it to GGUF, and imported it. It didn't work, but they uploaded it to the user library in ollama because they thought other users might be interested. Since there's nothing on the model page explaining that the model doesn't work, others have downloaded it. Somebody thought it might be a fixable problem, so they filed a ticket (#10071).

There are many models in the user library that don't work. Anything that requires endpoints for other modalities (image generation, speech generation, speech transcription/recognition, video) will not work because ollama does not have the supporting infrastructure for processing those modalities.

```log
time=2025-09-30T17:59:39.730+01:00 level=WARN source=memory.go:129 msg="model missing blk.0 layer size"
```

A certain structure is required in a model in order for the tensors to be recognized. If you look at the tensor list for a working model (e.g., [qwen2.5:0.5b](https://ollama.com/library/qwen2.5:0.5b/blobs/c5396e06af29)) you will see that the tensors follow a pattern: the input layer (`token_embd.weight`), a `blk` tensor for each layer, and the output layer (`output_norm.weight`). There are some small variations, but generally all models used for text-to-text LLM inference follow this pattern. This is how the inference engine determines where to feed tokens in, which layers to pass them through, and where to extract the results from.
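You can check this pattern yourself. A sketch, assuming the `gguf` Python package's `gguf-dump` tool and the default macOS blob store; the digest prefix comes from the pull output above:

```shell
pip install gguf
# Dump the metadata and tensor list of the downloaded blob; in a working
# LLM you would see token_embd.weight, blk.N.*, and output_norm.weight.
gguf-dump ~/.ollama/models/blobs/sha256-d76121b83ea6*
```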

[dimavz/whisper-tiny](https://ollama.com/dimavz/whisper-tiny:latest/blobs/d76121b83ea6) does not follow this pattern, because it is not meant to be treated as an LLM for text-to-text inference. Hence any inference engine (ollama, llama.cpp, LMStudio, vLLM, Mistral.rs, etc.) that tries to load this model for text-to-text will fail. Some of these engines might have support for this model, but it will require flags to tell the engine how to use it. I believe llama.cpp has limited support for audio with the mtmd library. The whisper class of models has its [own bit of software](https://github.com/openai/whisper/tree/main/whisper) for processing the model weights. This is what ollama currently lacks, so ollama currently cannot support this model or audio models in general.
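In the meantime, the reference whisper package linked above runs this family of models directly; a minimal sketch, assuming ffmpeg is on PATH (`audio.mp3` is a placeholder file):

```shell
pip install -U openai-whisper
# Transcribe a local file with the tiny checkpoint; the package downloads
# its own weights and does not reuse the GGUF blob pulled by ollama.
whisper audio.mp3 --model tiny
```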


@ScottA38 commented on GitHub (Oct 4, 2025):

Sorry for the delay in responding, and thank you for your answer. It's really helpful for someone who is just beginning to explore AI.
