[GH-ISSUE #501] large embedded file fails on model create #230

Closed
opened 2026-04-12 09:45:02 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @BruceMacD on GitHub (Sep 9, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/501

Adding a large file to an embedding may cause an unexpected error.

ollama create exampleModel -f Modelfile
...
Error: unexpected end to create model
FROM codellama

SYSTEM """
You are a DND game master that reviews dice rolls and responds with JSON  in the following format: "{\"action\":\"do stuff\"}"
"""

EMBED embeds/*.txt
 2% || (4367/151236, 31 it/s) [4m59s:1h19m37s]creating model system layer

There shouldn’t be a limit. The buffer size may be reaching its capacity.
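The "buffer size" hypothesis is consistent with how Go's standard library behaves: `bufio.Scanner` refuses tokens larger than its default 64 KiB limit. This is a minimal sketch of that failure mode and its fix, not Ollama's actual code; whether the create path uses a scanner this way is an assumption.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

func main() {
	// A single "line" larger than bufio.Scanner's default 64 KiB token limit.
	big := strings.Repeat("x", 100*1024)

	s := bufio.NewScanner(strings.NewReader(big))
	for s.Scan() {
	}
	if errors.Is(s.Err(), bufio.ErrTooLong) {
		fmt.Println("token too long: default 64 KiB buffer exceeded")
	}

	// Fix: enlarge the scanner's buffer before scanning.
	s2 := bufio.NewScanner(strings.NewReader(big))
	s2.Buffer(make([]byte, 0, 1024*1024), 1024*1024)
	count := 0
	for s2.Scan() {
		count++
	}
	fmt.Println("lines read with enlarged buffer:", count)
}
```

If the create path reads embedding input line by line with a default-sized scanner, any sufficiently long line would abort the run, which would look like an arbitrary size limit from the outside.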

GiteaMirror added the bug label 2026-04-12 09:45:02 -05:00

@fmackenzie commented on GitHub (Sep 28, 2023):

I'm also experiencing this. I've tried a number of things, including breaking the text files into smaller overlapping chunks and updating the num_ctx value (to 4096, 8192, and 16384). I've tried this on a VM with 32 GB of RAM and on bare metal with 64 GB of RAM, both on Ubuntu Linux. All have the same outcome, so I wonder whether it is tied to the total amount of content rather than the size of a specific file. The behavior is the same when specifying each file individually instead of the entire folder.

It's interesting. I see that when Ollama is started up, there are 5 handlers for the EmbeddingHandler:

[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)

When it is doing the model creation, I can see that it uses the 5 handlers on a specific port, then as it continues, it switches to a different port (almost as though it isn't closing the port and has to get a new port for the embedding), and eventually it gets to a port where the ollama serve process just crashes (see below):

{"timestamp":1695985330,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985335,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985341,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985348,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985354,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":50080,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985361,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985367,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985374,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985381,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985381,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":35734,"status":200,"method":"POST","path":"/embedding","params":{}}
{"timestamp":1695985388,"level":"INFO","function":"log_server_request","line":1157,"message":"request","remote_addr":"127.0.0.1","remote_port":37356,"status":200,"method":"POST","path":"/embedding","params":{}}
2023/09/29 07:03:13 images.go:662: failed to generate embedding for '/data/git/ollama/data/test1/test_doc_p00004.txt' line 4: POST embedding: Post "http://127.0.0.1:58936/embedding": EOF
2023/09/29 07:03:13 llama.go:320: llama runner exited with error: signal: killed
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xb1f748]

goroutine 53 [running]:
github.com/jmorganca/ollama/server.embeddingLayers({0xc00033c618, 0x14}, {{0xc000028280, 0x6}, 0xc00009f080, {0xc000076e40, 0x1, 0x1}, 0xc000076b70})
	/data/git/ollama/server/images.go:660 +0xfa8
github.com/jmorganca/ollama/server.CreateModel({0x1070850, 0xc0000921e0}, {0xc00033c618, 0x14}, {0xc0000281e0, 0xc}, {0xc00002e200, 0x38}, 0xc000076b70)
	/data/git/ollama/server/images.go:527 +0x2135
github.com/jmorganca/ollama/server.CreateModelHandler.func1()
	/data/git/ollama/server/routes.go:358 +0x151
created by github.com/jmorganca/ollama/server.CreateModelHandler
	/data/git/ollama/server/routes.go:349 +0x23d
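The log pair above ("failed to generate embedding … EOF" immediately followed by a nil-pointer panic in embeddingLayers) matches a common Go bug pattern: the HTTP call to the runner fails after the runner is killed, the error is logged but execution continues, and the nil response is then dereferenced. This is a hedged, minimal repro of that pattern with invented names (`post`, `resp`), not the actual Ollama code:

```go
package main

import "fmt"

type resp struct{ Embedding []float64 }

// post stands in for the HTTP call to the runner's /embedding endpoint.
// When the runner has been killed, the call fails and the response is nil.
func post(ok bool) (*resp, error) {
	if !ok {
		return nil, fmt.Errorf("POST embedding: EOF")
	}
	return &resp{Embedding: []float64{0.1}}, nil
}

func main() {
	r, err := post(false)
	if err != nil {
		fmt.Println("failed to generate embedding:", err)
		// Bug pattern: logging the error but not returning means the
		// next use of r dereferences nil and panics with SIGSEGV.
	}
	if r == nil {
		fmt.Println("guard: skipping nil response instead of panicking")
		return
	}
	fmt.Println(len(r.Embedding))
}
```

If embeddingLayers lacks such a guard after a failed POST, an OOM-killed runner (note the preceding "signal: killed") would produce exactly this crash signature.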


@chronicblondiee commented on GitHub (Oct 8, 2023):

I am also running into the same issue (Ryzen 5900X, 64 GB RAM). I have even tried splitting the large file into several smaller files, but it still seems to fail at some point.

I am also looking to make something DnD related and tried to import the rules as a txt file dataset, but I haven't been able to make it work.

Below are two different runs with two versions of the DnD rule set: one with newlines, the other with all newlines stripped. I thought that would help, but it didn't seem to make a difference.

ollama create dnd-gen -f ./Modelfile
parsing modelfile    
looking for model    
creating model template layer    
creating model system layer    
creating parameter layer    
creating embeddings for file /tmp/llm-model-stuff/mistral/data/DnD_Basic_rules.txt   2% |███                                                                                                                                                                                            | (892/36409, 3 it/s) [5m0s:3h46m51s]creating parameter layer

ollama create dnd-gen -f ./Modelfile
parsing modelfile    
looking for model    
creating model template layer    
creating model system layer    
creating parameter layer    
creating embeddings for file /tmp/llm-model-stuff/mistral/DnD_BasicRules_2018.txt   3% |█████                                                                                                                                                                                           | (591/18119, 2 it/s) [5m0s:2h53m35s]creating parameter layer
Error: unexpected end to create model

@vividfog commented on GitHub (Oct 9, 2023):

+1 same issue and symptoms as above. M1 Max 32 GB. The only workaround I've found is to not do large embedding runs for now.

Edit: An acceptable workaround is to use multiple EMBED lines, so that no individual batch is too large.

Ollama detects when embeddings already exist for an EMBED line. It's then possible to run ollama create my_rag_model -f my_rag_model.Modelfile again and again, each time with one more EMBED line pointing to new content as time goes on. Only the new content is processed; old content is reused.

A real fix would of course be nice! How to troubleshoot further?
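The incremental workaround can be sketched as a Modelfile with one EMBED line per manageable batch; the base model and paths here are illustrative, not from the original report:

```
FROM codellama

# One EMBED line per batch instead of a single large glob.
# On each re-run of `ollama create`, batches that were already
# embedded are reused and only newly added lines are processed.
EMBED embeds/batch1/*.txt
EMBED embeds/batch2/*.txt
EMBED embeds/batch3/*.txt
```

To add content later, append another EMBED line and re-run the same `ollama create` command.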


@chronicblondiee commented on GitHub (Oct 9, 2023):

@vividfog I am trying your method with my data. I used split -l 600 mydata.txt to split by line, and I'm going to run through each of the EMBED layers.

I will note that the maximum number of lines I could input per EMBED file varied in my testing; anything above 800 seemed to be unstable, at least on my system (openSUSE Tumbleweed, 12-core Ryzen 9 5900X, 64 GB RAM).
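The split-then-EMBED approach can be sketched as a small shell script; the file names are stand-ins, and the generated EMBED lines would be pasted into the Modelfile:

```shell
# Hypothetical automation of the split workaround: break the corpus
# into 600-line chunks and emit one EMBED line per chunk.
mkdir -p embeds
seq 1 1800 > mydata.txt                  # stand-in for the real dataset
split -l 600 -d mydata.txt embeds/chunk_ # chunk_00, chunk_01, chunk_02
for f in embeds/chunk_*; do
  printf 'EMBED %s\n' "$f"
done
```

Re-running `ollama create` with the regenerated Modelfile should only embed chunks that are new since the last run.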

Edit: I got it working using @vividfog's method. I am going to try to automate this with some Ansible until there is a proper fix; if I get a good automated solution working, I will post it here!

Edit 2: https://github.com/jmorganca/ollama/tree/main/examples/langchain-document @vividfog @BruceMacD @fmackenzie That example uses LangChain and a vector store to keep all the embeddings locally, which is a much better way to load large datasets. It has been working for my use case!


@BruceMacD commented on GitHub (Oct 27, 2023):

Closing this for now as we removed this feature for the time being.


Reference: github-starred/ollama#230