[GH-ISSUE #1691] loading many models one after another corrupts ollama #26713

Closed
opened 2026-04-22 03:10:16 -05:00 by GiteaMirror · 12 comments
Owner

Originally created by @iplayfast on GitHub (Dec 23, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1691

Originally assigned to: @dhiltgen on GitHub.

I feel this is a major bug, as anyone using Ollama with several models over an extended period will hit the same issue.

I'm using https://github.com/iplayfast/OllamaPlayground/tree/main/createnotes#readme which tests all the models on your system. It initially loads each model and says hello just to test. This is where the problem lies.

ollama serve
Error: listen tcp 127.0.0.1:11434: bind: address already in use
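(The bind error above just means another instance, typically the systemd service, already holds port 11434. A minimal sketch of a port check, assuming the default 127.0.0.1:11434 endpoint:)

```python
import socket

def port_in_use(host: str, port: int) -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on a successful connection, an errno otherwise
        return s.connect_ex((host, port)) == 0

# "address already in use" from `ollama serve` implies this is True:
print(port_in_use("127.0.0.1", 11434))
```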

These are my models:

 ollama list
NAME                           	ID          	SIZE  	MODIFIED     
chris/mr_t:latest              	e792712b8728	3.8 GB	9 hours ago 	
DrunkSally:latest              	7b378c3757fc	3.8 GB	7 days ago  	
Guido:latest                   	158599e734fb	26 GB 	7 days ago  	
Jim:latest                     	2c7476fb37de	3.8 GB	6 weeks ago 	
Mario:latest                   	902e3a8e5ed7	3.8 GB	6 weeks ago 	
MrT:latest                     	e792712b8728	3.8 GB	8 days ago  	
Polly:latest                   	19982222ada1	4.1 GB	5 weeks ago 	
Sally:latest                   	903b51bbe623	3.8 GB	10 days ago 	
Ted:latest                     	fdabf1286f32	4.1 GB	7 days ago  	
alfred:latest                  	e46325710c52	23 GB 	4 weeks ago 	
bakllava:latest                	3dd68bd4447c	4.7 GB	4 days ago  	
codebooga:latest               	05b83c5673dc	19 GB 	5 weeks ago 	
codellama:latest               	8fdf8f752f6e	3.8 GB	3 weeks ago 	
codeup:latest                  	54289661f7a9	7.4 GB	6 weeks ago 	
deepseek-coder:33b             	2941d6ab92f3	18 GB 	4 weeks ago 	
deepseek-coder:latest          	140a485970a6	776 MB	2 weeks ago 	
deepseek-llm:latest            	9aab369a853b	4.0 GB	9 days ago  	
dolphin-mixtral:latest         	4b33b01bf336	26 GB 	8 days ago  	
everythinglm:latest            	bf6610a21b1e	7.4 GB	6 weeks ago 	
falcon:latest                  	4280f7257e73	4.2 GB	11 hours ago	
llama2:latest                  	fe938a131f40	3.8 GB	6 weeks ago 	
llama2-uncensored:latest       	44040b922233	3.8 GB	4 weeks ago 	
llava:latest                   	e4c3eb471fd8	4.5 GB	9 days ago  	
magicoder:latest               	8007de06f5d9	3.8 GB	2 weeks ago 	
medllama2:latest               	a53737ec0c72	3.8 GB	6 weeks ago 	
mistral:7b                     	d364aa8d131e	4.1 GB	6 weeks ago 	
mistral:instruct               	8aa307f73b26	4.1 GB	2 months ago	
mistral:latest                 	8aa307f73b26	4.1 GB	2 months ago	
mistral:text                   	3e3d0b9dcb6a	4.1 GB	6 weeks ago 	
mistrallite:latest             	5393d4f5f262	4.1 GB	6 weeks ago 	
mixtralcpu:latest              	8fca5114ed19	26 GB 	9 hours ago 	
neural-chat:latest             	f4c6a8e532e8	4.1 GB	2 weeks ago 	
nexusraven:latest              	336957c1d527	7.4 GB	6 weeks ago 	
openhermes2.5-mistral:latest   	ca4cd4e8a562	4.1 GB	5 weeks ago 	
orca2:13b                      	a8dcfac3ac32	7.4 GB	4 weeks ago 	
orca2:latest                   	ea98cc422de3	3.8 GB	2 weeks ago 	
phi:latest                     	e22226989b6c	1.6 GB	3 days ago  	
phind-codellama:latest         	64cce35068a2	19 GB 	5 weeks ago 	
samantha-mistral:latest        	f7c8c9be1da0	4.1 GB	6 weeks ago 	
solar:latest                   	059fdabbe6e6	6.1 GB	5 days ago  	
sqlcoder:latest                	77ac14348387	4.1 GB	6 weeks ago 	
stablelm-zephyr:latest         	7c596e78b1fc	1.6 GB	2 weeks ago 	
starling-lm:latest             	0eab7e16513a	4.1 GB	3 weeks ago 	
uncensored:latest              	8fb4f61e2281	8.9 GB	2 days ago  	
wizard-math:latest             	5ab8dc2115d3	4.1 GB	9 hours ago 	
wizard-vicuna-uncensored:7b    	72fc3c2b99dc	3.8 GB	10 days ago 	
wizard-vicuna-uncensored:latest	72fc3c2b99dc	3.8 GB	5 weeks ago 	
wizardlm-uncensored:latest     	886a369d74fc	7.4 GB	13 days ago 	
xwinlm:latest                  	0fa68068d970	3.8 GB	6 weeks ago 	
yarn-mistral:latest            	8e9c368a0ae4	4.1 GB	9 days ago  	
yi:latest                      	59e2d70c6939	3.5 GB	10 days ago 	
zephyr:latest                  	1629f2a8a495	4.1 GB	6 weeks ago 	

This is the output after loading them one after another:

python CreateNotes.py 
Attempting to load each model to see if they can be loaded
   attempting to load model chris/mr_t:latest
         model chris/mr_t:latest loaded in 4.0 seconds
   attempting to load model DrunkSally:latest
         model DrunkSally:latest loaded in 8.7 seconds
   attempting to load model Guido:latest
         model Guido:latest loaded in 33.1 seconds
   attempting to load model Jim:latest
         model Jim:latest loaded in 4.0 seconds
   attempting to load model Mario:latest
         model Mario:latest loaded in 1.2 seconds
   attempting to load model MrT:latest
         model MrT:latest loaded in 0.7 seconds
   attempting to load model Polly:latest
         model Polly:latest loaded in 37.9 seconds
   attempting to load model Sally:latest
         model Sally:latest loaded in 2.1 seconds
   attempting to load model Ted:latest
         model Ted:latest loaded in 1.7 seconds
   attempting to load model alfred:latest
         model alfred:latest loaded in 155.2 seconds
   attempting to load model bakllava:latest
         model bakllava:latest loaded in 32.8 seconds
   attempting to load model codebooga:latest
         model codebooga:latest loaded in 110.1 seconds
   attempting to load model codellama:latest
         model codellama:latest loaded in 24.7 seconds
   attempting to load model codeup:latest
         model codeup:latest loaded in 52.7 seconds
   attempting to load model deepseek-coder:33b
         model deepseek-coder:33b loaded in 107.6 seconds
   attempting to load model deepseek-coder:latest
         model deepseek-coder:latest loaded in 7.0 seconds
   attempting to load model deepseek-llm:latest
         model deepseek-llm:latest loaded in 14.4 seconds
   attempting to load model dolphin-mixtral:latest
         model dolphin-mixtral:latest loaded in 93.7 seconds
   attempting to load model everythinglm:latest
         model everythinglm:latest loaded in 48.2 seconds
   attempting to load model falcon:latest
         model falcon:latest loaded in 34.2 seconds
   attempting to load model llama2:latest
         model llama2:latest loaded in 5.9 seconds
   attempting to load model llama2-uncensored:latest
         model llama2-uncensored:latest loaded in 1.8 seconds
   attempting to load model llava:latest
         model llava:latest loaded in 6.2 seconds
   attempting to load model magicoder:latest
         model magicoder:latest loaded in 3.7 seconds
   attempting to load model medllama2:latest
         model medllama2:latest loaded in 22.5 seconds
   attempting to load model mistral:7b
         model mistral:7b loaded in 26.0 seconds
   attempting to load model mistral:instruct
         model mistral:instruct loaded in 0.1 seconds
   attempting to load model mistral:latest
         model mistral:latest loaded in 0.1 seconds
   attempting to load model mistral:text
         model mistral:text loaded in 36.3 seconds
   attempting to load model mistrallite:latest
         model mistrallite:latest loaded in 38.2 seconds
   attempting to load model mixtralcpu:latest
Error: Ollama call failed with status code 500. Details: timed out waiting for llama runner to start
         model mixtralcpu:latest ------------not loaded------------ in 182.2 seconds
   attempting to load model nexusraven:latest
         model nexusraven:latest loaded in 60.8 seconds
   attempting to load model openhermes2.5-mistral:latest
         model openhermes2.5-mistral:latest loaded in 25.0 seconds
   attempting to load model orca2:13b
         model orca2:13b loaded in 46.4 seconds
   attempting to load model orca2:latest
         model orca2:latest loaded in 25.7 seconds
   attempting to load model phi:latest
         model phi:latest loaded in 17.9 seconds
   attempting to load model phind-codellama:latest
         model phind-codellama:latest loaded in 123.8 seconds
   attempting to load model samantha-mistral:latest
         model samantha-mistral:latest loaded in 33.1 seconds
   attempting to load model solar:latest
         model solar:latest loaded in 42.2 seconds
   attempting to load model sqlcoder:latest
Timed out after 300 seconds for question: are you there

sqlcoder isn't a big model. I had originally thought meditron was the problem, so I removed it, and the failure just moved on to the next model.
mixtralcpu is from https://ollama.ai/chris/mixtralcpu, which loads into system memory instead of the GPU. (It loaded fine from the command line.)
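For reference, the per-model probe in the script above boils down to a single HTTP call against Ollama's `/api/generate` endpoint. This is a hedged sketch, not the actual CreateNotes.py code; the endpoint and payload fields are the standard Ollama REST API, and `try_load` is an illustrative name:

```python
import json
import time
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default Ollama endpoint

def try_load(model: str, timeout: float = 300.0, base_url: str = OLLAMA_URL):
    """Send a one-word prompt to /api/generate and report (ok, seconds).

    A failure here (HTTP 500, timeout, connection refused) corresponds to
    the errors in the log above.
    """
    payload = json.dumps({
        "model": model,
        "prompt": "hello",
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.time()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
        return True, time.time() - start
    except (urllib.error.URLError, TimeoutError, OSError):
        return False, time.time() - start
```

Because each call loads a different model, every iteration forces Ollama to unload the previous runner and start a new one, which is exactly the swap path that eventually wedges.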

GiteaMirror added the bug label 2026-04-22 03:10:16 -05:00

@atorr0 commented on GitHub (Dec 23, 2023):

Hi, have you tried running systemctl restart ollama.service after each attempt?


@iplayfast commented on GitHub (Dec 24, 2023):

Yes, that does clear the problem, but of course by then the program is borked. It isn't a good fix, if that is what you are suggesting, but it does reset Ollama.


@BruceMacD commented on GitHub (Jan 8, 2024):

Thanks for reporting this, @iplayfast. I think this could have been fixed in the most recent release. Please let me know if you're still seeing issues.


@iplayfast commented on GitHub (Jan 9, 2024):

No, it still occurs...
Some thoughts:

  1. If two users are using two models and Ollama is swapping them back and forth as needed:
     • Where are the conversations saved?
     • Is that memory being saved/restored at each swap as well, or is the memory potentially growing and eventually interfering with the swap?
  2. In my latest version of my software I load the largest models first and work my way down to the smallest, asking an assortment of questions and evaluating the answers with the mistral model; in other words, many model swaps. At about the fourth one down it dies. I'll push it so you can test yourself.

@iplayfast commented on GitHub (Jan 12, 2024):

Version 0.1.20 did better, but my torture test still killed it.

python CreateNotes.py 
mixtral:latest
notux:latest
dolphin-mixtral:latest
Guido:latest
alfred:latest
phind-codellama:latest
codebooga:latest
deepseek-coder:33b
nexusraven:latest
everythinglm:latest
orca2:13b
codeup:latest
wizardlm-uncensored:latest
eas/nous-hermes-2-solar-10.7b:latest
solar:latest
llama-pro:latest
bakllava:latest
llava:latest
falcon:latest
Error: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cdba10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cf6f90>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cd8b50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cf7110>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cfe750>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cfe5d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cf74d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9d01f10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cd8a10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cf6b90>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9d0d650>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f80f9cf5990>: Failed to establish a new connection: [Errno 111] Connection refused'))

@iplayfast commented on GitHub (Jan 16, 2024):

I became suspicious when, after testing again, it died on falcon again.
So I tried falcon on its own. It died. I tried removing falcon and reinstalling it. Still died.
The problem might be with falcon.


@dhiltgen commented on GitHub (Jan 27, 2024):

Could you capture server logs from the time around the crash?


@iplayfast commented on GitHub (Jan 30, 2024):

I just finished running it with version 0.1.22 and it made it much farther in the test. It now doesn't crash but seems to be stuck in some infinite loop. While the test was running I did a systemctl restart ollama and it carried on after missing a few questions. I've updated my stress test so that questions are asked first and then evaluated afterwards, so there is less swapping of LLMs. The github repo (see above) has been updated with CreateNotes, ViewResutls, and the results.json. The questions are asked from the largest model to the smallest.

As for the server logs, where would they be located? I can't find them.

My current models are:

ollama list
NAME                                	ID          	SIZE  	MODIFIED     
chris/openhermes-agent:latest       	c674d4614455	5.1 GB	10 days ago 	
eas/nous-hermes-2-solar-10.7b:latest	5986dba75154	6.5 GB	3 weeks ago 	
DrunkSally:latest                   	7b378c3757fc	3.8 GB	6 weeks ago 	
Guido:latest                        	158599e734fb	26 GB 	6 weeks ago 	
Jim:latest                          	2c7476fb37de	3.8 GB	2 months ago	
Mario:latest                        	902e3a8e5ed7	3.8 GB	2 months ago	
MrT:latest                          	e792712b8728	3.8 GB	6 weeks ago 	
Polly:latest                        	19982222ada1	4.1 GB	2 months ago	
Sally:latest                        	903b51bbe623	3.8 GB	6 weeks ago 	
Ted:latest                          	fdabf1286f32	4.1 GB	6 weeks ago 	
alfred:latest                       	e46325710c52	23 GB 	2 months ago	
codebooga:latest                    	05b83c5673dc	19 GB 	2 months ago	
codellama:latest                    	8fdf8f752f6e	3.8 GB	2 months ago	
codeup:latest                       	54289661f7a9	7.4 GB	2 months ago	
deepseek-coder:33b                  	acec7c0b0fd9	18 GB 	3 weeks ago 	
deepseek-coder:latest               	3ddd2d3fc8d2	776 MB	3 weeks ago 	
deepseek-llm:latest                 	9aab369a853b	4.0 GB	6 weeks ago 	
dolphin-mistral:latest              	ecbf896611f5	4.1 GB	2 weeks ago 	
dolphin-mixtral:latest              	cfada4ba31c7	26 GB 	3 weeks ago 	
dolphin-phi:latest                  	c5761fc77240	1.6 GB	5 weeks ago 	
duckdb-nsql:latest                  	7a42116386ac	3.8 GB	3 days ago  	
everythinglm:latest                 	b005372bc34b	7.4 GB	3 weeks ago 	
llama-pro:latest                    	fc5c0d744444	4.7 GB	2 weeks ago 	
llama2:13b                          	d475bf4c50bc	7.4 GB	6 days ago  	
llama2:70b                          	e7f6c06ffef4	38 GB 	6 days ago  	
llama2:7b                           	78e26419b446	3.8 GB	6 days ago  	
llama2:latest                       	78e26419b446	3.8 GB	3 weeks ago 	
llama2-uncensored:latest            	44040b922233	3.8 GB	2 months ago	
llava:latest                        	cd3274b81a85	4.5 GB	3 weeks ago 	
magicoder:latest                    	8007de06f5d9	3.8 GB	7 weeks ago 	
medllama2:latest                    	a53737ec0c72	3.8 GB	2 months ago	
mistral:7b                          	61e88e884507	4.1 GB	3 weeks ago 	
mistral:instruct                    	61e88e884507	4.1 GB	3 weeks ago 	
mistral:latest                      	61e88e884507	4.1 GB	3 weeks ago 	
mistral:text                        	d19e34de4cb6	4.1 GB	3 weeks ago 	
mistrallite:latest                  	5393d4f5f262	4.1 GB	2 months ago	
mixtral:latest                      	7708c059a8bb	26 GB 	3 weeks ago 	
neural-chat:latest                  	89fa737d3b85	4.1 GB	3 weeks ago 	
nexusraven:latest                   	483a8282af74	7.4 GB	11 days ago 	
notus:latest                        	43c512e16786	4.1 GB	4 weeks ago 	
notux:latest                        	fe14e7d66184	26 GB 	4 weeks ago 	
nous-hermes2-mixtral:latest         	599da8dce2c1	26 GB 	13 days ago 	
nsfw:latest                         	328546e02f6f	13 GB 	3 days ago  	
nsfwstoryteller:latest              	328546e02f6f	13 GB 	3 days ago  	
openhermes:latest                   	95477a2659b7	4.1 GB	4 weeks ago 	
openhermes-agent:latest             	4d82cc75e3aa	5.1 GB	11 days ago 	
openhermes2.5-mistral:latest        	ca4cd4e8a562	4.1 GB	2 months ago	
orca-mini:latest                    	2dbd9f439647	2.0 GB	6 days ago  	
orca2:13b                           	a8dcfac3ac32	7.4 GB	2 months ago	
orca2:latest                        	ea98cc422de3	3.8 GB	7 weeks ago 	
phi:latest                          	e2fd6321a5fe	1.6 GB	3 weeks ago 	
phind-codellama:latest              	566e1b629c44	19 GB 	3 weeks ago 	
qwen:latest                         	0fddaff90ef5	4.5 GB	6 days ago  	
samantha-mistral:latest             	f7c8c9be1da0	4.1 GB	2 months ago	
solar:latest                        	059fdabbe6e6	6.1 GB	6 weeks ago 	
sqlcoder:latest                     	77ac14348387	4.1 GB	2 months ago	
stable-code:latest                  	aa5ab8afb862	1.6 GB	11 days ago 	
stablelm-zephyr:latest              	0a108dbd846e	1.6 GB	3 weeks ago 	
stablelm2:latest                    	ea04e74d6b59	982 MB	3 days ago  	
starling-lm:latest                  	ff4752739ae4	4.1 GB	3 weeks ago 	
tinydolphin:latest                  	97c9685cc5db	636 MB	3 days ago  	
tinyllama:latest                    	2644915ede35	637 MB	3 weeks ago 	
wizard-math:latest                  	5ab8dc2115d3	4.1 GB	5 weeks ago 	
wizard-vicuna-uncensored:7b         	72fc3c2b99dc	3.8 GB	6 weeks ago 	
wizard-vicuna-uncensored:latest     	72fc3c2b99dc	3.8 GB	2 months ago	
wizardcoder:latest                  	de9d848c1323	3.8 GB	4 weeks ago 	
wizardlm-uncensored:latest          	886a369d74fc	7.4 GB	7 weeks ago 	
xwinlm:latest                       	0fa68068d970	3.8 GB	2 months ago	
yarn-mistral:latest                 	8e9c368a0ae4	4.1 GB	6 weeks ago 	
yi:latest                           	a86526842143	3.5 GB	3 weeks ago 	
zephyr:latest                       	bbe38b81adec	4.1 GB	3 weeks ago 	

It seemed that sqlcoder started having problems, answering questions in strange ways. The results.json file can be searched for ": No Answer due to error".

The question "what fills you with joy", run from the command line, seemed to give a very long answer, and my software failed here; I restarted the server after several hours. Perhaps that's because it's a code completion model.

Given that code completion models are so different from chat models, there should be a way that:

  1. they can be recognised;
  2. a query can have a maximum response time;
  3. a query can have a maximum response length.
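Until such flags exist server-side, items 2 and 3 can be approximated from the client. This is a hedged sketch: `build_generate_request` is a hypothetical helper, but `num_predict` is a real Ollama generation option that caps the number of tokens generated, and a client-side HTTP read timeout bounds the response time.

```python
import json

def build_generate_request(model, prompt, max_tokens=256):
    """Build an Ollama /api/generate payload with a capped response length.

    The "num_predict" option limits how many tokens the model may generate;
    combined with a client-side HTTP timeout this approximates a maximum
    response length and a maximum response time.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": max_tokens},
    }

payload = build_generate_request("sqlcoder:latest", "what fills you with joy", 128)
print(json.dumps(payload, indent=2))
```

The payload would then be sent with something like `requests.post("http://127.0.0.1:11434/api/generate", json=payload, timeout=120)`, so a runaway generation raises a timeout in the client instead of hanging it.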

@dhiltgen commented on GitHub (Jan 30, 2024):

As for server logs, where would they be located, as I can't find them?

Depends on your platform. Check out https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md


@iplayfast commented on GitHub (Jan 30, 2024):

yikes that's a lot of data, are you looking for anything in particular? I've included a small sample from around that time.
(note to self: journalctl -u ollama -S "2024-01-30 17:01:45")

Jan 29 03:14:33 FORGE ollama[2004316]: [GIN] 2024/01/29 - 03:14:33 | 200 |     208.013µs |       127.0.0.1 | POST     "/api/show"
Jan 29 03:14:33 FORGE ollama[2004316]: 2024/01/29 03:14:33 gpu.go:140: INFO CUDA Compute Capability detected: 8.9
Jan 29 03:14:33 FORGE ollama[2004316]: 2024/01/29 03:14:33 gpu.go:140: INFO CUDA Compute Capability detected: 8.9
Jan 29 03:14:33 FORGE ollama[2004316]: 2024/01/29 03:14:33 cpu_common.go:11: INFO CPU has AVX2
Jan 29 03:14:33 FORGE ollama[2004316]: 2024/01/29 03:14:33 dyn_ext_server.go:90: INFO Loading Dynamic llm server: /tmp/ollama4251586406/cuda_v11/libext_server.>
Jan 29 03:14:33 FORGE ollama[2004316]: 2024/01/29 03:14:33 dyn_ext_server.go:145: INFO Initializing llama server
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /usr/share/ollama/.ollama/models/blobs>
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   1:                               general.name str              = teknium
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32002]   = ["<unk>", "<s>", "</s>", "<0>
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32002]   = [0.000000, 0.000000, 0.00000>
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32002]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, >
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %>
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - kv  21:               general.quantization_version u32              = 2
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - type  f32:   65 tensors
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - type q4_0:  225 tensors
Jan 29 03:14:33 FORGE ollama[2004316]: llama_model_loader: - type q6_K:    1 tensors
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_vocab: special tokens definition check successful ( 261/32002 ).
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: format           = GGUF V3 (latest)
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: arch             = llama
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: vocab type       = SPM
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_vocab          = 32002
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_merges         = 0
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_ctx_train      = 32768
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_embd           = 4096
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_head           = 32
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_head_kv        = 8
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_layer          = 32
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_rot            = 128
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_embd_head_k    = 128
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_embd_head_v    = 128
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_gqa            = 4
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_embd_k_gqa     = 1024
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_embd_v_gqa     = 1024
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_ff             = 14336
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_expert         = 0
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_expert_used    = 0
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: rope scaling     = linear
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: freq_base_train  = 10000.0
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: freq_scale_train = 1
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: n_yarn_orig_ctx  = 32768
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: rope_finetuned   = unknown
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: model type       = 7B
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: model ftype      = Q4_0
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: model params     = 7.24 B
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW)
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: general.name     = teknium
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: BOS token        = 1 '<s>'
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: EOS token        = 32000 '<|im_end|>'
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: UNK token        = 0 '<unk>'
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Jan 29 03:14:33 FORGE ollama[2004316]: llm_load_tensors: ggml ctx size =    0.22 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llm_load_tensors: offloading 32 repeating layers to GPU
Jan 29 03:14:35 FORGE ollama[2004316]: llm_load_tensors: offloading non-repeating layers to GPU
Jan 29 03:14:35 FORGE ollama[2004316]: llm_load_tensors: offloaded 33/33 layers to GPU
Jan 29 03:14:35 FORGE ollama[2004316]: llm_load_tensors:        CPU buffer size =    70.32 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llm_load_tensors:      CUDA0 buffer size =  3847.56 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: ...................................................................................................
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model: n_ctx      = 2048
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model: freq_base  = 10000.0
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model: freq_scale = 1
Jan 29 03:14:35 FORGE ollama[2004316]: llama_kv_cache_init:      CUDA0 KV buffer size =   256.00 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model:  CUDA_Host input buffer size   =    12.01 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model:      CUDA0 compute buffer size =   156.00 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model:  CUDA_Host compute buffer size =     8.00 MiB
Jan 29 03:14:35 FORGE ollama[2004316]: llama_new_context_with_model: graph splits (measure): 3
Jan 29 03:14:35 FORGE ollama[2004316]: 2024/01/29 03:14:35 dyn_ext_server.go:156: INFO Starting llama main loop
Jan 29 03:14:35 FORGE ollama[2004316]: [GIN] 2024/01/29 - 03:14:35 | 200 |  2.247827969s |       127.0.0.1 | POST     "/api/chat"
Jan 29 03:14:56 FORGE ollama[2004316]: 2024/01/29 03:14:56 dyn_ext_server.go:170: INFO loaded 0 images
Jan 29 03:14:57 FORGE ollama[2004316]: [GIN] 2024/01/29 - 03:14:57 | 200 |  358.002761ms |       127.0.0.1 | POST     "/api/chat"
Jan 29 03:15:38 FORGE ollama[2004316]: 2024/01/29 03:15:38 dyn_ext_server.go:170: INFO loaded 0 images

@iplayfast commented on GitHub (Jan 31, 2024):

Here is the function that eventually fails:


import concurrent.futures
import time

def get_answer(ollama, question, timeout=1000):
    """Get an answer from the Ollama model with a timeout."""
    start_time = time.time()
    result = ''
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(ollama, question)
        try:
            result = future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # Note: the worker thread keeps running after the timeout; the
            # `with` block still waits for it to finish on exit.
            print(f"Timed out after {timeout} seconds for question: {question}")
            result = 'No Answer due to timeout'
        except Exception as e:
            print(f"Error: {e}")
            result = 'No Answer due to error'
    elapsed_time = time.time() - start_time
    return result.strip(), elapsed_time
# Usage in your loop remains the same
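A note on this timeout pattern (a minimal sketch, independent of Ollama): `future.result(timeout=...)` only stops *waiting* for the result. The worker thread keeps running, and leaving the `with ThreadPoolExecutor()` block calls `shutdown(wait=True)`, which blocks until the submitted call actually returns. So if the model keeps generating, the caller still appears hung for as long as the generation lasts, which would match a restart being needed "after several hours".

```python
import concurrent.futures
import time

def slow_call():
    # Stands in for a long-running model response.
    time.sleep(1.0)
    return "done"

start = time.time()
with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(slow_call)
    try:
        future.result(timeout=0.1)  # gives up waiting after 0.1 s...
    except concurrent.futures.TimeoutError:
        pass
# ...but exiting the `with` block still waits for slow_call to finish,
# so the total elapsed time is ~1 s, not ~0.1 s.
elapsed = time.time() - start
print(f"elapsed: {elapsed:.2f}s")
```

The only robust client-side fix is to time out the underlying request itself (e.g. an HTTP timeout on the call to the server) rather than wrapping a blocking call in a thread.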

Here is the log at the time of the timeout (after 1500 seconds):

Jan 30 20:46:10 FORGE ollama[3131650]: 2024/01/30 20:46:10 dyn_ext_server.go:145: INFO Initializing llama server
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /usr/share/ollama/.ollama/models/blobs/sha256:4a3019290402c9eadf89a3bf793102a52a2a44dd76ea7b07fca53f9cbb789a63 (version GGUF V2)
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   1:                               general.name str              = ehartford
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32002]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32002]   = [0.000000, 0.000000, 0.000000, 0.0000...
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32002]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  18:               general.quantization_version u32              = 2
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - type  f32:   65 tensors
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - type q4_0:  225 tensors
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - type q6_K:    1 tensors
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_vocab: special tokens definition check successful ( 261/32002 ).
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: format           = GGUF V2
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: arch             = llama
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: vocab type       = SPM
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_vocab          = 32002
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_merges         = 0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_ctx_train      = 32768
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd           = 4096
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_head           = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_head_kv        = 8
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_layer          = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_rot            = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_head_k    = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_head_v    = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_gqa            = 4
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_k_gqa     = 1024
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_v_gqa     = 1024
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_ff             = 14336
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_expert         = 0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_expert_used    = 0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: rope scaling     = linear
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: freq_base_train  = 10000.0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: freq_scale_train = 1
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_yarn_orig_ctx  = 32768
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: rope_finetuned   = unknown
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model type       = 7B
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model ftype      = Q4_0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model params     = 7.24 B
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW)
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: general.name     = ehartford
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: BOS token        = 1 '<s>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: EOS token        = 32000 '<|im_end|>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: UNK token        = 0 '<unk>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: ggml ctx size =    0.22 MiB
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: offloading 32 repeating layers to GPU
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: offloading non-repeating layers to GPU
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: offloaded 33/33 layers to GPU
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors:        CPU buffer size =    70.32 MiB
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors:      CUDA0 buffer size =  3847.56 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: ..................................................................................................
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: n_ctx      = 2048
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: freq_base  = 10000.0
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: freq_scale = 1
Jan 30 20:46:11 FORGE ollama[3131650]: llama_kv_cache_init:      CUDA0 KV buffer size =   256.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model:  CUDA_Host input buffer size   =    12.01 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model:      CUDA0 compute buffer size =   156.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model:  CUDA_Host compute buffer size =     8.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: graph splits (measure): 3
Jan 30 20:46:11 FORGE ollama[3131650]: 2024/01/30 20:46:11 dyn_ext_server.go:156: INFO Starting llama main loop
Jan 30 20:46:11 FORGE ollama[3131650]: 2024/01/30 20:46:11 dyn_ext_server.go:170: INFO loaded 0 images
<!-- gh-comment-id:1918319664 --> @iplayfast commented on GitHub (Jan 31, 2024):

Here is the function that eventually fails:

```python
import concurrent.futures
import time

def get_answer(ollama, question, timeout=1000):
    """Get an answer from the Ollama model with a timeout."""
    start_time = time.time()
    result = ''
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(ollama, question)
        try:
            result = future.result(timeout=timeout).strip()
        except concurrent.futures.TimeoutError:
            print(f"Timed out after {timeout} seconds for question: {question}")
            result = 'No Answer due to timeout'
        except Exception as e:
            print(f"Error: {e}")
            result = 'No Answer due to error'
    end_time = time.time()
    elapsed_time = end_time - start_time
    return result.strip(), elapsed_time

# Usage in your loop remains the same
```

Here is the log at the time of the timeout (after 1500 seconds):

```
Jan 30 20:46:10 FORGE ollama[3131650]: 2024/01/30 20:46:10 dyn_ext_server.go:145: INFO Initializing llama server
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /usr/share/ollama/.ollama/models/blobs/sha256:4a3019290402c9eadf89a3bf793102a52a2a44dd76ea7b07fca53f9cbb789a63 (version GGUF V2)
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   1:                               general.name str              = ehartford
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32002]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32002]   = [0.000000, 0.000000, 0.000000, 0.0000...
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32002]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - kv  18:               general.quantization_version u32              = 2
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - type  f32:   65 tensors
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - type q4_0:  225 tensors
Jan 30 20:46:10 FORGE ollama[3131650]: llama_model_loader: - type q6_K:    1 tensors
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_vocab: special tokens definition check successful ( 261/32002 ).
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: format           = GGUF V2
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: arch             = llama
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: vocab type       = SPM
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_vocab          = 32002
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_merges         = 0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_ctx_train      = 32768
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd           = 4096
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_head           = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_head_kv        = 8
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_layer          = 32
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_rot            = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_head_k    = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_head_v    = 128
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_gqa            = 4
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_k_gqa     = 1024
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_embd_v_gqa     = 1024
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_ff             = 14336
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_expert         = 0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_expert_used    = 0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: rope scaling     = linear
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: freq_base_train  = 10000.0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: freq_scale_train = 1
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: n_yarn_orig_ctx  = 32768
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: rope_finetuned   = unknown
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model type       = 7B
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model ftype      = Q4_0
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model params     = 7.24 B
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW)
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: general.name     = ehartford
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: BOS token        = 1 '<s>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: EOS token        = 32000 '<|im_end|>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: UNK token        = 0 '<unk>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: ggml ctx size =    0.22 MiB
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: offloading 32 repeating layers to GPU
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: offloading non-repeating layers to GPU
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors: offloaded 33/33 layers to GPU
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors:        CPU buffer size =    70.32 MiB
Jan 30 20:46:10 FORGE ollama[3131650]: llm_load_tensors:      CUDA0 buffer size =  3847.56 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: ..................................................................................................
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: n_ctx      = 2048
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: freq_base  = 10000.0
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: freq_scale = 1
Jan 30 20:46:11 FORGE ollama[3131650]: llama_kv_cache_init:      CUDA0 KV buffer size =   256.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model:  CUDA_Host input buffer size   =    12.01 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model:      CUDA0 compute buffer size =   156.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model:  CUDA_Host compute buffer size =     8.00 MiB
Jan 30 20:46:11 FORGE ollama[3131650]: llama_new_context_with_model: graph splits (measure): 3
Jan 30 20:46:11 FORGE ollama[3131650]: 2024/01/30 20:46:11 dyn_ext_server.go:156: INFO Starting llama main loop
Jan 30 20:46:11 FORGE ollama[3131650]: 2024/01/30 20:46:11 dyn_ext_server.go:170: INFO loaded 0 images
```
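One general Python caveat about the `ThreadPoolExecutor` timeout pattern used above (this is a standard-library behavior note, not something established in this thread): `future.result(timeout=...)` only stops *waiting* for the call; it cannot cancel a thread that is already running. The `with` block then blocks in the executor's shutdown until the hung request actually returns. A minimal sketch, where `slow_call` is a hypothetical stand-in for a request that hangs past the timeout:

```python
import concurrent.futures
import time

def slow_call(question):
    # Stand-in for a request that hangs longer than our timeout.
    time.sleep(2)
    return f"answer to {question}"

start = time.time()
with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(slow_call, "hello")
    try:
        # We stop waiting after 0.5 s...
        future.result(timeout=0.5)
    except concurrent.futures.TimeoutError:
        print("timed out")
# ...but the executor's __exit__ still joins the worker thread,
# so roughly the full 2 s elapse before we get here.
elapsed = time.time() - start
print(f"elapsed: {elapsed:.1f}s")
```

So a run that prints "Timed out after ... seconds" has not necessarily freed the underlying request: `get_answer` itself will not return until the hung call completes, and the server-side load it triggered stays in flight.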
<!-- gh-comment-id:2009965438 --> @dhiltgen commented on GitHub (Mar 20, 2024):

This should be resolved by #3218
Reference: github-starred/ollama#26713