[GH-ISSUE #7640] Error: POST predict: Post "http://127.0.0.1:42623/completion": EOF #51386

Closed
opened 2026-04-28 19:46:39 -05:00 by GiteaMirror · 37 comments

Originally created by @phalexo on GitHub (Nov 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7640

What is the issue?

(CodeLlama) developer@ai:~/PROJECTS/OllamaModelFiles$ ~/ollama/ollama run gemma-2-27b-it-Q8_0:latest

>>> Hello.
Error: POST predict: Post "http://127.0.0.1:42623/completion": EOF

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

Latest

GiteaMirror added the bug label 2026-04-28 19:46:39 -05:00

@agreppin commented on GitHub (Nov 13, 2024):

had the same issue and posted a log for that problem in #7638 (but AMD ROCm, not Nvidia)

edit: my CPU does not support AVX2. I just did `go build .`; now retrying with `go build avx,rocm .`; will confirm later ...

<!-- gh-comment-id:2472554811 -->
<!-- gh-comment-id:2472704597 -->

@rick-github commented on GitHub (Nov 13, 2024):

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues

@phalexo commented on GitHub (Nov 13, 2024):

Did you resolve this issue? I noticed that the port number changes from one attempt to another, as if it is being randomly overwritten.

<!-- gh-comment-id:2473958298 -->

@rick-github commented on GitHub (Nov 13, 2024):

The port number changes because it's randomly selected when a new server starts.

<!-- gh-comment-id:2473980662 -->
Author
Owner

@phalexo commented on GitHub (Nov 13, 2024):

Re: "The port number changes because it's randomly selected when a new server starts."

If you mean that when I run `ollama serve` the port is selected at random, then that is really NOT what I am saying.

It is when I run `ollama run "model_name"` and prompt it with "Hello." that it fails and returns that error message, and every time it reports a different port.

<!-- gh-comment-id:2474158599 -->

@rick-github commented on GitHub (Nov 13, 2024):

I should have been clearer about which server I was talking about. When you run `ollama run model_name`, the ollama server creates a new process, `ollama_llama_server`, otherwise known as the runner. The ollama server communicates with the runner over TCP, using a port that is randomly selected when the runner is launched. When the runner crashes, the ollama server returns an error indicating that the attempt to communicate with the runner on the allocated port has failed: `Error: POST predict: Post "http://127.0.0.1:42623/completion": EOF`. The ollama server will then start a new runner, which gets a different port number.

<!-- gh-comment-id:2474305965 -->
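The "randomly selected" port rick-github describes is the standard OS-assigned ephemeral port idiom: bind to port 0 and read back whichever free port the kernel chose. A minimal sketch of the idea (illustrative only, not Ollama's actual code):

```python
import socket

# Binding to port 0 lets the kernel pick any free ephemeral port, which is
# why the runner's completion endpoint shows a different port on every launch.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]  # the port the kernel actually assigned

print(f"runner would listen on http://127.0.0.1:{port}/completion")
s.close()
```

If the process holding that socket dies mid-request, the client side sees the connection close with no response body, which HTTP clients typically surface as an EOF error like the one in this issue.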

@fragelius1 commented on GitHub (Nov 13, 2024):

My guess is that the old server is still running, and when you try to run a new one it crashes, because if I go to http://localhost:11434/ it says that Ollama is running.

Is there a way to attach to an already-running process in Linux?

<!-- gh-comment-id:2474401747 -->

@rick-github commented on GitHub (Nov 13, 2024):

The `EOF` error is from the runner crashing. The ollama server is not crashing. In Linux you can attach to a process with `strace` to see what it is doing.

<!-- gh-comment-id:2474407480 -->

@fragelius1 commented on GitHub (Nov 13, 2024):

Indeed.

<!-- gh-comment-id:2474418173 -->

@phalexo commented on GitHub (Nov 13, 2024):

It may have something to do with either model size or quantization. I am trying to run Qwen2.5-Coder-Instruct.Q8_0, which causes this issue.

I have 4 GPUs with 12.2 GiB each, so the 8-bit model should fit without issues.

That said, I tried instead pulling the standard 4-bit model from the Ollama repo, and it does not seem to have the same problem.

<!-- gh-comment-id:2474525747 -->
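The "should fit" claim can be sanity-checked with back-of-envelope arithmetic. The figures below are assumptions (a 32B parameter count and roughly 8.5 bits per weight for Q8_0, counting per-block scales), and this deliberately ignores the KV cache and per-GPU CUDA context overhead, which are often exactly what breaks a tight fit:

```python
# Rough VRAM budget check for a Q8_0 32B model on 4 x 12.2 GiB GPUs.
params = 32e9                 # assumed parameter count for a 32B model
q8_bytes_per_param = 8.5 / 8  # Q8_0 ~= 8-bit weights plus block-scale overhead

model_gib = params * q8_bytes_per_param / 2**30
vram_gib = 4 * 12.2           # four GPUs, 12.2 GiB each, as reported above

print(f"weights ~{model_gib:.1f} GiB vs {vram_gib:.1f} GiB total VRAM")
```

The weights alone leave a margin of well over 10 GiB here, but that margin is shared with the KV cache (which grows with context length) and a CUDA context on each of the four GPUs, so out-of-memory crashes can still occur.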

@phalexo commented on GitHub (Nov 13, 2024):

Forcing it to run with the "cpu_avx" library gets rid of the error, but is extremely slow. So it appears to be GPU/CUDA related.

<!-- gh-comment-id:2474729624 -->
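Forcing a specific runner library on Ollama builds of that era was done with the `OLLAMA_LLM_LIBRARY` override described in the project's troubleshooting docs; the exact accepted values vary by build, so treat this invocation as an assumption to verify against your version:

```shell
# Bypass library autodetection and use the AVX CPU runner instead of CUDA
# (env var honored by pre-0.4-era builds; check troubleshooting.md for yours)
OLLAMA_LLM_LIBRARY=cpu_avx ollama serve
```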

@phalexo commented on GitHub (Nov 13, 2024):

> Forcing it to run with the "cpu_avx" library gets rid of the error, but is extremely slow. So it appears to be GPU/CUDA related.

Dropping the version to v0.3.11 fixes this problem. The build process looked quite different, just by eyeballing it.

<!-- gh-comment-id:2474922343 -->

@fragelius1 commented on GitHub (Nov 14, 2024):

Hmm, seems to be a llama3.2-only problem? I pulled codegemma and it works okay... maybe because it's bigger than 5 GB and doesn't go into GPU memory it works ok... I have an old 4 GB card... err, nope, it's loaded on the GPU at 3.5 GB.

Update 2: okay, I ran some more tests. If you don't use the model for some time it unloads itself from the GPU, and when you then query it again it loaded only 1.8 GB and crashed with this "Error: POST predict: ..." even on codegemma.

Update 3: even if it can load itself back to 3.6 GB on the card, it still might crash.

<!-- gh-comment-id:2475918365 -->

@phalexo commented on GitHub (Nov 14, 2024):

I dropped the version to v0.3.11 and now it works with qwen2.5-coder-instruct-32b.Q8_0, which is about 34 GB.

<!-- gh-comment-id:2476527391 -->

@jessegross commented on GitHub (Nov 14, 2024):

If someone who is running into the issue can post [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues), that would be the next step in debugging.

<!-- gh-comment-id:2477420342 -->

@konrad0101 commented on GitHub (Nov 15, 2024):

@jessegross I'm getting the same error on ollama 0.4.1 running llama 3.1 70B Q4_K_M on Linux with Nvidia (2x3090). Full logs are below.

Application layer:

File "/opt/venv/lib/python3.10/site-packages/ollama/_client.py", line 654, in chat
    return await self._request_stream(
  File "/opt/venv/lib/python3.10/site-packages/ollama/_client.py", line 518, in _request_stream
    response = await self._request(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/ollama/_client.py", line 488, in _request
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: POST predict: Post "http://127.0.0.1:40081/completion": EOF

And from ollama logs:

Nov 14 22:42:39 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:39 | 200 |    7.405984ms |       127.0.0.1 | POST     "/api/embeddings"
Nov 14 22:42:41 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:41 | 200 |  1.366778533s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:42:41 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:41 | 200 |   10.587194ms |       127.0.0.1 | POST     "/api/embeddings"
Nov 14 22:42:44 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:44 | 200 |  1.223217799s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:42:44 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:44 | 200 |   11.002472ms |       127.0.0.1 | POST     "/api/embeddings"
Nov 14 22:42:46 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:46 | 200 |  1.032608795s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:42:59 ubuntu ollama[2477]: time=2024-11-14T22:42:59.301-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8101 numKeep=5
Nov 14 22:42:59 ubuntu ollama[2477]: time=2024-11-14T22:42:59.330-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8110 numKeep=5
Nov 14 22:43:44 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:43:44 | 200 | 45.650632769s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:43:44 ubuntu ollama[2477]: time=2024-11-14T22:43:44.970-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8137 numKeep=5
Nov 14 22:44:04 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:04 | 200 |          1m4s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:44:04 ubuntu ollama[2477]: time=2024-11-14T22:44:04.275-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8305 numKeep=5
Nov 14 22:44:08 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:08 | 200 |          1m9s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.367-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.400-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.432-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.465-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.497-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.530-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.562-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
Nov 14 22:44:27 ubuntu ollama[2477]: llama_get_logits_ith: invalid logits id 826, reason: batch.logits[826] != true
Nov 14 22:44:27 ubuntu ollama[2477]: SIGSEGV: segmentation violation
Nov 14 22:44:27 ubuntu ollama[2477]: PC=0x5fefed64dc68 m=0 sigcode=1 addr=0x0
Nov 14 22:44:27 ubuntu ollama[2477]: signal arrived during cgo execution
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 7 gp=0xc0000f4000 m=0 mp=0x5fefed9cde80 [syscall]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.cgocall(0x5fefed4c5cc0, 0xc0001b1c78)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/cgocall.go:157 +0x4b fp=0xc0001b1c50 sp=0xc0001b1c18 pc=0x5fefed2483cb
Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama._Cfunc_gpt_sampler_csample(0x7042e83321a0, 0x7044f0006b10, 0x33a)
Nov 14 22:44:27 ubuntu ollama[2477]:         _cgo_gotypes.go:463 +0x4f fp=0xc0001b1c78 sp=0xc0001b1c50 pc=0x5fefed3454af
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).processBatch.(*SamplingContext).Sample.func2(0xc0003ec200?, 0x4?, 0x33a)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/llama.go:663 +0x86 fp=0xc0001b1cc8 sp=0xc0001b1c78 pc=0x5fefed4c1ca6
Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama.(*SamplingContext).Sample(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/llama.go:663
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).processBatch(0xc0000c0120, 0xc000134000, 0xc0001b1f10)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:458 +0x4ea fp=0xc0001b1ed0 sp=0xc0001b1cc8 pc=0x5fefed4c0fca
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).run(0xc0000c0120, {0x5fefed7ffa40, 0xc000098050})
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:338 +0x1a5 fp=0xc0001b1fb8 sp=0xc0001b1ed0 pc=0x5fefed4c0765
Nov 14 22:44:27 ubuntu ollama[2477]: main.main.gowrap2()
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:901 +0x28 fp=0xc0001b1fe0 sp=0xc0001b1fb8 pc=0x5fefed4c4ec8
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0001b1fe8 sp=0xc0001b1fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by main.main in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:901 +0xc2b
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 1 gp=0xc0000061c0 m=nil [IO wait, 1 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc000032008?, 0xc0000e5908?, 0xf4?, 0xed?, 0xc0000e58e8?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0000e5888 sp=0xc0000e5868 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x10?, 0xed247b26?, 0xef?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:573 +0xf7 fp=0xc0000e58c0 sp=0xc0000e5888 pc=0x5fefed277257
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656fe0, 0x72)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:345 +0x85 fp=0xc0000e58e0 sp=0xc0000e58c0 pc=0x5fefed2abaa5
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0x3?, 0x70456c6113a8?, 0x0)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000e5908 sp=0xc0000e58e0 pc=0x5fefed2fb9c7
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:89
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Accept(0xc000030100)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_unix.go:611 +0x2ac fp=0xc0000e59b0 sp=0xc0000e5908 pc=0x5fefed2fce8c
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).accept(0xc000030100)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/fd_unix.go:172 +0x29 fp=0xc0000e5a68 sp=0xc0000e59b0 pc=0x5fefed36b8a9
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPListener).accept(0xc0000801e0)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/tcpsock_posix.go:159 +0x1e fp=0xc0000e5a90 sp=0xc0000e5a68 pc=0x5fefed37c5de
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPListener).Accept(0xc0000801e0)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/tcpsock.go:327 +0x30 fp=0xc0000e5ac0 sp=0xc0000e5a90 pc=0x5fefed37b930
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*onceCloseListener).Accept(0xc00035e090?)
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x24 fp=0xc0000e5ad8 sp=0xc0000e5ac0 pc=0x5fefed4a2a44
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve(0xc0000161e0, {0x5fefed7ff400, 0xc0000801e0})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3260 +0x33e fp=0xc0000e5c08 sp=0xc0000e5ad8 pc=0x5fefed49985e
Nov 14 22:44:27 ubuntu ollama[2477]: main.main()
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:921 +0xfcc fp=0xc0000e5f50 sp=0xc0000e5c08 pc=0x5fefed4c4c4c
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.main()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:271 +0x29d fp=0xc0000e5fe0 sp=0xc0000e5f50 pc=0x5fefed27ebdd
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e5fe8 sp=0xc0000e5fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 2 gp=0xc000006c40 m=nil [force gc (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc000060fa8 sp=0xc000060f88 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goparkunlock(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:408
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.forcegchelper()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:326 +0xb8 fp=0xc000060fe0 sp=0xc000060fa8 pc=0x5fefed27ee98
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000060fe8 sp=0xc000060fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.init.6 in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:314 +0x1a
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc000061780 sp=0xc000061760 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goparkunlock(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:408
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.bgsweep(0xc00008a000)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgcsweep.go:318 +0xdf fp=0xc0000617c8 sp=0xc000061780 pc=0x5fefed269b9f
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcenable.gowrap1()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:203 +0x25 fp=0xc0000617e0 sp=0xc0000617c8 pc=0x5fefed25e685
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000617e8 sp=0xc0000617e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcenable in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:203 +0x66
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x56913?, 0x4d578a?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc000061f78 sp=0xc000061f58 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goparkunlock(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:408
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.(*scavengerState).park(0x5fefed9cd4c0)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgcscavenge.go:425 +0x49 fp=0xc000061fa8 sp=0xc000061f78 pc=0x5fefed267549
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.bgscavenge(0xc00008a000)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgcscavenge.go:658 +0x59 fp=0xc000061fc8 sp=0xc000061fa8 pc=0x5fefed267af9
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcenable.gowrap2()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:204 +0x25 fp=0xc000061fe0 sp=0xc000061fc8 pc=0x5fefed25e625
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcenable in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:204 +0xa5
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x5fefed7fb1a0?, 0x0?, 0xe0?, 0x1000000010?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc000060620 sp=0xc000060600 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.runfinq()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mfinal.go:194 +0x107 fp=0xc0000607e0 sp=0xc000060620 pc=0x5fefed25d6c7
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.createfing in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mfinal.go:164 +0x3d
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 56 gp=0xc000007dc0 m=nil [GC worker (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x5fefed2c91f3?, 0xc00008c890?, 0x0?, 0x0?, 0x5fefedab6060?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc000062f50 sp=0xc000062f30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc000062fe0 sp=0xc000062f50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000062fe8 sp=0xc000062fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 2740 gp=0xc0000f4700 m=nil [select]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0000e7a80?, 0x2?, 0x18?, 0x77?, 0xc0000e7824?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0000e7698 sp=0xc0000e7678 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0000e7a80, 0xc0000e7820, 0xc000030300?, 0x0, 0x1?, 0x1)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/select.go:327 +0x725 fp=0xc0000e77b8 sp=0xc0000e7698 pc=0x5fefed2903e5
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc000130380}, 0xc000116480)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0000e7ab8 sp=0xc0000e77b8 pc=0x5fefed4c26de
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc000130380?}, 0x5fefed49db8d?)
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x36 fp=0xc0000e7ae8 sp=0xc0000e7ab8 pc=0x5fefed4c56b6
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc000130380?}, 0x10?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2171 +0x29 fp=0xc0000e7b10 sp=0xc0000e7ae8 pc=0x5fefed496629
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc000130380}, 0xc000116480)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2688 +0x1ad fp=0xc0000e7b60 sp=0xc0000e7b10 pc=0x5fefed4984ad
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc000130380?}, 0x6?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3142 +0x8e fp=0xc0000e7b90 sp=0xc0000e7b60 pc=0x5fefed4994ce
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc00035e120, {0x5fefed7ffa08, 0xc0000a6db0})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2044 +0x5e8 fp=0xc0000e7fb8 sp=0xc0000e7b90 pc=0x5fefed495268
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x28 fp=0xc0000e7fe0 sp=0xc0000e7fb8 pc=0x5fefed499c48
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e7fe8 sp=0xc0000e7fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x4b4
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 55 gp=0xc0000f4a80 m=nil [GC worker (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x5fefed2c91f3?, 0xc00008c890?, 0x0?, 0x0?, 0x5fefedab6060?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005cf50 sp=0xc00005cf30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc00005cfe0 sp=0xc00005cf50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005cfe8 sp=0xc00005cfe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3218 gp=0xc0000f4c40 m=nil [IO wait, 1 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0xd5?, 0xb?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005d5a8 sp=0xc00005d588 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:573 +0xf7 fp=0xc00005d5e0 sp=0xc00005d5a8 pc=0x5fefed277257
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656c00, 0x72)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:345 +0x85 fp=0xc00005d600 sp=0xc00005d5e0 pc=0x5fefed2abaa5
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc00018c000?, 0xc0000a7031?, 0x0)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00005d628 sp=0xc00005d600 pc=0x5fefed2fb9c7
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:89
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc00018c000, {0xc0000a7031, 0x1, 0x1})
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_unix.go:164 +0x27a fp=0xc00005d6c0 sp=0xc00005d628 pc=0x5fefed2fc51a
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc00018c000, {0xc0000a7031?, 0xc00005d748?, 0x5fefed2ad6d0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/fd_posix.go:55 +0x25 fp=0xc00005d708 sp=0xc00005d6c0 pc=0x5fefed36a7a5
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc0004f0028, {0xc0000a7031?, 0x0?, 0xc0000a7028?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/net.go:185 +0x45 fp=0xc00005d750 sp=0xc00005d708 pc=0x5fefed374a65
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0xc0000a7020?, {0xc0000a7031?, 0x0?, 0x0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x25 fp=0xc00005d780 sp=0xc00005d750 pc=0x5fefed380445
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc0000a7020)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:681 +0x37 fp=0xc00005d7c8 sp=0xc00005d780 pc=0x5fefed48f1d7
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0x25 fp=0xc00005d7e0 sp=0xc00005d7c8 pc=0x5fefed48f105
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005d7e8 sp=0xc00005d7e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 3181
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0xba
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 25 gp=0xc0000f4e00 m=nil [GC worker (idle), 5 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe10419f387c2?, 0x3?, 0xb7?, 0x5f?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005df50 sp=0xc00005df30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc00005dfe0 sp=0xc00005df50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005dfe8 sp=0xc00005dfe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 27 gp=0xc0000f4fc0 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef132d?, 0x3?, 0xd?, 0x8a?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005e750 sp=0xc00005e730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc00005e7e0 sp=0xc00005e750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005e7e8 sp=0xc00005e7e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 26 gp=0xc0000f5180 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef1535?, 0x1?, 0xef?, 0x43?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005ef50 sp=0xc00005ef30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc00005efe0 sp=0xc00005ef50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005efe8 sp=0xc00005efe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 44 gp=0xc0004be000 m=nil [GC worker (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c4750 sp=0xc0004c4730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c47e0 sp=0xc0004c4750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c47e8 sp=0xc0004c47e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 57 gp=0xc0000f5340 m=nil [GC worker (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005f750 sp=0xc00005f730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc00005f7e0 sp=0xc00005f750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005f7e8 sp=0xc00005f7e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 58 gp=0xc0000f5500 m=nil [GC worker (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc00005ff50 sp=0xc00005ff30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc00005ffe0 sp=0xc00005ff50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 45 gp=0xc0004be1c0 m=nil [GC worker (idle), 103 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c4f50 sp=0xc0004c4f30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c4fe0 sp=0xc0004c4f50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c4fe8 sp=0xc0004c4fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 59 gp=0xc0000f56c0 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ee7d37?, 0x3?, 0xc2?, 0x56?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c0750 sp=0xc0004c0730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c07e0 sp=0xc0004c0750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c07e8 sp=0xc0004c07e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 46 gp=0xc0004be380 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef1625?, 0x1?, 0xb7?, 0xb3?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c5750 sp=0xc0004c5730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c57e0 sp=0xc0004c5750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c57e8 sp=0xc0004c57e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 47 gp=0xc0004be540 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef1661?, 0x3?, 0x52?, 0xdf?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c5f50 sp=0xc0004c5f30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c5fe0 sp=0xc0004c5f50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c5fe8 sp=0xc0004c5fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 48 gp=0xc0004be700 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x5fefedab72e0?, 0x1?, 0x98?, 0x4f?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c6750 sp=0xc0004c6730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c67e0 sp=0xc0004c6750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c67e8 sp=0xc0004c67e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 49 gp=0xc0004be8c0 m=nil [GC worker (idle), 1 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe140aadae303?, 0x1?, 0xcf?, 0x45?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c6f50 sp=0xc0004c6f30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c6fe0 sp=0xc0004c6f50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c6fe8 sp=0xc0004c6fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 66 gp=0xc0004bea80 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9f1ab5a?, 0x3?, 0x58?, 0x16?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c7750 sp=0xc0004c7730 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c77e0 sp=0xc0004c7750 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c77e8 sp=0xc0004c77e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 60 gp=0xc0000f5880 m=nil [GC worker (idle)]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9efe45a?, 0x3?, 0x52?, 0x3?, 0x0?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0004c0f50 sp=0xc0004c0f30 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker()
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1310 +0xe5 fp=0xc0004c0fe0 sp=0xc0004c0f50 pc=0x5fefed260585
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c0fe8 sp=0xc0004c0fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/mgc.go:1234 +0x1c
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3194 gp=0xc0000f5dc0 m=nil [IO wait]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x25?, 0xb?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0002625a8 sp=0xc000262588 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:573 +0xf7 fp=0xc0002625e0 sp=0xc0002625a8 pc=0x5fefed277257
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656df0, 0x72)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:345 +0x85 fp=0xc000262600 sp=0xc0002625e0 pc=0x5fefed2abaa5
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc000200180?, 0xc0001182b1?, 0x0)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000262628 sp=0xc000262600 pc=0x5fefed2fb9c7
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:89
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc000200180, {0xc0001182b1, 0x1, 0x1})
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_unix.go:164 +0x27a fp=0xc0002626c0 sp=0xc000262628 pc=0x5fefed2fc51a
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc000200180, {0xc0001182b1?, 0xc000262748?, 0x5fefed2ad6d0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/fd_posix.go:55 +0x25 fp=0xc000262708 sp=0xc0002626c0 pc=0x5fefed36a7a5
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc0004f0020, {0xc0001182b1?, 0x0?, 0xc000118098?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/net.go:185 +0x45 fp=0xc000262750 sp=0xc000262708 pc=0x5fefed374a65
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0x5fefed98e840?, {0xc0001182b1?, 0x0?, 0x0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x25 fp=0xc000262780 sp=0xc000262750 pc=0x5fefed380445
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc0001182a0)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:681 +0x37 fp=0xc0002627c8 sp=0xc000262780 pc=0x5fefed48f1d7
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0x25 fp=0xc0002627e0 sp=0xc0002627c8 pc=0x5fefed48f105
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0002627e8 sp=0xc0002627e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 2740
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0xba
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3149 gp=0xc000304000 m=nil [select]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0000e1a80?, 0x2?, 0x60?, 0x0?, 0xc0000e1824?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0000e1698 sp=0xc0000e1678 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0000e1a80, 0xc0000e1820, 0x1f40?, 0x0, 0x1?, 0x1)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/select.go:327 +0x725 fp=0xc0000e17b8 sp=0xc0000e1698 pc=0x5fefed2903e5
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc0001300e0}, 0xc0000c2360)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0000e1ab8 sp=0xc0000e17b8 pc=0x5fefed4c26de
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc0001300e0?}, 0x5fefed49db8d?)
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x36 fp=0xc0000e1ae8 sp=0xc0000e1ab8 pc=0x5fefed4c56b6
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc0001300e0?}, 0x10?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2171 +0x29 fp=0xc0000e1b10 sp=0xc0000e1ae8 pc=0x5fefed496629
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc0001300e0}, 0xc0000c2360)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2688 +0x1ad fp=0xc0000e1b60 sp=0xc0000e1b10 pc=0x5fefed4984ad
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc0001300e0?}, 0x6?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3142 +0x8e fp=0xc0000e1b90 sp=0xc0000e1b60 pc=0x5fefed4994ce
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc0000c03f0, {0x5fefed7ffa08, 0xc0000a6db0})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2044 +0x5e8 fp=0xc0000e1fb8 sp=0xc0000e1b90 pc=0x5fefed495268
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x28 fp=0xc0000e1fe0 sp=0xc0000e1fb8 pc=0x5fefed499c48
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e1fe8 sp=0xc0000e1fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x4b4
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3152 gp=0xc00030b500 m=nil [IO wait, 1 minutes]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x65?, 0xb?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0002365a8 sp=0xc000236588 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:573 +0xf7 fp=0xc0002365e0 sp=0xc0002365a8 pc=0x5fefed277257
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656cf8, 0x72)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:345 +0x85 fp=0xc000236600 sp=0xc0002365e0 pc=0x5fefed2abaa5
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc000030080?, 0xc0000a6be1?, 0x0)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000236628 sp=0xc000236600 pc=0x5fefed2fb9c7
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:89
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc000030080, {0xc0000a6be1, 0x1, 0x1})
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_unix.go:164 +0x27a fp=0xc0002366c0 sp=0xc000236628 pc=0x5fefed2fc51a
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc000030080, {0xc0000a6be1?, 0xc000236748?, 0x5fefed2ad6d0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/fd_posix.go:55 +0x25 fp=0xc000236708 sp=0xc0002366c0 pc=0x5fefed36a7a5
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc000588008, {0xc0000a6be1?, 0x41d1?, 0xc0001182a8?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/net.go:185 +0x45 fp=0xc000236750 sp=0xc000236708 pc=0x5fefed374a65
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0xc0001182a0?, {0xc0000a6be1?, 0x0?, 0x0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x25 fp=0xc000236780 sp=0xc000236750 pc=0x5fefed380445
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc0000a6bd0)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:681 +0x37 fp=0xc0002367c8 sp=0xc000236780 pc=0x5fefed48f1d7
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0x25 fp=0xc0002367e0 sp=0xc0002367c8 pc=0x5fefed48f105
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0002367e8 sp=0xc0002367e0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 3188
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0xba
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3219 gp=0xc000216c40 m=nil [IO wait]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x9d?, 0xb?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc000159da8 sp=0xc000159d88 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:573 +0xf7 fp=0xc000159de0 sp=0xc000159da8 pc=0x5fefed277257
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656ee8, 0x72)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/netpoll.go:345 +0x85 fp=0xc000159e00 sp=0xc000159de0 pc=0x5fefed2abaa5
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc000031800?, 0xc000118731?, 0x0)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000159e28 sp=0xc000159e00 pc=0x5fefed2fb9c7
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...)
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_poll_runtime.go:89
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc000031800, {0xc000118731, 0x1, 0x1})
Nov 14 22:44:27 ubuntu ollama[2477]:         internal/poll/fd_unix.go:164 +0x27a fp=0xc000159ec0 sp=0xc000159e28 pc=0x5fefed2fc51a
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc000031800, {0xc000118731?, 0xc000159f48?, 0x5fefed2ad6d0?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/fd_posix.go:55 +0x25 fp=0xc000159f08 sp=0xc000159ec0 pc=0x5fefed36a7a5
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc00013a010, {0xc000118731?, 0x153f?, 0xc0000a7028?})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/net.go:185 +0x45 fp=0xc000159f50 sp=0xc000159f08 pc=0x5fefed374a65
Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0xc0000a7020?, {0xc000118731?, 0xf32?, 0x1dd?})
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x25 fp=0xc000159f80 sp=0xc000159f50 pc=0x5fefed380445
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc000118720)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:681 +0x37 fp=0xc000159fc8 sp=0xc000159f80 pc=0x5fefed48f1d7
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0x25 fp=0xc000159fe0 sp=0xc000159fc8 pc=0x5fefed48f105
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc000159fe8 sp=0xc000159fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 3149
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:677 +0xba
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3188 gp=0xc000217c00 m=nil [select]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0001b5a80?, 0x2?, 0x18?, 0x57?, 0xc0001b5824?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0001b5698 sp=0xc0001b5678 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0001b5a80, 0xc0001b5820, 0xc00018c500?, 0x0, 0x1?, 0x1)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/select.go:327 +0x725 fp=0xc0001b57b8 sp=0xc0001b5698 pc=0x5fefed2903e5
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc000130000}, 0xc000116000)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0001b5ab8 sp=0xc0001b57b8 pc=0x5fefed4c26de
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc000130000?}, 0x5fefed49db8d?)
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x36 fp=0xc0001b5ae8 sp=0xc0001b5ab8 pc=0x5fefed4c56b6
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc000130000?}, 0x10?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2171 +0x29 fp=0xc0001b5b10 sp=0xc0001b5ae8 pc=0x5fefed496629
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc000130000}, 0xc000116000)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2688 +0x1ad fp=0xc0001b5b60 sp=0xc0001b5b10 pc=0x5fefed4984ad
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc000130000?}, 0x6?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3142 +0x8e fp=0xc0001b5b90 sp=0xc0001b5b60 pc=0x5fefed4994ce
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc00039a090, {0x5fefed7ffa08, 0xc0000a6db0})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2044 +0x5e8 fp=0xc0001b5fb8 sp=0xc0001b5b90 pc=0x5fefed495268
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x28 fp=0xc0001b5fe0 sp=0xc0001b5fb8 pc=0x5fefed499c48
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0001b5fe8 sp=0xc0001b5fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x4b4
Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3181 gp=0xc000305880 m=nil [select]:
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0000e3a80?, 0x2?, 0x18?, 0x37?, 0xc0000e3824?)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/proc.go:402 +0xce fp=0xc0000e3698 sp=0xc0000e3678 pc=0x5fefed27f00e
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0000e3a80, 0xc0000e3820, 0xc000030280?, 0x0, 0x1?, 0x1)
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/select.go:327 +0x725 fp=0xc0000e37b8 sp=0xc0000e3698 pc=0x5fefed2903e5
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc0001280e0}, 0xc000116120)
Nov 14 22:44:27 ubuntu ollama[2477]:         github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0000e3ab8 sp=0xc0000e37b8 pc=0x5fefed4c26de
Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc0001280e0?}, 0x5fefed49db8d?)
Nov 14 22:44:27 ubuntu ollama[2477]:         <autogenerated>:1 +0x36 fp=0xc0000e3ae8 sp=0xc0000e3ab8 pc=0x5fefed4c56b6
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc0001280e0?}, 0x10?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2171 +0x29 fp=0xc0000e3b10 sp=0xc0000e3ae8 pc=0x5fefed496629
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc0001280e0}, 0xc000116120)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2688 +0x1ad fp=0xc0000e3b60 sp=0xc0000e3b10 pc=0x5fefed4984ad
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc0001280e0?}, 0x6?)
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3142 +0x8e fp=0xc0000e3b90 sp=0xc0000e3b60 pc=0x5fefed4994ce
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc00035e090, {0x5fefed7ffa08, 0xc0000a6db0})
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:2044 +0x5e8 fp=0xc0000e3fb8 sp=0xc0000e3b90 pc=0x5fefed495268
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3()
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x28 fp=0xc0000e3fe0 sp=0xc0000e3fb8 pc=0x5fefed499c48
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e3fe8 sp=0xc0000e3fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]:         net/http/server.go:3290 +0x4b4
Nov 14 22:44:27 ubuntu ollama[2477]: rax    0x0
Nov 14 22:44:27 ubuntu ollama[2477]: rbx    0x0
Nov 14 22:44:27 ubuntu ollama[2477]: rcx    0x1f500
Nov 14 22:44:27 ubuntu ollama[2477]: rdx    0x5fefef98db70
Nov 14 22:44:27 ubuntu ollama[2477]: rdi    0x5fefef98db70
Nov 14 22:44:27 ubuntu ollama[2477]: rsi    0x1f500
Nov 14 22:44:27 ubuntu ollama[2477]: rbp    0x7044f0006b10
Nov 14 22:44:27 ubuntu ollama[2477]: rsp    0x7fffa008e480
Nov 14 22:44:27 ubuntu ollama[2477]: r8     0x1f500
Nov 14 22:44:27 ubuntu ollama[2477]: r9     0x0
Nov 14 22:44:27 ubuntu ollama[2477]: r10    0x704546c03b30
Nov 14 22:44:27 ubuntu ollama[2477]: r11    0x4
Nov 14 22:44:27 ubuntu ollama[2477]: r12    0x33a
Nov 14 22:44:27 ubuntu ollama[2477]: r13    0x7042e8332270
Nov 14 22:44:27 ubuntu ollama[2477]: r14    0x7042e83321a0
Nov 14 22:44:27 ubuntu ollama[2477]: r15    0x0
Nov 14 22:44:27 ubuntu ollama[2477]: rip    0x5fefed64dc68
Nov 14 22:44:27 ubuntu ollama[2477]: rflags 0x10206
Nov 14 22:44:27 ubuntu ollama[2477]: cs     0x33
Nov 14 22:44:27 ubuntu ollama[2477]: fs     0x0
Nov 14 22:44:27 ubuntu ollama[2477]: gs     0x0
Nov 14 22:44:27 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:27 | 500 | 23.414990833s |       127.0.0.1 | POST     "/api/chat"
Nov 14 22:44:27 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:27 | 500 |         1m28s |       127.0.0.1 | POST     "/api/chat"
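For context on the log above: the runner warns `truncating input prompt` (limit=8000, prompt=8110, numKeep=5) shortly before the "could not find a KV slot" errors and the crash. As a purely illustrative sketch (not Ollama's actual runner code), a truncation guard of that shape would keep the first `numKeep` tokens and then fill the remaining window with the most recent tokens:

```python
# Illustrative only -- approximates the behavior suggested by the
# "truncating input prompt" warning (limit=8000 prompt=8110 numKeep=5).
# The keep-head-plus-recent-tail split is an assumption, not a quote
# of runner.go.
def truncate_prompt(tokens, limit, num_keep):
    """Return at most `limit` tokens: the first `num_keep`,
    then the most recent (limit - num_keep)."""
    if len(tokens) <= limit:
        return tokens
    tail = tokens[-(limit - num_keep):]   # most recent tokens
    return tokens[:num_keep] + tail

tokens = list(range(8110))                # prompt=8110, as in the log
out = truncate_prompt(tokens, limit=8000, num_keep=5)
print(len(out))                           # 8000
```

If that assumption holds, sizing the context window (e.g. `num_ctx` in the request options or Modelfile) so prompts fit under the limit avoids the truncation path that immediately precedes the failure here.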
<!-- gh-comment-id:2477995322 --> @konrad0101 commented on GitHub (Nov 15, 2024): @jessegross I'm getting the same error on ollama 0.4.1 running llama 3.1 70B Q4_K_M on Linux with Nvidia (2x3090). Full logs are below. Application layer: ``` File "/opt/venv/lib/python3.10/site-packages/ollama/_client.py", line 654, in chat return await self._request_stream( File "/opt/venv/lib/python3.10/site-packages/ollama/_client.py", line 518, in _request_stream response = await self._request(*args, **kwargs) File "/opt/venv/lib/python3.10/site-packages/ollama/_client.py", line 488, in _request raise ResponseError(e.response.text, e.response.status_code) from None ollama._types.ResponseError: POST predict: Post "http://127.0.0.1:40081/completion": EOF ``` And from ollama logs: ``` Nov 14 22:42:39 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:39 | 200 | 7.405984ms | 127.0.0.1 | POST "/api/embeddings" Nov 14 22:42:41 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:41 | 200 | 1.366778533s | 127.0.0.1 | POST "/api/chat" Nov 14 22:42:41 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:41 | 200 | 10.587194ms | 127.0.0.1 | POST "/api/embeddings" Nov 14 22:42:44 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:44 | 200 | 1.223217799s | 127.0.0.1 | POST "/api/chat" Nov 14 22:42:44 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:44 | 200 | 11.002472ms | 127.0.0.1 | POST "/api/embeddings" Nov 14 22:42:46 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:42:46 | 200 | 1.032608795s | 127.0.0.1 | POST "/api/chat" Nov 14 22:42:59 ubuntu ollama[2477]: time=2024-11-14T22:42:59.301-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8101 numKeep=5 Nov 14 22:42:59 ubuntu ollama[2477]: time=2024-11-14T22:42:59.330-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8110 numKeep=5 Nov 14 22:43:44 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:43:44 | 200 | 45.650632769s | 127.0.0.1 | POST "/api/chat" Nov 14 22:43:44 ubuntu ollama[2477]: 
time=2024-11-14T22:43:44.970-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8137 numKeep=5 Nov 14 22:44:04 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:04 | 200 | 1m4s | 127.0.0.1 | POST "/api/chat" Nov 14 22:44:04 ubuntu ollama[2477]: time=2024-11-14T22:44:04.275-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=8000 prompt=8305 numKeep=5 Nov 14 22:44:08 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:08 | 200 | 1m9s | 127.0.0.1 | POST "/api/chat" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.367-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.400-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.432-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.465-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.497-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. 
code: 1" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.530-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1" Nov 14 22:44:26 ubuntu ollama[2477]: time=2024-11-14T22:44:26.562-05:00 level=ERROR source=runner.go:426 msg="failed to decode batch" error="could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1" Nov 14 22:44:27 ubuntu ollama[2477]: llama_get_logits_ith: invalid logits id 826, reason: batch.logits[826] != true Nov 14 22:44:27 ubuntu ollama[2477]: SIGSEGV: segmentation violation Nov 14 22:44:27 ubuntu ollama[2477]: PC=0x5fefed64dc68 m=0 sigcode=1 addr=0x0 Nov 14 22:44:27 ubuntu ollama[2477]: signal arrived during cgo execution Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 7 gp=0xc0000f4000 m=0 mp=0x5fefed9cde80 [syscall]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.cgocall(0x5fefed4c5cc0, 0xc0001b1c78) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/cgocall.go:157 +0x4b fp=0xc0001b1c50 sp=0xc0001b1c18 pc=0x5fefed2483cb Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama._Cfunc_gpt_sampler_csample(0x7042e83321a0, 0x7044f0006b10, 0x33a) Nov 14 22:44:27 ubuntu ollama[2477]: _cgo_gotypes.go:463 +0x4f fp=0xc0001b1c78 sp=0xc0001b1c50 pc=0x5fefed3454af Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).processBatch.(*SamplingContext).Sample.func2(0xc0003ec200?, 0x4?, 0x33a) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/llama.go:663 +0x86 fp=0xc0001b1cc8 sp=0xc0001b1c78 pc=0x5fefed4c1ca6 Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama.(*SamplingContext).Sample(...) 
Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/llama.go:663 Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).processBatch(0xc0000c0120, 0xc000134000, 0xc0001b1f10) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:458 +0x4ea fp=0xc0001b1ed0 sp=0xc0001b1cc8 pc=0x5fefed4c0fca Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).run(0xc0000c0120, {0x5fefed7ffa40, 0xc000098050}) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:338 +0x1a5 fp=0xc0001b1fb8 sp=0xc0001b1ed0 pc=0x5fefed4c0765 Nov 14 22:44:27 ubuntu ollama[2477]: main.main.gowrap2() Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:901 +0x28 fp=0xc0001b1fe0 sp=0xc0001b1fb8 pc=0x5fefed4c4ec8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0001b1fe8 sp=0xc0001b1fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by main.main in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:901 +0xc2b Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 1 gp=0xc0000061c0 m=nil [IO wait, 1 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc000032008?, 0xc0000e5908?, 0xf4?, 0xed?, 0xc0000e58e8?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0000e5888 sp=0xc0000e5868 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x10?, 0xed247b26?, 0xef?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:573 +0xf7 fp=0xc0000e58c0 sp=0xc0000e5888 pc=0x5fefed277257 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656fe0, 0x72) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:345 +0x85 fp=0xc0000e58e0 sp=0xc0000e58c0 pc=0x5fefed2abaa5 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0x3?, 0x70456c6113a8?, 0x0) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000e5908 sp=0xc0000e58e0 pc=0x5fefed2fb9c7 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:89 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Accept(0xc000030100) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_unix.go:611 +0x2ac fp=0xc0000e59b0 sp=0xc0000e5908 pc=0x5fefed2fce8c Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).accept(0xc000030100) Nov 14 22:44:27 ubuntu ollama[2477]: net/fd_unix.go:172 +0x29 fp=0xc0000e5a68 sp=0xc0000e59b0 pc=0x5fefed36b8a9 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPListener).accept(0xc0000801e0) Nov 14 22:44:27 ubuntu ollama[2477]: net/tcpsock_posix.go:159 +0x1e fp=0xc0000e5a90 sp=0xc0000e5a68 pc=0x5fefed37c5de Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPListener).Accept(0xc0000801e0) Nov 14 22:44:27 ubuntu ollama[2477]: net/tcpsock.go:327 +0x30 fp=0xc0000e5ac0 sp=0xc0000e5a90 pc=0x5fefed37b930 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*onceCloseListener).Accept(0xc00035e090?) 
Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x24 fp=0xc0000e5ad8 sp=0xc0000e5ac0 pc=0x5fefed4a2a44 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve(0xc0000161e0, {0x5fefed7ff400, 0xc0000801e0}) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3260 +0x33e fp=0xc0000e5c08 sp=0xc0000e5ad8 pc=0x5fefed49985e Nov 14 22:44:27 ubuntu ollama[2477]: main.main() Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:921 +0xfcc fp=0xc0000e5f50 sp=0xc0000e5c08 pc=0x5fefed4c4c4c Nov 14 22:44:27 ubuntu ollama[2477]: runtime.main() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:271 +0x29d fp=0xc0000e5fe0 sp=0xc0000e5f50 pc=0x5fefed27ebdd Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e5fe8 sp=0xc0000e5fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 2 gp=0xc000006c40 m=nil [force gc (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc000060fa8 sp=0xc000060f88 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goparkunlock(...) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:408 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.forcegchelper() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:326 +0xb8 fp=0xc000060fe0 sp=0xc000060fa8 pc=0x5fefed27ee98 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc000060fe8 sp=0xc000060fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.init.6 in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:314 +0x1a Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc000061780 sp=0xc000061760 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goparkunlock(...) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:408 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.bgsweep(0xc00008a000) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgcsweep.go:318 +0xdf fp=0xc0000617c8 sp=0xc000061780 pc=0x5fefed269b9f Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcenable.gowrap1() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:203 +0x25 fp=0xc0000617e0 sp=0xc0000617c8 pc=0x5fefed25e685 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000617e8 sp=0xc0000617e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcenable in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:203 +0x66 Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x56913?, 0x4d578a?, 0x0?, 0x0?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc000061f78 sp=0xc000061f58 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goparkunlock(...) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:408 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.(*scavengerState).park(0x5fefed9cd4c0) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgcscavenge.go:425 +0x49 fp=0xc000061fa8 sp=0xc000061f78 pc=0x5fefed267549 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.bgscavenge(0xc00008a000) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgcscavenge.go:658 +0x59 fp=0xc000061fc8 sp=0xc000061fa8 pc=0x5fefed267af9 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcenable.gowrap2() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:204 +0x25 fp=0xc000061fe0 sp=0xc000061fc8 pc=0x5fefed25e625 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcenable in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:204 +0xa5 Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x5fefed7fb1a0?, 0x0?, 0xe0?, 0x1000000010?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc000060620 sp=0xc000060600 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.runfinq() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mfinal.go:194 +0x107 fp=0xc0000607e0 sp=0xc000060620 pc=0x5fefed25d6c7 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.createfing in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mfinal.go:164 +0x3d Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 56 gp=0xc000007dc0 m=nil [GC worker (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x5fefed2c91f3?, 0xc00008c890?, 0x0?, 0x0?, 0x5fefedab6060?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc000062f50 sp=0xc000062f30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc000062fe0 sp=0xc000062f50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc000062fe8 sp=0xc000062fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 2740 gp=0xc0000f4700 m=nil [select]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0000e7a80?, 0x2?, 0x18?, 0x77?, 0xc0000e7824?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0000e7698 sp=0xc0000e7678 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0000e7a80, 0xc0000e7820, 0xc000030300?, 0x0, 0x1?, 0x1) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/select.go:327 +0x725 fp=0xc0000e77b8 sp=0xc0000e7698 pc=0x5fefed2903e5 Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc000130380}, 0xc000116480) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0000e7ab8 sp=0xc0000e77b8 pc=0x5fefed4c26de Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc000130380?}, 0x5fefed49db8d?) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x36 fp=0xc0000e7ae8 sp=0xc0000e7ab8 pc=0x5fefed4c56b6 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc000130380?}, 0x10?) 
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2171 +0x29 fp=0xc0000e7b10 sp=0xc0000e7ae8 pc=0x5fefed496629 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc000130380}, 0xc000116480) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2688 +0x1ad fp=0xc0000e7b60 sp=0xc0000e7b10 pc=0x5fefed4984ad Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc000130380?}, 0x6?) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3142 +0x8e fp=0xc0000e7b90 sp=0xc0000e7b60 pc=0x5fefed4994ce Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc00035e120, {0x5fefed7ffa08, 0xc0000a6db0}) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2044 +0x5e8 fp=0xc0000e7fb8 sp=0xc0000e7b90 pc=0x5fefed495268 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x28 fp=0xc0000e7fe0 sp=0xc0000e7fb8 pc=0x5fefed499c48 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e7fe8 sp=0xc0000e7fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x4b4 Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 55 gp=0xc0000f4a80 m=nil [GC worker (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x5fefed2c91f3?, 0xc00008c890?, 0x0?, 0x0?, 0x5fefedab6060?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005cf50 sp=0xc00005cf30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc00005cfe0 sp=0xc00005cf50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005cfe8 sp=0xc00005cfe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3218 gp=0xc0000f4c40 m=nil [IO wait, 1 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0xd5?, 0xb?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005d5a8 sp=0xc00005d588 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:573 +0xf7 fp=0xc00005d5e0 sp=0xc00005d5a8 pc=0x5fefed277257 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656c00, 0x72) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:345 +0x85 fp=0xc00005d600 sp=0xc00005d5e0 pc=0x5fefed2abaa5 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc00018c000?, 0xc0000a7031?, 0x0) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00005d628 sp=0xc00005d600 pc=0x5fefed2fb9c7 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...) 
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:89 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc00018c000, {0xc0000a7031, 0x1, 0x1}) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_unix.go:164 +0x27a fp=0xc00005d6c0 sp=0xc00005d628 pc=0x5fefed2fc51a Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc00018c000, {0xc0000a7031?, 0xc00005d748?, 0x5fefed2ad6d0?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/fd_posix.go:55 +0x25 fp=0xc00005d708 sp=0xc00005d6c0 pc=0x5fefed36a7a5 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc0004f0028, {0xc0000a7031?, 0x0?, 0xc0000a7028?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/net.go:185 +0x45 fp=0xc00005d750 sp=0xc00005d708 pc=0x5fefed374a65 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0xc0000a7020?, {0xc0000a7031?, 0x0?, 0x0?}) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x25 fp=0xc00005d780 sp=0xc00005d750 pc=0x5fefed380445 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc0000a7020) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:681 +0x37 fp=0xc00005d7c8 sp=0xc00005d780 pc=0x5fefed48f1d7 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0x25 fp=0xc00005d7e0 sp=0xc00005d7c8 pc=0x5fefed48f105 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005d7e8 sp=0xc00005d7e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 3181 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0xba Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 25 gp=0xc0000f4e00 m=nil [GC worker (idle), 5 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe10419f387c2?, 0x3?, 0xb7?, 0x5f?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005df50 sp=0xc00005df30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc00005dfe0 sp=0xc00005df50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005dfe8 sp=0xc00005dfe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 27 gp=0xc0000f4fc0 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef132d?, 0x3?, 0xd?, 0x8a?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005e750 sp=0xc00005e730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc00005e7e0 sp=0xc00005e750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005e7e8 sp=0xc00005e7e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 26 gp=0xc0000f5180 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef1535?, 0x1?, 0xef?, 0x43?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005ef50 sp=0xc00005ef30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc00005efe0 sp=0xc00005ef50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005efe8 sp=0xc00005efe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 44 gp=0xc0004be000 m=nil [GC worker (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c4750 sp=0xc0004c4730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c47e0 sp=0xc0004c4750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c47e8 sp=0xc0004c47e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 57 gp=0xc0000f5340 m=nil [GC worker (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005f750 sp=0xc00005f730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc00005f7e0 sp=0xc00005f750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005f7e8 sp=0xc00005f7e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 58 gp=0xc0000f5500 m=nil [GC worker (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc00005ff50 sp=0xc00005ff30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc00005ffe0 sp=0xc00005ff50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 45 gp=0xc0004be1c0 m=nil [GC worker (idle), 103 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c4f50 sp=0xc0004c4f30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c4fe0 sp=0xc0004c4f50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c4fe8 sp=0xc0004c4fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 59 gp=0xc0000f56c0 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ee7d37?, 0x3?, 0xc2?, 0x56?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c0750 sp=0xc0004c0730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c07e0 sp=0xc0004c0750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c07e8 sp=0xc0004c07e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 46 gp=0xc0004be380 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef1625?, 0x1?, 0xb7?, 0xb3?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c5750 sp=0xc0004c5730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c57e0 sp=0xc0004c5750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c57e8 sp=0xc0004c57e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 47 gp=0xc0004be540 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9ef1661?, 0x3?, 0x52?, 0xdf?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c5f50 sp=0xc0004c5f30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c5fe0 sp=0xc0004c5f50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c5fe8 sp=0xc0004c5fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 48 gp=0xc0004be700 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x5fefedab72e0?, 0x1?, 0x98?, 0x4f?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c6750 sp=0xc0004c6730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c67e0 sp=0xc0004c6750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c67e8 sp=0xc0004c67e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 49 gp=0xc0004be8c0 m=nil [GC worker (idle), 1 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe140aadae303?, 0x1?, 0xcf?, 0x45?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c6f50 sp=0xc0004c6f30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c6fe0 sp=0xc0004c6f50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c6fe8 sp=0xc0004c6fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 66 gp=0xc0004bea80 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9f1ab5a?, 0x3?, 0x58?, 0x16?, 0x0?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c7750 sp=0xc0004c7730 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c77e0 sp=0xc0004c7750 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c77e8 sp=0xc0004c77e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 60 gp=0xc0000f5880 m=nil [GC worker (idle)]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xe150c9efe45a?, 0x3?, 0x52?, 0x3?, 0x0?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0004c0f50 sp=0xc0004c0f30 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gcBgMarkWorker() Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1310 +0xe5 fp=0xc0004c0fe0 sp=0xc0004c0f50 pc=0x5fefed260585 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0004c0fe8 sp=0xc0004c0fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by runtime.gcBgMarkStartWorkers in goroutine 8 Nov 14 22:44:27 ubuntu ollama[2477]: runtime/mgc.go:1234 +0x1c Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3194 gp=0xc0000f5dc0 m=nil [IO wait]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x25?, 0xb?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0002625a8 sp=0xc000262588 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:573 +0xf7 fp=0xc0002625e0 sp=0xc0002625a8 pc=0x5fefed277257 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656df0, 0x72) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:345 +0x85 fp=0xc000262600 sp=0xc0002625e0 pc=0x5fefed2abaa5 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc000200180?, 0xc0001182b1?, 0x0) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000262628 sp=0xc000262600 pc=0x5fefed2fb9c7 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:89 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc000200180, {0xc0001182b1, 0x1, 0x1}) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_unix.go:164 +0x27a fp=0xc0002626c0 sp=0xc000262628 pc=0x5fefed2fc51a Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc000200180, {0xc0001182b1?, 0xc000262748?, 0x5fefed2ad6d0?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/fd_posix.go:55 +0x25 fp=0xc000262708 sp=0xc0002626c0 pc=0x5fefed36a7a5 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc0004f0020, {0xc0001182b1?, 0x0?, 0xc000118098?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/net.go:185 +0x45 fp=0xc000262750 sp=0xc000262708 pc=0x5fefed374a65 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0x5fefed98e840?, {0xc0001182b1?, 0x0?, 0x0?}) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x25 fp=0xc000262780 sp=0xc000262750 pc=0x5fefed380445 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc0001182a0) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:681 +0x37 fp=0xc0002627c8 sp=0xc000262780 pc=0x5fefed48f1d7 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0x25 fp=0xc0002627e0 sp=0xc0002627c8 pc=0x5fefed48f105 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0002627e8 sp=0xc0002627e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 2740 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0xba Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3149 gp=0xc000304000 m=nil [select]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0000e1a80?, 0x2?, 0x60?, 0x0?, 0xc0000e1824?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0000e1698 sp=0xc0000e1678 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0000e1a80, 0xc0000e1820, 0x1f40?, 0x0, 0x1?, 0x1) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/select.go:327 +0x725 fp=0xc0000e17b8 sp=0xc0000e1698 pc=0x5fefed2903e5 Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc0001300e0}, 0xc0000c2360) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0000e1ab8 sp=0xc0000e17b8 pc=0x5fefed4c26de Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc0001300e0?}, 0x5fefed49db8d?) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x36 fp=0xc0000e1ae8 sp=0xc0000e1ab8 pc=0x5fefed4c56b6 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc0001300e0?}, 0x10?) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2171 +0x29 fp=0xc0000e1b10 sp=0xc0000e1ae8 pc=0x5fefed496629 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc0001300e0}, 0xc0000c2360) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2688 +0x1ad fp=0xc0000e1b60 sp=0xc0000e1b10 pc=0x5fefed4984ad Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc0001300e0?}, 0x6?) 
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3142 +0x8e fp=0xc0000e1b90 sp=0xc0000e1b60 pc=0x5fefed4994ce Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc0000c03f0, {0x5fefed7ffa08, 0xc0000a6db0}) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2044 +0x5e8 fp=0xc0000e1fb8 sp=0xc0000e1b90 pc=0x5fefed495268 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x28 fp=0xc0000e1fe0 sp=0xc0000e1fb8 pc=0x5fefed499c48 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e1fe8 sp=0xc0000e1fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x4b4 Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3152 gp=0xc00030b500 m=nil [IO wait, 1 minutes]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x65?, 0xb?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0002365a8 sp=0xc000236588 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:573 +0xf7 fp=0xc0002365e0 sp=0xc0002365a8 pc=0x5fefed277257 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656cf8, 0x72) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:345 +0x85 fp=0xc000236600 sp=0xc0002365e0 pc=0x5fefed2abaa5 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc000030080?, 0xc0000a6be1?, 0x0) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000236628 sp=0xc000236600 pc=0x5fefed2fb9c7 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...) 
Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:89 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc000030080, {0xc0000a6be1, 0x1, 0x1}) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_unix.go:164 +0x27a fp=0xc0002366c0 sp=0xc000236628 pc=0x5fefed2fc51a Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc000030080, {0xc0000a6be1?, 0xc000236748?, 0x5fefed2ad6d0?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/fd_posix.go:55 +0x25 fp=0xc000236708 sp=0xc0002366c0 pc=0x5fefed36a7a5 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc000588008, {0xc0000a6be1?, 0x41d1?, 0xc0001182a8?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/net.go:185 +0x45 fp=0xc000236750 sp=0xc000236708 pc=0x5fefed374a65 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0xc0001182a0?, {0xc0000a6be1?, 0x0?, 0x0?}) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x25 fp=0xc000236780 sp=0xc000236750 pc=0x5fefed380445 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc0000a6bd0) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:681 +0x37 fp=0xc0002367c8 sp=0xc000236780 pc=0x5fefed48f1d7 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0x25 fp=0xc0002367e0 sp=0xc0002367c8 pc=0x5fefed48f105 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0002367e8 sp=0xc0002367e0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 3188 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0xba Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3219 gp=0xc000216c40 m=nil [IO wait]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0x10?, 0x10?, 0xf0?, 0x9d?, 0xb?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc000159da8 sp=0xc000159d88 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.netpollblock(0x5fefed2e5558?, 0xed247b26?, 0xef?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:573 +0xf7 fp=0xc000159de0 sp=0xc000159da8 pc=0x5fefed277257 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.runtime_pollWait(0x70456c656ee8, 0x72) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/netpoll.go:345 +0x85 fp=0xc000159e00 sp=0xc000159de0 pc=0x5fefed2abaa5 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).wait(0xc000031800?, 0xc000118731?, 0x0) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000159e28 sp=0xc000159e00 pc=0x5fefed2fb9c7 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*pollDesc).waitRead(...) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_poll_runtime.go:89 Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll.(*FD).Read(0xc000031800, {0xc000118731, 0x1, 0x1}) Nov 14 22:44:27 ubuntu ollama[2477]: internal/poll/fd_unix.go:164 +0x27a fp=0xc000159ec0 sp=0xc000159e28 pc=0x5fefed2fc51a Nov 14 22:44:27 ubuntu ollama[2477]: net.(*netFD).Read(0xc000031800, {0xc000118731?, 0xc000159f48?, 0x5fefed2ad6d0?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/fd_posix.go:55 +0x25 fp=0xc000159f08 sp=0xc000159ec0 pc=0x5fefed36a7a5 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*conn).Read(0xc00013a010, {0xc000118731?, 0x153f?, 0xc0000a7028?}) Nov 14 22:44:27 ubuntu ollama[2477]: net/net.go:185 +0x45 fp=0xc000159f50 sp=0xc000159f08 pc=0x5fefed374a65 Nov 14 22:44:27 ubuntu ollama[2477]: net.(*TCPConn).Read(0xc0000a7020?, {0xc000118731?, 0xf32?, 0x1dd?}) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x25 fp=0xc000159f80 sp=0xc000159f50 pc=0x5fefed380445 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).backgroundRead(0xc000118720) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:681 +0x37 fp=0xc000159fc8 sp=0xc000159f80 
pc=0x5fefed48f1d7 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*connReader).startBackgroundRead.gowrap2() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0x25 fp=0xc000159fe0 sp=0xc000159fc8 pc=0x5fefed48f105 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc000159fe8 sp=0xc000159fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*connReader).startBackgroundRead in goroutine 3149 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:677 +0xba Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3188 gp=0xc000217c00 m=nil [select]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0001b5a80?, 0x2?, 0x18?, 0x57?, 0xc0001b5824?) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0001b5698 sp=0xc0001b5678 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0001b5a80, 0xc0001b5820, 0xc00018c500?, 0x0, 0x1?, 0x1) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/select.go:327 +0x725 fp=0xc0001b57b8 sp=0xc0001b5698 pc=0x5fefed2903e5 Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc000130000}, 0xc000116000) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0001b5ab8 sp=0xc0001b57b8 pc=0x5fefed4c26de Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc000130000?}, 0x5fefed49db8d?) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x36 fp=0xc0001b5ae8 sp=0xc0001b5ab8 pc=0x5fefed4c56b6 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc000130000?}, 0x10?) 
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2171 +0x29 fp=0xc0001b5b10 sp=0xc0001b5ae8 pc=0x5fefed496629 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc000130000}, 0xc000116000) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2688 +0x1ad fp=0xc0001b5b60 sp=0xc0001b5b10 pc=0x5fefed4984ad Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc000130000?}, 0x6?) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3142 +0x8e fp=0xc0001b5b90 sp=0xc0001b5b60 pc=0x5fefed4994ce Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc00039a090, {0x5fefed7ffa08, 0xc0000a6db0}) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2044 +0x5e8 fp=0xc0001b5fb8 sp=0xc0001b5b90 pc=0x5fefed495268 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3() Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x28 fp=0xc0001b5fe0 sp=0xc0001b5fb8 pc=0x5fefed499c48 Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({}) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0001b5fe8 sp=0xc0001b5fe0 pc=0x5fefed2b0de1 Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1 Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x4b4 Nov 14 22:44:27 ubuntu ollama[2477]: goroutine 3181 gp=0xc000305880 m=nil [select]: Nov 14 22:44:27 ubuntu ollama[2477]: runtime.gopark(0xc0000e3a80?, 0x2?, 0x18?, 0x37?, 0xc0000e3824?) 
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/proc.go:402 +0xce fp=0xc0000e3698 sp=0xc0000e3678 pc=0x5fefed27f00e Nov 14 22:44:27 ubuntu ollama[2477]: runtime.selectgo(0xc0000e3a80, 0xc0000e3820, 0xc000030280?, 0x0, 0x1?, 0x1) Nov 14 22:44:27 ubuntu ollama[2477]: runtime/select.go:327 +0x725 fp=0xc0000e37b8 sp=0xc0000e3698 pc=0x5fefed2903e5 Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion(0xc0000c0120, {0x5fefed7ff5b0, 0xc0001280e0}, 0xc000116120) Nov 14 22:44:27 ubuntu ollama[2477]: github.com/ollama/ollama/llama/runner/runner.go:652 +0x8fe fp=0xc0000e3ab8 sp=0xc0000e37b8 pc=0x5fefed4c26de Nov 14 22:44:27 ubuntu ollama[2477]: main.(*Server).completion-fm({0x5fefed7ff5b0?, 0xc0001280e0?}, 0x5fefed49db8d?) Nov 14 22:44:27 ubuntu ollama[2477]: <autogenerated>:1 +0x36 fp=0xc0000e3ae8 sp=0xc0000e3ab8 pc=0x5fefed4c56b6 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.HandlerFunc.ServeHTTP(0xc000018ea0?, {0x5fefed7ff5b0?, 0xc0001280e0?}, 0x10?) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2171 +0x29 fp=0xc0000e3b10 sp=0xc0000e3ae8 pc=0x5fefed496629 Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*ServeMux).ServeHTTP(0x5fefed251f85?, {0x5fefed7ff5b0, 0xc0001280e0}, 0xc000116120) Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2688 +0x1ad fp=0xc0000e3b60 sp=0xc0000e3b10 pc=0x5fefed4984ad Nov 14 22:44:27 ubuntu ollama[2477]: net/http.serverHandler.ServeHTTP({0x5fefed7fe900?}, {0x5fefed7ff5b0?, 0xc0001280e0?}, 0x6?) 
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3142 +0x8e fp=0xc0000e3b90 sp=0xc0000e3b60 pc=0x5fefed4994ce
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*conn).serve(0xc00035e090, {0x5fefed7ffa08, 0xc0000a6db0})
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:2044 +0x5e8 fp=0xc0000e3fb8 sp=0xc0000e3b90 pc=0x5fefed495268
Nov 14 22:44:27 ubuntu ollama[2477]: net/http.(*Server).Serve.gowrap3()
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x28 fp=0xc0000e3fe0 sp=0xc0000e3fb8 pc=0x5fefed499c48
Nov 14 22:44:27 ubuntu ollama[2477]: runtime.goexit({})
Nov 14 22:44:27 ubuntu ollama[2477]: runtime/asm_amd64.s:1695 +0x1 fp=0xc0000e3fe8 sp=0xc0000e3fe0 pc=0x5fefed2b0de1
Nov 14 22:44:27 ubuntu ollama[2477]: created by net/http.(*Server).Serve in goroutine 1
Nov 14 22:44:27 ubuntu ollama[2477]: net/http/server.go:3290 +0x4b4
Nov 14 22:44:27 ubuntu ollama[2477]: rax 0x0
Nov 14 22:44:27 ubuntu ollama[2477]: rbx 0x0
Nov 14 22:44:27 ubuntu ollama[2477]: rcx 0x1f500
Nov 14 22:44:27 ubuntu ollama[2477]: rdx 0x5fefef98db70
Nov 14 22:44:27 ubuntu ollama[2477]: rdi 0x5fefef98db70
Nov 14 22:44:27 ubuntu ollama[2477]: rsi 0x1f500
Nov 14 22:44:27 ubuntu ollama[2477]: rbp 0x7044f0006b10
Nov 14 22:44:27 ubuntu ollama[2477]: rsp 0x7fffa008e480
Nov 14 22:44:27 ubuntu ollama[2477]: r8 0x1f500
Nov 14 22:44:27 ubuntu ollama[2477]: r9 0x0
Nov 14 22:44:27 ubuntu ollama[2477]: r10 0x704546c03b30
Nov 14 22:44:27 ubuntu ollama[2477]: r11 0x4
Nov 14 22:44:27 ubuntu ollama[2477]: r12 0x33a
Nov 14 22:44:27 ubuntu ollama[2477]: r13 0x7042e8332270
Nov 14 22:44:27 ubuntu ollama[2477]: r14 0x7042e83321a0
Nov 14 22:44:27 ubuntu ollama[2477]: r15 0x0
Nov 14 22:44:27 ubuntu ollama[2477]: rip 0x5fefed64dc68
Nov 14 22:44:27 ubuntu ollama[2477]: rflags 0x10206
Nov 14 22:44:27 ubuntu ollama[2477]: cs 0x33
Nov 14 22:44:27 ubuntu ollama[2477]: fs 0x0
Nov 14 22:44:27 ubuntu ollama[2477]: gs 0x0
Nov 14 22:44:27 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:27 | 500 | 23.414990833s | 127.0.0.1 | POST "/api/chat"
Nov 14 22:44:27 ubuntu ollama[2477]: [GIN] 2024/11/14 - 22:44:27 | 500 | 1m28s | 127.0.0.1 | POST "/api/chat"
```
Author
Owner

@NWBx01 commented on GitHub (Nov 15, 2024):

@jessegross I am also experiencing this issue. My logs are nearly identical to @konrad0101's. I'm using a Quadro P4000 and Qwen2.5 7B on Ollama 0.4.1.

It seems the issue is caused by a segmentation fault?

```
time=2024-11-15T20:13:00.077Z level=INFO source=server.go:601 msg="llama runner started in 2.27 seconds"
time=2024-11-15T20:13:00.077Z level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730
time=2024-11-15T20:13:00.077Z level=DEBUG source=routes.go:270 msg="generate request" images=0 prompt="***\nUser: How far is the 38 parallel from the equator?\nChatGPT:"
time=2024-11-15T20:13:00.079Z level=DEBUG source=cache.go:99 msg="loading cache slot" id=0 cache=0 prompt=20 used=0 remaining=20
llama-sampling.cpp:93: GGML_ASSERT(cur_p->size > 0) failed
SIGSEGV: segmentation violation
PC=0x7f2e71e77f37 m=0 sigcode=1 addr=0x206203fd8
signal arrived during cgo execution
```

I can post full logs, but as mentioned, they're nearly identical to what has already been posted above.

Author
Owner

@NWBx01 commented on GitHub (Nov 15, 2024):

I decided to check the past several versions of Ollama. It would appear that 0.4.2-rc1, 0.4.1, 0.4.0, and 0.3.14 are affected. The last version to function properly is 0.3.13. 0.3.12 also functioned properly in my testing, and it was mentioned earlier in this thread that 0.3.11 worked as well.
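For anyone else bisecting releases: the official install script honors an `OLLAMA_VERSION` variable, so stepping through versions can be scripted. A minimal sketch (the version list mirrors the ones tested in this thread; with `RUN` unset the commands are only printed, so nothing is installed by accident):

```shell
# Release-bisection helper sketch. OLLAMA_VERSION pins the version installed
# by the official install script. Set RUN=1 to actually run each install.
versions="0.3.12 0.3.13 0.3.14 0.4.0 0.4.1"
for v in $versions; do
  cmd="curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=$v sh"
  if [ "${RUN:-0}" = "1" ]; then
    eval "$cmd"   # install this version, then re-run the failing prompt
  else
    echo "$cmd"   # dry run: just show what would be executed
  fi
done
```

After each install, re-run the same prompt that triggers the EOF to find the first broken release.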

Author
Owner

@S-yf commented on GitHub (Nov 15, 2024):

Have you solved it? I also had this problem.

Author
Owner

@padey commented on GitHub (Nov 16, 2024):

+1, same problem.

Author
Owner

@agreppin commented on GitHub (Nov 16, 2024):

More logs with the AMD/ROCm variant on Ubuntu 24.04.1, CPU without AVX2:

[ollama-v0.3.13-rocm-v6.2.4.log](https://github.com/user-attachments/files/17787029/ollama-v0.3.13-rocm-v6.2.4.log)
{"error":"llama runner process has terminated: signal: illegal instruction (core dumped)"}

[ollama-v0.4.2-rocm-v6.2.4.log](https://github.com/user-attachments/files/17787030/ollama-v0.4.2-rocm-v6.2.4.log)
{"error":"POST predict: Post \"http://127.0.0.1:39843/completion\": EOF"}

scripts used:

```sh
sudo systemctl stop ollama.service
set -eux
export AMD_LOG_LEVEL=3
export OLLAMA_DEBUG=1
ollama serve
```

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "granite-code",
  "prompt": "hello"
}'
```

Edit: using your builds with your ROCm version.
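Since the working/failing split in this thread seems to track CPU instruction-set support, it is worth confirming what the CPU actually advertises before blaming a build. A minimal check on Linux, reading the `flags` line of `/proc/cpuinfo`:

```shell
# Report whether the CPU advertises AVX and AVX2 (Linux).
# -w matches whole words, so "avx" will not falsely match "avx2".
for f in avx avx2; do
  if grep -qw "$f" /proc/cpuinfo; then
    status="supported"
  else
    status="not supported"
  fi
  echo "$f: $status"
done
```

If `avx2` is not supported, a binary built with AVX2 enabled will die with an illegal-instruction crash like the one in the 0.3.13 log above.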

Author
Owner

@huskeyw commented on GitHub (Nov 22, 2024):

Same issue here, but only when an LLM spans more than one GPU.
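One way to confirm that the multi-GPU split is the trigger (assuming the CUDA runtime, as in the log below) is to expose a single GPU to the server before reproducing; if the error goes away, the problem is specific to the tensor split across cards:

```shell
# Expose only the first GPU to the runner. CUDA_VISIBLE_DEVICES is honored
# by the CUDA runtime that the llama runner links against.
export CUDA_VISIBLE_DEVICES=0
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
# ollama serve   # then start the server manually in this environment
```

The trade-off is that a 70B model will no longer fit fully in VRAM on one 24 GB card, so pick a smaller model for the comparison run.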

huskeyw@timmy:~$ [GIN] 2024/11/21 - 18:09:24 | 200 | 42.709µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/11/21 - 18:09:24 | 200 | 48.614366ms | 127.0.0.1 | POST "/api/show"
time=2024-11-21T18:09:24.743-08:00 level=INFO source=sched.go:730 msg="new model will fit in available VRAM, loading" model=/home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 library=cuda parallel=4 required="43.2 GiB"
time=2024-11-21T18:09:24.919-08:00 level=INFO source=server.go:105 msg="system memory" total="251.6 GiB" free="246.9 GiB" free_swap="8.0 GiB"
time=2024-11-21T18:09:24.920-08:00 level=INFO source=memory.go:343 msg="offload to cuda" layers.requested=-1 layers.model=81 layers.offload=81 layers.split=41,40 memory.available="[22.3 GiB 22.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="43.2 GiB" memory.required.partial="43.2 GiB" memory.required.kv="2.5 GiB" memory.required.allocations="[22.0 GiB 21.2 GiB]" memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
time=2024-11-21T18:09:24.923-08:00 level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama1665379698/runners/cuda_v11/ollama_llama_server --model /home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 --ctx-size 8192 --batch-size 512 --n-gpu-layers 81 --threads 24 --parallel 4 --tensor-split 41,40 --port 42669"
time=2024-11-21T18:09:24.924-08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:562 msg="waiting for llama runner to start responding"
time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error"
time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:883 msg="starting go runner"
time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:884 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=24
time=2024-11-21T18:09:24.942-08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:42669"
llama_model_loader: loaded meta data with 29 key-value pairs and 724 tensors from /home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 70B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 70B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 80
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 8192
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 13: llama.attention.head_count u32 = 64
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 2
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2024-11-21T18:09:25.176-08:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_0: 561 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 37.22 GiB (4.53 BPW)
llm_load_print_meta: general.name = Meta Llama 3.1 70B Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: Tesla M40 24GB, compute capability 5.2, VMM: yes
Device 1: Tesla M40 24GB, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size = 1.02 MiB
llm_load_tensors: offloading 80 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 81/81 layers to GPU
llm_load_tensors: CPU buffer size = 563.62 MiB
llm_load_tensors: CUDA0 buffer size = 18821.57 MiB
llm_load_tensors: CUDA1 buffer size = 18725.43 MiB
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 1312.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 1248.00 MiB
llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 2.08 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model: CUDA0 compute buffer size = 1216.01 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 1216.02 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 80.02 MiB
llama_new_context_with_model: graph nodes = 2566
llama_new_context_with_model: graph splits = 3
time=2024-11-21T18:09:36.721-08:00 level=INFO source=server.go:601 msg="llama runner started in 11.80 seconds"
[GIN] 2024/11/21 - 18:09:36 | 200 | 12.287203955s | 127.0.0.1 | POST "/api/generate"
CUDA error: out of memory
current device: 1, in function alloc at ggml-cuda.cu:406
cuMemCreate(&handle, reserve_size, &prop, 0)
ggml-cuda.cu:132: CUDA error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7ec7edc9eb1c m=5 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 7 gp=0xc0000f0380 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x5982c5435b70, 0xc0000b6b20)
runtime/cgocall.go:157 +0x4b fp=0xc0000b6af8 sp=0xc0000b6ac0 pc=0x5982c51b73cb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7ec798006470, {0xf, 0x7ec79828bed0, 0x0, 0x0, 0x7ec798009bd0, 0x7ec7980070c0, 0x7ec798012260, 0x7ec798024950, 0x0, ...})
_cgo_gotypes.go:543 +0x52 fp=0xc0000b6b20 sp=0xc0000b6af8 pc=0x5982c52b4952
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x5982c543186b?, 0x7ec798006470?)
github.com/ollama/ollama/llama/llama.go:169 +0xd8 fp=0xc0000b6c40 sp=0xc0000b6b20 pc=0x5982c52b6f18
github.com/ollama/ollama/llama.(*Context).Decode(0xc0000b6d28?, 0x0?)
github.com/ollama/ollama/llama/llama.go:169 +0x17 fp=0xc0000b6c88 sp=0xc0000b6c40 pc=0x5982c52b6d77
main.(*Server).processBatch(0xc000188120, 0xc0001861c0, 0xc0000b6f10)
github.com/ollama/ollama/llama/runner/runner.go:427 +0x38d fp=0xc0000b6ed0 sp=0xc0000b6c88 pc=0x5982c543080d
main.(*Server).run(0xc000188120, {0x5982c5774ea0, 0xc0000dc0f0})
github.com/ollama/ollama/llama/runner/runner.go:327 +0x1a5 fp=0xc0000b6fb8 sp=0xc0000b6ed0 pc=0x5982c5430105
main.main.gowrap2()
github.com/ollama/ollama/llama/runner/runner.go:922 +0x28 fp=0xc0000b6fe0 sp=0xc0000b6fb8 pc=0x5982c5434ba8
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000b6fe8 sp=0xc0000b6fe0 pc=0x5982c521fde1
created by main.main in goroutine 1
github.com/ollama/ollama/llama/runner/runner.go:922 +0xc52

goroutine 1 gp=0xc000006380 m=nil [IO wait]:
runtime.gopark(0xc000032008?, 0x0?, 0x80?, 0x63?, 0xc00002d8b8?)
runtime/proc.go:402 +0xce fp=0xc00002d880 sp=0xc00002d860 pc=0x5982c51ee00e
runtime.netpollblock(0xc00002d918?, 0xc51b6b26?, 0x82?)
runtime/netpoll.go:573 +0xf7 fp=0xc00002d8b8 sp=0xc00002d880 pc=0x5982c51e6257
internal/poll.runtime_pollWait(0x7ec806b89820, 0x72)
runtime/netpoll.go:345 +0x85 fp=0xc00002d8d8 sp=0xc00002d8b8 pc=0x5982c521aaa5
internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00002d900 sp=0xc00002d8d8 pc=0x5982c526a9c7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0001be100)
internal/poll/fd_unix.go:611 +0x2ac fp=0xc00002d9a8 sp=0xc00002d900 pc=0x5982c526be8c
net.(*netFD).accept(0xc0001be100)
net/fd_unix.go:172 +0x29 fp=0xc00002da60 sp=0xc00002d9a8 pc=0x5982c52da949
net.(*TCPListener).accept(0xc0000c6200)
net/tcpsock_posix.go:159 +0x1e fp=0xc00002da88 sp=0xc00002da60 pc=0x5982c52eb67e
net.(*TCPListener).Accept(0xc0000c6200)
net/tcpsock.go:327 +0x30 fp=0xc00002dab8 sp=0xc00002da88 pc=0x5982c52ea9d0
net/http.(*onceCloseListener).Accept(0xc0001881b0?)
<autogenerated>:1 +0x24 fp=0xc00002dad0 sp=0xc00002dab8 pc=0x5982c5411be4
net/http.(*Server).Serve(0xc0001ce0f0, {0x5982c5774860, 0xc0000c6200})
net/http/server.go:3260 +0x33e fp=0xc00002dc00 sp=0xc00002dad0 pc=0x5982c54089fe
main.main()
github.com/ollama/ollama/llama/runner/runner.go:942 +0xfec fp=0xc00002df50 sp=0xc00002dc00 pc=0x5982c543492c
runtime.main()
runtime/proc.go:271 +0x29d fp=0xc00002dfe0 sp=0xc00002df50 pc=0x5982c51edbdd
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc00002dfe8 sp=0xc00002dfe0 pc=0x5982c521fde1

goroutine 2 gp=0xc000006e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc0000a6fa8 sp=0xc0000a6f88 pc=0x5982c51ee00e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.forcegchelper()
runtime/proc.go:326 +0xb8 fp=0xc0000a6fe0 sp=0xc0000a6fa8 pc=0x5982c51ede98
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a6fe8 sp=0xc0000a6fe0 pc=0x5982c521fde1
created by runtime.init.6 in goroutine 1
runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007340 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc0000a7780 sp=0xc0000a7760 pc=0x5982c51ee00e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.bgsweep(0xc0000220e0)
runtime/mgcsweep.go:278 +0x94 fp=0xc0000a77c8 sp=0xc0000a7780 pc=0x5982c51d8b54
runtime.gcenable.gowrap1()
runtime/mgc.go:203 +0x25 fp=0xc0000a77e0 sp=0xc0000a77c8 pc=0x5982c51cd685
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a77e8 sp=0xc0000a77e0 pc=0x5982c521fde1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007500 m=nil [GC scavenge wait]:
runtime.gopark(0xc0000220e0?, 0x5982c5674260?, 0x1?, 0x0?, 0xc000007500?)
runtime/proc.go:402 +0xce fp=0xc0000a7f78 sp=0xc0000a7f58 pc=0x5982c51ee00e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.(*scavengerState).park(0x5982c5943520)
runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a7fa8 sp=0xc0000a7f78 pc=0x5982c51d6549
runtime.bgscavenge(0xc0000220e0)
runtime/mgcscavenge.go:653 +0x3c fp=0xc0000a7fc8 sp=0xc0000a7fa8 pc=0x5982c51d6adc
runtime.gcenable.gowrap2()
runtime/mgc.go:204 +0x25 fp=0xc0000a7fe0 sp=0xc0000a7fc8 pc=0x5982c51cd625
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a7fe8 sp=0xc0000a7fe0 pc=0x5982c521fde1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0xa5

goroutine 5 gp=0xc0000f0000 m=nil [finalizer wait]:
runtime.gopark(0xc0000a6648?, 0x5982c51c0f85?, 0xa8?, 0x1?, 0xc000006380?)
runtime/proc.go:402 +0xce fp=0xc0000a6620 sp=0xc0000a6600 pc=0x5982c51ee00e
runtime.runfinq()
runtime/mfinal.go:194 +0x107 fp=0xc0000a67e0 sp=0xc0000a6620 pc=0x5982c51cc6c7
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a67e8 sp=0xc0000a67e0 pc=0x5982c521fde1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:164 +0x3d

goroutine 8 gp=0xc0000f0540 m=nil [select]:
runtime.gopark(0xc000253a48?, 0x2?, 0x50?, 0x81?, 0xc0002537ec?)
runtime/proc.go:402 +0xce fp=0xc000253658 sp=0xc000253638 pc=0x5982c51ee00e
runtime.selectgo(0xc000253a48, 0xc0002537e8, 0xf?, 0x0, 0x1?, 0x1)
runtime/select.go:327 +0x725 fp=0xc000253778 sp=0xc000253658 pc=0x5982c51ff3e5
main.(*Server).completion(0xc000188120, {0x5982c5774a10, 0xc000217180}, 0xc000207680)
github.com/ollama/ollama/llama/runner/runner.go:667 +0xa45 fp=0xc000253ab8 sp=0xc000253778 pc=0x5982c5432345
main.(*Server).completion-fm({0x5982c5774a10?, 0xc000217180?}, 0x5982c540cd2d?)
<autogenerated>:1 +0x36 fp=0xc000253ae8 sp=0xc000253ab8 pc=0x5982c5435396
net/http.HandlerFunc.ServeHTTP(0xc0000eac30?, {0x5982c5774a10?, 0xc000217180?}, 0x10?)
net/http/server.go:2171 +0x29 fp=0xc000253b10 sp=0xc000253ae8 pc=0x5982c54057c9
net/http.(*ServeMux).ServeHTTP(0x5982c51c0f85?, {0x5982c5774a10, 0xc000217180}, 0xc000207680)
net/http/server.go:2688 +0x1ad fp=0xc000253b60 sp=0xc000253b10 pc=0x5982c540764d
net/http.serverHandler.ServeHTTP({0x5982c5773d60?}, {0x5982c5774a10?, 0xc000217180?}, 0x6?)
net/http/server.go:3142 +0x8e fp=0xc000253b90 sp=0xc000253b60 pc=0x5982c540866e
net/http.(*conn).serve(0xc0001881b0, {0x5982c5774e68, 0xc0000e8e10})
net/http/server.go:2044 +0x5e8 fp=0xc000253fb8 sp=0xc000253b90 pc=0x5982c5404408
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3290 +0x28 fp=0xc000253fe0 sp=0xc000253fb8 pc=0x5982c5408de8
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000253fe8 sp=0xc000253fe0 pc=0x5982c521fde1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3290 +0x4b4

goroutine 70 gp=0xc00021efc0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:402 +0xce fp=0xc000220da8 sp=0xc000220d88 pc=0x5982c51ee00e
runtime.netpollblock(0x5982c5254558?, 0xc51b6b26?, 0x82?)
runtime/netpoll.go:573 +0xf7 fp=0xc000220de0 sp=0xc000220da8 pc=0x5982c51e6257
internal/poll.runtime_pollWait(0x7ec806b89728, 0x72)
runtime/netpoll.go:345 +0x85 fp=0xc000220e00 sp=0xc000220de0 pc=0x5982c521aaa5
internal/poll.(*pollDesc).wait(0xc0001be180?, 0xc0000e8f41?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000220e28 sp=0xc000220e00 pc=0x5982c526a9c7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0001be180, {0xc0000e8f41, 0x1, 0x1})
internal/poll/fd_unix.go:164 +0x27a fp=0xc000220ec0 sp=0xc000220e28 pc=0x5982c526b51a
net.(*netFD).Read(0xc0001be180, {0xc0000e8f41?, 0x0?, 0x0?})
net/fd_posix.go:55 +0x25 fp=0xc000220f08 sp=0xc000220ec0 pc=0x5982c52d9845
net.(*conn).Read(0xc0000aa0a0, {0xc0000e8f41?, 0x0?, 0x0?})
net/net.go:185 +0x45 fp=0xc000220f50 sp=0xc000220f08 pc=0x5982c52e3b05
net.(*TCPConn).Read(0x0?, {0xc0000e8f41?, 0x0?, 0x0?})
<autogenerated>:1 +0x25 fp=0xc000220f80 sp=0xc000220f50 pc=0x5982c52ef4e5
net/http.(*connReader).backgroundRead(0xc0000e8f30)
net/http/server.go:681 +0x37 fp=0xc000220fc8 sp=0xc000220f80 pc=0x5982c53fe377
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:677 +0x25 fp=0xc000220fe0 sp=0xc000220fc8 pc=0x5982c53fe2a5
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000220fe8 sp=0xc000220fe0 pc=0x5982c521fde1
created by net/http.(*connReader).startBackgroundRead in goroutine 8
net/http/server.go:677 +0xba

rax 0x0
rbx 0xe031
rcx 0x7ec7edc9eb1c
rdx 0x6
rdi 0xe02d
rsi 0xe031
rbp 0x7ec7a53f6410
rsp 0x7ec7a53f63d0
r8 0x0
r9 0x0
r10 0x8
r11 0x246
r12 0x6
r13 0x84
r14 0x16
r15 0x4c311a4000
rip 0x7ec7edc9eb1c
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
[GIN] 2024/11/21 - 18:09:52 | 200 | 5.978648843s | 127.0.0.1 | POST "/api/chat"

<!-- gh-comment-id:2492730115 --> @huskeyw commented on GitHub (Nov 22, 2024): same issue here, only when a LLM extends over 1 GPU.. huskeyw@timmy:~$ [GIN] 2024/11/21 - 18:09:24 | 200 | 42.709µs | 127.0.0.1 | HEAD "/" [GIN] 2024/11/21 - 18:09:24 | 200 | 48.614366ms | 127.0.0.1 | POST "/api/show" time=2024-11-21T18:09:24.743-08:00 level=INFO source=sched.go:730 msg="new model will fit in available VRAM, loading" model=/home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 library=cuda parallel=4 required="43.2 GiB" time=2024-11-21T18:09:24.919-08:00 level=INFO source=server.go:105 msg="system memory" total="251.6 GiB" free="246.9 GiB" free_swap="8.0 GiB" time=2024-11-21T18:09:24.920-08:00 level=INFO source=memory.go:343 msg="offload to cuda" layers.requested=-1 layers.model=81 layers.offload=81 layers.split=41,40 memory.available="[22.3 GiB 22.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="43.2 GiB" memory.required.partial="43.2 GiB" memory.required.kv="2.5 GiB" memory.required.allocations="[22.0 GiB 21.2 GiB]" memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB" time=2024-11-21T18:09:24.923-08:00 level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama1665379698/runners/cuda_v11/ollama_llama_server --model /home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 --ctx-size 8192 --batch-size 512 --n-gpu-layers 81 --threads 24 --parallel 4 --tensor-split 41,40 --port 42669" time=2024-11-21T18:09:24.924-08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1 time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:562 msg="waiting for llama runner to start responding" time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server error" 
time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:883 msg="starting go runner" time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:884 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=24 time=2024-11-21T18:09:24.942-08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:42669" llama_model_loader: loaded meta data with 29 key-value pairs and 724 tensors from /home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = llama llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 70B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1 llama_model_loader: - kv 5: general.size_label str = 70B llama_model_loader: - kv 6: general.license str = llama3.1 llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam... llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ... 
llama_model_loader: - kv 9: llama.block_count u32 = 80 llama_model_loader: - kv 10: llama.context_length u32 = 131072 llama_model_loader: - kv 11: llama.embedding_length u32 = 8192 llama_model_loader: - kv 12: llama.feed_forward_length u32 = 28672 llama_model_loader: - kv 13: llama.attention.head_count u32 = 64 llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8 llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000 llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 17: general.file_type u32 = 2 llama_model_loader: - kv 18: llama.vocab_size u32 = 128256 llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128 llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... time=2024-11-21T18:09:25.176-08:00 level=INFO source=server.go:596 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000 llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009 llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
llama_model_loader: - kv 28: general.quantization_version u32 = 2 llama_model_loader: - type f32: 162 tensors llama_model_loader: - type q4_0: 561 tensors llama_model_loader: - type q6_K: 1 tensors llm_load_vocab: special tokens cache size = 256 llm_load_vocab: token to piece cache size = 0.7999 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 128256 llm_load_print_meta: n_merges = 280147 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 131072 llm_load_print_meta: n_embd = 8192 llm_load_print_meta: n_layer = 80 llm_load_print_meta: n_head = 64 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 8 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 28672 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 500000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 131072 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: ssm_dt_b_c_rms = 0 llm_load_print_meta: model type = 70B llm_load_print_meta: model ftype = Q4_0 llm_load_print_meta: model params = 70.55 B llm_load_print_meta: model size = 37.22 GiB (4.53 BPW) 
llm_load_print_meta: general.name = Meta Llama 3.1 70B Instruct llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' llm_load_print_meta: EOS token = 128009 '<|eot_id|>' llm_load_print_meta: LF token = 128 'Ä' llm_load_print_meta: EOT token = 128009 '<|eot_id|>' llm_load_print_meta: EOM token = 128008 '<|eom_id|>' llm_load_print_meta: EOG token = 128008 '<|eom_id|>' llm_load_print_meta: EOG token = 128009 '<|eot_id|>' llm_load_print_meta: max token length = 256 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 CUDA devices: Device 0: Tesla M40 24GB, compute capability 5.2, VMM: yes Device 1: Tesla M40 24GB, compute capability 5.2, VMM: yes llm_load_tensors: ggml ctx size = 1.02 MiB llm_load_tensors: offloading 80 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 81/81 layers to GPU llm_load_tensors: CPU buffer size = 563.62 MiB llm_load_tensors: CUDA0 buffer size = 18821.57 MiB llm_load_tensors: CUDA1 buffer size = 18725.43 MiB llama_new_context_with_model: n_ctx = 8192 llama_new_context_with_model: n_batch = 2048 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 500000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: CUDA0 KV buffer size = 1312.00 MiB llama_kv_cache_init: CUDA1 KV buffer size = 1248.00 MiB llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB llama_new_context_with_model: CUDA_Host output buffer size = 2.08 MiB llama_new_context_with_model: pipeline parallelism enabled (n_copies=4) llama_new_context_with_model: CUDA0 compute buffer size = 1216.01 MiB llama_new_context_with_model: CUDA1 compute buffer size = 1216.02 MiB llama_new_context_with_model: CUDA_Host compute buffer size = 80.02 MiB llama_new_context_with_model: graph nodes = 2566 llama_new_context_with_model: graph splits = 
3 time=2024-11-21T18:09:36.721-08:00 level=INFO source=server.go:601 msg="llama runner started in 11.80 seconds" [GIN] 2024/11/21 - 18:09:36 | 200 | 12.287203955s | 127.0.0.1 | POST "/api/generate" CUDA error: out of memory current device: 1, in function alloc at ggml-cuda.cu:406 cuMemCreate(&handle, reserve_size, &prop, 0) ggml-cuda.cu:132: CUDA error Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf ptrace: Operation not permitted. No stack. The program is not being run. SIGABRT: abort PC=0x7ec7edc9eb1c m=5 sigcode=18446744073709551610 signal arrived during cgo execution goroutine 7 gp=0xc0000f0380 m=5 mp=0xc000100008 [syscall]: runtime.cgocall(0x5982c5435b70, 0xc0000b6b20) runtime/cgocall.go:157 +0x4b fp=0xc0000b6af8 sp=0xc0000b6ac0 pc=0x5982c51b73cb github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7ec798006470, {0xf, 0x7ec79828bed0, 0x0, 0x0, 0x7ec798009bd0, 0x7ec7980070c0, 0x7ec798012260, 0x7ec798024950, 0x0, ...}) _cgo_gotypes.go:543 +0x52 fp=0xc0000b6b20 sp=0xc0000b6af8 pc=0x5982c52b4952 github.com/ollama/ollama/llama.(*Context).Decode.func1(0x5982c543186b?, 0x7ec798006470?) github.com/ollama/ollama/llama/llama.go:169 +0xd8 fp=0xc0000b6c40 sp=0xc0000b6b20 pc=0x5982c52b6f18 github.com/ollama/ollama/llama.(*Context).Decode(0xc0000b6d28?, 0x0?) 
github.com/ollama/ollama/llama/llama.go:169 +0x17 fp=0xc0000b6c88 sp=0xc0000b6c40 pc=0x5982c52b6d77 main.(*Server).processBatch(0xc000188120, 0xc0001861c0, 0xc0000b6f10) github.com/ollama/ollama/llama/runner/runner.go:427 +0x38d fp=0xc0000b6ed0 sp=0xc0000b6c88 pc=0x5982c543080d main.(*Server).run(0xc000188120, {0x5982c5774ea0, 0xc0000dc0f0}) github.com/ollama/ollama/llama/runner/runner.go:327 +0x1a5 fp=0xc0000b6fb8 sp=0xc0000b6ed0 pc=0x5982c5430105 main.main.gowrap2() github.com/ollama/ollama/llama/runner/runner.go:922 +0x28 fp=0xc0000b6fe0 sp=0xc0000b6fb8 pc=0x5982c5434ba8 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc0000b6fe8 sp=0xc0000b6fe0 pc=0x5982c521fde1 created by main.main in goroutine 1 github.com/ollama/ollama/llama/runner/runner.go:922 +0xc52 goroutine 1 gp=0xc000006380 m=nil [IO wait]: runtime.gopark(0xc000032008?, 0x0?, 0x80?, 0x63?, 0xc00002d8b8?) runtime/proc.go:402 +0xce fp=0xc00002d880 sp=0xc00002d860 pc=0x5982c51ee00e runtime.netpollblock(0xc00002d918?, 0xc51b6b26?, 0x82?) runtime/netpoll.go:573 +0xf7 fp=0xc00002d8b8 sp=0xc00002d880 pc=0x5982c51e6257 internal/poll.runtime_pollWait(0x7ec806b89820, 0x72) runtime/netpoll.go:345 +0x85 fp=0xc00002d8d8 sp=0xc00002d8b8 pc=0x5982c521aaa5 internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00002d900 sp=0xc00002d8d8 pc=0x5982c526a9c7 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc0001be100) internal/poll/fd_unix.go:611 +0x2ac fp=0xc00002d9a8 sp=0xc00002d900 pc=0x5982c526be8c net.(*netFD).accept(0xc0001be100) net/fd_unix.go:172 +0x29 fp=0xc00002da60 sp=0xc00002d9a8 pc=0x5982c52da949 net.(*TCPListener).accept(0xc0000c6200) net/tcpsock_posix.go:159 +0x1e fp=0xc00002da88 sp=0xc00002da60 pc=0x5982c52eb67e net.(*TCPListener).Accept(0xc0000c6200) net/tcpsock.go:327 +0x30 fp=0xc00002dab8 sp=0xc00002da88 pc=0x5982c52ea9d0 net/http.(*onceCloseListener).Accept(0xc0001881b0?) 
<autogenerated>:1 +0x24 fp=0xc00002dad0 sp=0xc00002dab8 pc=0x5982c5411be4 net/http.(*Server).Serve(0xc0001ce0f0, {0x5982c5774860, 0xc0000c6200}) net/http/server.go:3260 +0x33e fp=0xc00002dc00 sp=0xc00002dad0 pc=0x5982c54089fe main.main() github.com/ollama/ollama/llama/runner/runner.go:942 +0xfec fp=0xc00002df50 sp=0xc00002dc00 pc=0x5982c543492c runtime.main() runtime/proc.go:271 +0x29d fp=0xc00002dfe0 sp=0xc00002df50 pc=0x5982c51edbdd runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc00002dfe8 sp=0xc00002dfe0 pc=0x5982c521fde1 goroutine 2 gp=0xc000006e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:402 +0xce fp=0xc0000a6fa8 sp=0xc0000a6f88 pc=0x5982c51ee00e runtime.goparkunlock(...) runtime/proc.go:408 runtime.forcegchelper() runtime/proc.go:326 +0xb8 fp=0xc0000a6fe0 sp=0xc0000a6fa8 pc=0x5982c51ede98 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a6fe8 sp=0xc0000a6fe0 pc=0x5982c521fde1 created by runtime.init.6 in goroutine 1 runtime/proc.go:314 +0x1a goroutine 3 gp=0xc000007340 m=nil [GC sweep wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:402 +0xce fp=0xc0000a7780 sp=0xc0000a7760 pc=0x5982c51ee00e runtime.goparkunlock(...) runtime/proc.go:408 runtime.bgsweep(0xc0000220e0) runtime/mgcsweep.go:278 +0x94 fp=0xc0000a77c8 sp=0xc0000a7780 pc=0x5982c51d8b54 runtime.gcenable.gowrap1() runtime/mgc.go:203 +0x25 fp=0xc0000a77e0 sp=0xc0000a77c8 pc=0x5982c51cd685 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a77e8 sp=0xc0000a77e0 pc=0x5982c521fde1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:203 +0x66 goroutine 4 gp=0xc000007500 m=nil [GC scavenge wait]: runtime.gopark(0xc0000220e0?, 0x5982c5674260?, 0x1?, 0x0?, 0xc000007500?) runtime/proc.go:402 +0xce fp=0xc0000a7f78 sp=0xc0000a7f58 pc=0x5982c51ee00e runtime.goparkunlock(...) 
runtime/proc.go:408 runtime.(*scavengerState).park(0x5982c5943520) runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a7fa8 sp=0xc0000a7f78 pc=0x5982c51d6549 runtime.bgscavenge(0xc0000220e0) runtime/mgcscavenge.go:653 +0x3c fp=0xc0000a7fc8 sp=0xc0000a7fa8 pc=0x5982c51d6adc runtime.gcenable.gowrap2() runtime/mgc.go:204 +0x25 fp=0xc0000a7fe0 sp=0xc0000a7fc8 pc=0x5982c51cd625 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a7fe8 sp=0xc0000a7fe0 pc=0x5982c521fde1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0xa5 goroutine 5 gp=0xc0000f0000 m=nil [finalizer wait]: runtime.gopark(0xc0000a6648?, 0x5982c51c0f85?, 0xa8?, 0x1?, 0xc000006380?) runtime/proc.go:402 +0xce fp=0xc0000a6620 sp=0xc0000a6600 pc=0x5982c51ee00e runtime.runfinq() runtime/mfinal.go:194 +0x107 fp=0xc0000a67e0 sp=0xc0000a6620 pc=0x5982c51cc6c7 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a67e8 sp=0xc0000a67e0 pc=0x5982c521fde1 created by runtime.createfing in goroutine 1 runtime/mfinal.go:164 +0x3d goroutine 8 gp=0xc0000f0540 m=nil [select]: runtime.gopark(0xc000253a48?, 0x2?, 0x50?, 0x81?, 0xc0002537ec?) runtime/proc.go:402 +0xce fp=0xc000253658 sp=0xc000253638 pc=0x5982c51ee00e runtime.selectgo(0xc000253a48, 0xc0002537e8, 0xf?, 0x0, 0x1?, 0x1) runtime/select.go:327 +0x725 fp=0xc000253778 sp=0xc000253658 pc=0x5982c51ff3e5 main.(*Server).completion(0xc000188120, {0x5982c5774a10, 0xc000217180}, 0xc000207680) github.com/ollama/ollama/llama/runner/runner.go:667 +0xa45 fp=0xc000253ab8 sp=0xc000253778 pc=0x5982c5432345 main.(*Server).completion-fm({0x5982c5774a10?, 0xc000217180?}, 0x5982c540cd2d?) <autogenerated>:1 +0x36 fp=0xc000253ae8 sp=0xc000253ab8 pc=0x5982c5435396 net/http.HandlerFunc.ServeHTTP(0xc0000eac30?, {0x5982c5774a10?, 0xc000217180?}, 0x10?) 
net/http/server.go:2171 +0x29 fp=0xc000253b10 sp=0xc000253ae8 pc=0x5982c54057c9 net/http.(*ServeMux).ServeHTTP(0x5982c51c0f85?, {0x5982c5774a10, 0xc000217180}, 0xc000207680) net/http/server.go:2688 +0x1ad fp=0xc000253b60 sp=0xc000253b10 pc=0x5982c540764d net/http.serverHandler.ServeHTTP({0x5982c5773d60?}, {0x5982c5774a10?, 0xc000217180?}, 0x6?) net/http/server.go:3142 +0x8e fp=0xc000253b90 sp=0xc000253b60 pc=0x5982c540866e net/http.(*conn).serve(0xc0001881b0, {0x5982c5774e68, 0xc0000e8e10}) net/http/server.go:2044 +0x5e8 fp=0xc000253fb8 sp=0xc000253b90 pc=0x5982c5404408 net/http.(*Server).Serve.gowrap3() net/http/server.go:3290 +0x28 fp=0xc000253fe0 sp=0xc000253fb8 pc=0x5982c5408de8 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc000253fe8 sp=0xc000253fe0 pc=0x5982c521fde1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3290 +0x4b4 goroutine 70 gp=0xc00021efc0 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) runtime/proc.go:402 +0xce fp=0xc000220da8 sp=0xc000220d88 pc=0x5982c51ee00e runtime.netpollblock(0x5982c5254558?, 0xc51b6b26?, 0x82?) runtime/netpoll.go:573 +0xf7 fp=0xc000220de0 sp=0xc000220da8 pc=0x5982c51e6257 internal/poll.runtime_pollWait(0x7ec806b89728, 0x72) runtime/netpoll.go:345 +0x85 fp=0xc000220e00 sp=0xc000220de0 pc=0x5982c521aaa5 internal/poll.(*pollDesc).wait(0xc0001be180?, 0xc0000e8f41?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000220e28 sp=0xc000220e00 pc=0x5982c526a9c7 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc0001be180, {0xc0000e8f41, 0x1, 0x1}) internal/poll/fd_unix.go:164 +0x27a fp=0xc000220ec0 sp=0xc000220e28 pc=0x5982c526b51a net.(*netFD).Read(0xc0001be180, {0xc0000e8f41?, 0x0?, 0x0?}) net/fd_posix.go:55 +0x25 fp=0xc000220f08 sp=0xc000220ec0 pc=0x5982c52d9845 net.(*conn).Read(0xc0000aa0a0, {0xc0000e8f41?, 0x0?, 0x0?}) net/net.go:185 +0x45 fp=0xc000220f50 sp=0xc000220f08 pc=0x5982c52e3b05 net.(*TCPConn).Read(0x0?, {0xc0000e8f41?, 0x0?, 0x0?}) <autogenerated>:1 +0x25 fp=0xc000220f80 sp=0xc000220f50 pc=0x5982c52ef4e5 net/http.(*connReader).backgroundRead(0xc0000e8f30) net/http/server.go:681 +0x37 fp=0xc000220fc8 sp=0xc000220f80 pc=0x5982c53fe377 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:677 +0x25 fp=0xc000220fe0 sp=0xc000220fc8 pc=0x5982c53fe2a5 runtime.goexit({}) runtime/asm_amd64.s:1695 +0x1 fp=0xc000220fe8 sp=0xc000220fe0 pc=0x5982c521fde1 created by net/http.(*connReader).startBackgroundRead in goroutine 8 net/http/server.go:677 +0xba rax 0x0 rbx 0xe031 rcx 0x7ec7edc9eb1c rdx 0x6 rdi 0xe02d rsi 0xe031 rbp 0x7ec7a53f6410 rsp 0x7ec7a53f63d0 r8 0x0 r9 0x0 r10 0x8 r11 0x246 r12 0x6 r13 0x84 r14 0x16 r15 0x4c311a4000 rip 0x7ec7edc9eb1c rflags 0x246 cs 0x33 fs 0x0 gs 0x0 [GIN] 2024/11/21 - 18:09:52 | 200 | 5.978648843s | 127.0.0.1 | POST "/api/chat"
Author
Owner

@phalexo commented on GitHub (Nov 22, 2024):

I think a lot of different bugs look the same on the surface: whatever the
underlying cause, it kills the "runner" process, and ollama then reports
being unable to communicate with that runner.

On Thu, Nov 21, 2024 at 9:10 PM huskeyw @.***> wrote:

same issue here, only when a LLM extends over 1 GPU..

@.***:~$ [GIN] 2024/11/21 - 18:09:24 | 200 | 42.709µs | 127.0.0.1
| HEAD "/"
[GIN] 2024/11/21 - 18:09:24 | 200 | 48.614366ms | 127.0.0.1 | POST
"/api/show"
time=2024-11-21T18:09:24.743-08:00 level=INFO source=sched.go:730 msg="new
model will fit in available VRAM, loading"
model=/home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574
library=cuda parallel=4 required="43.2 GiB"
time=2024-11-21T18:09:24.919-08:00 level=INFO source=server.go:105
msg="system memory" total="251.6 GiB" free="246.9 GiB" free_swap="8.0 GiB"
time=2024-11-21T18:09:24.920-08:00 level=INFO source=memory.go:343
msg="offload to cuda" layers.requested=-1 layers.model=81 layers.offload=81
layers.split=41,40 memory.available="[22.3 GiB 22.3 GiB]"
memory.gpu_overhead="0 B" memory.required.full="43.2 GiB"
memory.required.partial="43.2 GiB" memory.required.kv="2.5 GiB"
memory.required.allocations="[22.0 GiB 21.2 GiB]"
memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB"
memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB"
memory.graph.partial="1.1 GiB"
time=2024-11-21T18:09:24.923-08:00 level=INFO source=server.go:383
msg="starting llama server"
cmd="/tmp/ollama1665379698/runners/cuda_v11/ollama_llama_server --model
/home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574
--ctx-size 8192 --batch-size 512 --n-gpu-layers 81 --threads 24 --parallel
4 --tensor-split 41,40 --port 42669"
time=2024-11-21T18:09:24.924-08:00 level=INFO source=sched.go:449
msg="loaded runners" count=1
time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:562
msg="waiting for llama runner to start responding"
time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:596
msg="waiting for server to become available" status="llm server error"
time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:883
msg="starting go runner"
time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:884
msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 |
AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 |
SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD
= 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
LLAMAFILE = 1 | cgo(gcc)" threads=24
time=2024-11-21T18:09:24.942-08:00 level=INFO source=.:0 msg="Server
listening on 127.0.0.1:42669"
llama_model_loader: loaded meta data with 29 key-value pairs and 724
tensors from
/home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574
(version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do
not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 70B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 70B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta",
"pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de",
"fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 80
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 8192
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 13: llama.attention.head_count u32 = 64
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 =
0.000010
llama_model_loader: - kv 17: general.file_type u32 = 2
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!",
""", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] =
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2024-11-21T18:09:25.176-08:00 level=INFO source=server.go:596
msg="waiting for server to become available" status="llm server loading
model"
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ
Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token
}}\n{%- if custom_tools ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 162 tensors
llama_model_loader: - type q4_0: 561 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 37.22 GiB (4.53 BPW)
llm_load_print_meta: general.name = Meta Llama 3.1 70B Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: Tesla M40 24GB, compute capability 5.2, VMM: yes
Device 1: Tesla M40 24GB, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size = 1.02 MiB
llm_load_tensors: offloading 80 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 81/81 layers to GPU
llm_load_tensors: CPU buffer size = 563.62 MiB
llm_load_tensors: CUDA0 buffer size = 18821.57 MiB
llm_load_tensors: CUDA1 buffer size = 18725.43 MiB
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 1312.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 1248.00 MiB
llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00
MiB, V (f16): 1280.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 2.08 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model: CUDA0 compute buffer size = 1216.01 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 1216.02 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 80.02 MiB
llama_new_context_with_model: graph nodes = 2566
llama_new_context_with_model: graph splits = 3
time=2024-11-21T18:09:36.721-08:00 level=INFO source=server.go:601
msg="llama runner started in 11.80 seconds"
[GIN] 2024/11/21 - 18:09:36 | 200 | 12.287203955s | 127.0.0.1 | POST
"/api/generate"
CUDA error: out of memory
current device: 1, in function alloc at ggml-cuda.cu:406
cuMemCreate(&handle, reserve_size, &prop, 0)
ggml-cuda.cu:132: CUDA error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7ec7edc9eb1c m=5 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 7 gp=0xc0000f0380 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x5982c5435b70, 0xc0000b6b20)
runtime/cgocall.go:157 +0x4b fp=0xc0000b6af8 sp=0xc0000b6ac0
pc=0x5982c51b73cb
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7ec798006470, {0xf,
0x7ec79828bed0, 0x0, 0x0, 0x7ec798009bd0, 0x7ec7980070c0, 0x7ec798012260,
0x7ec798024950, 0x0, ...})
_cgo_gotypes.go:543 +0x52 fp=0xc0000b6b20 sp=0xc0000b6af8 pc=0x5982c52b4952
github.com/ollama/ollama/llama.(*Context).Decode.func1(0x5982c543186b?,
0x7ec798006470?)
github.com/ollama/ollama/llama/llama.go:169 +0xd8 fp=0xc0000b6c40
sp=0xc0000b6b20 pc=0x5982c52b6f18
github.com/ollama/ollama/llama.(*Context).Decode(0xc0000b6d28?, 0x0?)
github.com/ollama/ollama/llama/llama.go:169 +0x17 fp=0xc0000b6c88
sp=0xc0000b6c40 pc=0x5982c52b6d77
main.(*Server).processBatch(0xc000188120, 0xc0001861c0, 0xc0000b6f10)
github.com/ollama/ollama/llama/runner/runner.go:427 +0x38d
fp=0xc0000b6ed0 sp=0xc0000b6c88 pc=0x5982c543080d
main.(*Server).run(0xc000188120, {0x5982c5774ea0, 0xc0000dc0f0})
github.com/ollama/ollama/llama/runner/runner.go:327 +0x1a5
fp=0xc0000b6fb8 sp=0xc0000b6ed0 pc=0x5982c5430105
main.main.gowrap2()
github.com/ollama/ollama/llama/runner/runner.go:922 +0x28 fp=0xc0000b6fe0
sp=0xc0000b6fb8 pc=0x5982c5434ba8
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000b6fe8 sp=0xc0000b6fe0
pc=0x5982c521fde1
created by main.main in goroutine 1
github.com/ollama/ollama/llama/runner/runner.go:922 +0xc52

goroutine 1 gp=0xc000006380 m=nil [IO wait]:
runtime.gopark(0xc000032008?, 0x0?, 0x80?, 0x63?, 0xc00002d8b8?)
runtime/proc.go:402 +0xce fp=0xc00002d880 sp=0xc00002d860 pc=0x5982c51ee00e
runtime.netpollblock(0xc00002d918?, 0xc51b6b26?, 0x82?)
runtime/netpoll.go:573 +0xf7 fp=0xc00002d8b8 sp=0xc00002d880
pc=0x5982c51e6257
internal/poll.runtime_pollWait(0x7ec806b89820, 0x72)
runtime/netpoll.go:345 +0x85 fp=0xc00002d8d8 sp=0xc00002d8b8
pc=0x5982c521aaa5
internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00002d900 sp=0xc00002d8d8
pc=0x5982c526a9c7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0001be100)
internal/poll/fd_unix.go:611 +0x2ac fp=0xc00002d9a8 sp=0xc00002d900
pc=0x5982c526be8c
net.(*netFD).accept(0xc0001be100)
net/fd_unix.go:172 +0x29 fp=0xc00002da60 sp=0xc00002d9a8 pc=0x5982c52da949
net.(*TCPListener).accept(0xc0000c6200)
net/tcpsock_posix.go:159 +0x1e fp=0xc00002da88 sp=0xc00002da60
pc=0x5982c52eb67e
net.(*TCPListener).Accept(0xc0000c6200)
net/tcpsock.go:327 +0x30 fp=0xc00002dab8 sp=0xc00002da88 pc=0x5982c52ea9d0
net/http.(*onceCloseListener).Accept(0xc0001881b0?)
:1 +0x24 fp=0xc00002dad0 sp=0xc00002dab8 pc=0x5982c5411be4
net/http.(*Server).Serve(0xc0001ce0f0, {0x5982c5774860, 0xc0000c6200})
net/http/server.go:3260 +0x33e fp=0xc00002dc00 sp=0xc00002dad0
pc=0x5982c54089fe
main.main()
github.com/ollama/ollama/llama/runner/runner.go:942 +0xfec
fp=0xc00002df50 sp=0xc00002dc00 pc=0x5982c543492c
runtime.main()
runtime/proc.go:271 +0x29d fp=0xc00002dfe0 sp=0xc00002df50
pc=0x5982c51edbdd
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc00002dfe8 sp=0xc00002dfe0
pc=0x5982c521fde1

goroutine 2 gp=0xc000006e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc0000a6fa8 sp=0xc0000a6f88 pc=0x5982c51ee00e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.forcegchelper()
runtime/proc.go:326 +0xb8 fp=0xc0000a6fe0 sp=0xc0000a6fa8 pc=0x5982c51ede98
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a6fe8 sp=0xc0000a6fe0
pc=0x5982c521fde1
created by runtime.init.6 in goroutine 1
runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007340 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc0000a7780 sp=0xc0000a7760 pc=0x5982c51ee00e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.bgsweep(0xc0000220e0)
runtime/mgcsweep.go:278 +0x94 fp=0xc0000a77c8 sp=0xc0000a7780
pc=0x5982c51d8b54
runtime.gcenable.gowrap1()
runtime/mgc.go:203 +0x25 fp=0xc0000a77e0 sp=0xc0000a77c8 pc=0x5982c51cd685
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a77e8 sp=0xc0000a77e0
pc=0x5982c521fde1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007500 m=nil [GC scavenge wait]:
runtime.gopark(0xc0000220e0?, 0x5982c5674260?, 0x1?, 0x0?, 0xc000007500?)
runtime/proc.go:402 +0xce fp=0xc0000a7f78 sp=0xc0000a7f58 pc=0x5982c51ee00e
runtime.goparkunlock(...)
runtime/proc.go:408
runtime.(*scavengerState).park(0x5982c5943520)
runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a7fa8 sp=0xc0000a7f78
pc=0x5982c51d6549
runtime.bgscavenge(0xc0000220e0)
runtime/mgcscavenge.go:653 +0x3c fp=0xc0000a7fc8 sp=0xc0000a7fa8
pc=0x5982c51d6adc
runtime.gcenable.gowrap2()
runtime/mgc.go:204 +0x25 fp=0xc0000a7fe0 sp=0xc0000a7fc8 pc=0x5982c51cd625
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a7fe8 sp=0xc0000a7fe0
pc=0x5982c521fde1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0xa5

goroutine 5 gp=0xc0000f0000 m=nil [finalizer wait]:
runtime.gopark(0xc0000a6648?, 0x5982c51c0f85?, 0xa8?, 0x1?, 0xc000006380?)
runtime/proc.go:402 +0xce fp=0xc0000a6620 sp=0xc0000a6600 pc=0x5982c51ee00e
runtime.runfinq()
runtime/mfinal.go:194 +0x107 fp=0xc0000a67e0 sp=0xc0000a6620
pc=0x5982c51cc6c7
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a67e8 sp=0xc0000a67e0
pc=0x5982c521fde1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:164 +0x3d

goroutine 8 gp=0xc0000f0540 m=nil [select]:
runtime.gopark(0xc000253a48?, 0x2?, 0x50?, 0x81?, 0xc0002537ec?)
runtime/proc.go:402 +0xce fp=0xc000253658 sp=0xc000253638 pc=0x5982c51ee00e
runtime.selectgo(0xc000253a48, 0xc0002537e8, 0xf?, 0x0, 0x1?, 0x1)
runtime/select.go:327 +0x725 fp=0xc000253778 sp=0xc000253658
pc=0x5982c51ff3e5
main.(*Server).completion(0xc000188120, {0x5982c5774a10, 0xc000217180},
0xc000207680)
github.com/ollama/ollama/llama/runner/runner.go:667 +0xa45
fp=0xc000253ab8 sp=0xc000253778 pc=0x5982c5432345
main.(*Server).completion-fm({0x5982c5774a10?, 0xc000217180?},
0x5982c540cd2d?)
:1 +0x36 fp=0xc000253ae8 sp=0xc000253ab8 pc=0x5982c5435396
net/http.HandlerFunc.ServeHTTP(0xc0000eac30?, {0x5982c5774a10?,
0xc000217180?}, 0x10?)
net/http/server.go:2171 +0x29 fp=0xc000253b10 sp=0xc000253ae8
pc=0x5982c54057c9
net/http.(*ServeMux).ServeHTTP(0x5982c51c0f85?, {0x5982c5774a10,
0xc000217180}, 0xc000207680)
net/http/server.go:2688 +0x1ad fp=0xc000253b60 sp=0xc000253b10
pc=0x5982c540764d
net/http.serverHandler.ServeHTTP({0x5982c5773d60?}, {0x5982c5774a10?,
0xc000217180?}, 0x6?)
net/http/server.go:3142 +0x8e fp=0xc000253b90 sp=0xc000253b60
pc=0x5982c540866e
net/http.(*conn).serve(0xc0001881b0, {0x5982c5774e68, 0xc0000e8e10})
net/http/server.go:2044 +0x5e8 fp=0xc000253fb8 sp=0xc000253b90
pc=0x5982c5404408
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3290 +0x28 fp=0xc000253fe0 sp=0xc000253fb8
pc=0x5982c5408de8
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000253fe8 sp=0xc000253fe0
pc=0x5982c521fde1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3290 +0x4b4

goroutine 70 gp=0xc00021efc0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:402 +0xce fp=0xc000220da8 sp=0xc000220d88 pc=0x5982c51ee00e
runtime.netpollblock(0x5982c5254558?, 0xc51b6b26?, 0x82?)
runtime/netpoll.go:573 +0xf7 fp=0xc000220de0 sp=0xc000220da8
pc=0x5982c51e6257
internal/poll.runtime_pollWait(0x7ec806b89728, 0x72)
runtime/netpoll.go:345 +0x85 fp=0xc000220e00 sp=0xc000220de0
pc=0x5982c521aaa5
internal/poll.(*pollDesc).wait(0xc0001be180?, 0xc0000e8f41?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000220e28 sp=0xc000220e00
pc=0x5982c526a9c7
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0001be180, {0xc0000e8f41, 0x1, 0x1})
internal/poll/fd_unix.go:164 +0x27a fp=0xc000220ec0 sp=0xc000220e28
pc=0x5982c526b51a
net.(*netFD).Read(0xc0001be180, {0xc0000e8f41?, 0x0?, 0x0?})
net/fd_posix.go:55 +0x25 fp=0xc000220f08 sp=0xc000220ec0 pc=0x5982c52d9845
net.(*conn).Read(0xc0000aa0a0, {0xc0000e8f41?, 0x0?, 0x0?})
net/net.go:185 +0x45 fp=0xc000220f50 sp=0xc000220f08 pc=0x5982c52e3b05
net.(*TCPConn).Read(0x0?, {0xc0000e8f41?, 0x0?, 0x0?})
:1 +0x25 fp=0xc000220f80 sp=0xc000220f50 pc=0x5982c52ef4e5
net/http.(*connReader).backgroundRead(0xc0000e8f30)
net/http/server.go:681 +0x37 fp=0xc000220fc8 sp=0xc000220f80
pc=0x5982c53fe377
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:677 +0x25 fp=0xc000220fe0 sp=0xc000220fc8
pc=0x5982c53fe2a5
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc000220fe8 sp=0xc000220fe0
pc=0x5982c521fde1
created by net/http.(*connReader).startBackgroundRead in goroutine 8
net/http/server.go:677 +0xba

rax 0x0
rbx 0xe031
rcx 0x7ec7edc9eb1c
rdx 0x6
rdi 0xe02d
rsi 0xe031
rbp 0x7ec7a53f6410
rsp 0x7ec7a53f63d0
r8 0x0
r9 0x0
r10 0x8
r11 0x246
r12 0x6
r13 0x84
r14 0x16
r15 0x4c311a4000
rip 0x7ec7edc9eb1c
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
[GIN] 2024/11/21 - 18:09:52 | 200 | 5.978648843s | 127.0.0.1 | POST
"/api/chat"



<!-- gh-comment-id:2494114326 --> @phalexo commented on GitHub (Nov 22, 2024): I think a lot of different bugs look the same on the surface, because they kill the "runner" process, and ollama then reports being unable to communicate with that runner. On Thu, Nov 21, 2024 at 9:10 PM huskeyw ***@***.***> wrote: > same issue here, only when a LLM extends over 1 GPU.. > > ***@***.***:~$ [GIN] 2024/11/21 - 18:09:24 | 200 | 42.709µs | 127.0.0.1 > | HEAD "/" > [GIN] 2024/11/21 - 18:09:24 | 200 | 48.614366ms | 127.0.0.1 | POST > "/api/show" > time=2024-11-21T18:09:24.743-08:00 level=INFO source=sched.go:730 msg="new > model will fit in available VRAM, loading" > model=/home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 > library=cuda parallel=4 required="43.2 GiB" > time=2024-11-21T18:09:24.919-08:00 level=INFO source=server.go:105 > msg="system memory" total="251.6 GiB" free="246.9 GiB" free_swap="8.0 GiB" > time=2024-11-21T18:09:24.920-08:00 level=INFO source=memory.go:343 > msg="offload to cuda" layers.requested=-1 layers.model=81 layers.offload=81 > layers.split=41,40 memory.available="[22.3 GiB 22.3 GiB]" > memory.gpu_overhead="0 B" memory.required.full="43.2 GiB" > memory.required.partial="43.2 GiB" memory.required.kv="2.5 GiB" > memory.required.allocations="[22.0 GiB 21.2 GiB]" > memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" > memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" > memory.graph.partial="1.1 GiB" > time=2024-11-21T18:09:24.923-08:00 level=INFO source=server.go:383 > msg="starting llama server" > cmd="/tmp/ollama1665379698/runners/cuda_v11/ollama_llama_server --model > /home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 > --ctx-size 8192 --batch-size 512 --n-gpu-layers 81 --threads 24 --parallel > 4 --tensor-split 41,40 --port 42669" > time=2024-11-21T18:09:24.924-08:00 level=INFO source=sched.go:449 > 
msg="loaded runners" count=1 > time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:562 > msg="waiting for llama runner to start responding" > time=2024-11-21T18:09:24.924-08:00 level=INFO source=server.go:596 > msg="waiting for server to become available" status="llm server error" > time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:883 > msg="starting go runner" > time=2024-11-21T18:09:24.942-08:00 level=INFO source=runner.go:884 > msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | > AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | > SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD > = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | > LLAMAFILE = 1 | cgo(gcc)" threads=24 > time=2024-11-21T18:09:24.942-08:00 level=INFO source=.:0 msg="Server > listening on 127.0.0.1:42669" > llama_model_loader: loaded meta data with 29 key-value pairs and 724 > tensors from > /home/huskeyw/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 > (version GGUF V3 (latest)) > llama_model_loader: Dumping metadata keys/values. Note: KV overrides do > not apply in this output. > llama_model_loader: - kv 0: general.architecture str = llama > llama_model_loader: - kv 1: general.type str = model > llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 70B Instruct > llama_model_loader: - kv 3: general.finetune str = Instruct > llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1 > llama_model_loader: - kv 5: general.size_label str = 70B > llama_model_loader: - kv 6: general.license str = llama3.1 > llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", > "pytorch", "llam... > llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", > "fr", "it", "pt", "hi", ... 
> llama_model_loader: - kv 9: llama.block_count u32 = 80 > llama_model_loader: - kv 10: llama.context_length u32 = 131072 > llama_model_loader: - kv 11: llama.embedding_length u32 = 8192 > llama_model_loader: - kv 12: llama.feed_forward_length u32 = 28672 > llama_model_loader: - kv 13: llama.attention.head_count u32 = 64 > llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8 > llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000 > llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = > 0.000010 > llama_model_loader: - kv 17: general.file_type u32 = 2 > llama_model_loader: - kv 18: llama.vocab_size u32 = 128256 > llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128 > llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2 > llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe > llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", > """, "#", "$", "%", "&", "'", ... > llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = > [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... > time=2024-11-21T18:09:25.176-08:00 level=INFO source=server.go:596 > msg="waiting for server to become available" status="llm server loading > model" > llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ > Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... > llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000 > llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009 > llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token > }}\n{%- if custom_tools ... 
> llama_model_loader: - kv 28: general.quantization_version u32 = 2 > llama_model_loader: - type f32: 162 tensors > llama_model_loader: - type q4_0: 561 tensors > llama_model_loader: - type q6_K: 1 tensors > llm_load_vocab: special tokens cache size = 256 > llm_load_vocab: token to piece cache size = 0.7999 MB > llm_load_print_meta: format = GGUF V3 (latest) > llm_load_print_meta: arch = llama > llm_load_print_meta: vocab type = BPE > llm_load_print_meta: n_vocab = 128256 > llm_load_print_meta: n_merges = 280147 > llm_load_print_meta: vocab_only = 0 > llm_load_print_meta: n_ctx_train = 131072 > llm_load_print_meta: n_embd = 8192 > llm_load_print_meta: n_layer = 80 > llm_load_print_meta: n_head = 64 > llm_load_print_meta: n_head_kv = 8 > llm_load_print_meta: n_rot = 128 > llm_load_print_meta: n_swa = 0 > llm_load_print_meta: n_embd_head_k = 128 > llm_load_print_meta: n_embd_head_v = 128 > llm_load_print_meta: n_gqa = 8 > llm_load_print_meta: n_embd_k_gqa = 1024 > llm_load_print_meta: n_embd_v_gqa = 1024 > llm_load_print_meta: f_norm_eps = 0.0e+00 > llm_load_print_meta: f_norm_rms_eps = 1.0e-05 > llm_load_print_meta: f_clamp_kqv = 0.0e+00 > llm_load_print_meta: f_max_alibi_bias = 0.0e+00 > llm_load_print_meta: f_logit_scale = 0.0e+00 > llm_load_print_meta: n_ff = 28672 > llm_load_print_meta: n_expert = 0 > llm_load_print_meta: n_expert_used = 0 > llm_load_print_meta: causal attn = 1 > llm_load_print_meta: pooling type = 0 > llm_load_print_meta: rope type = 0 > llm_load_print_meta: rope scaling = linear > llm_load_print_meta: freq_base_train = 500000.0 > llm_load_print_meta: freq_scale_train = 1 > llm_load_print_meta: n_ctx_orig_yarn = 131072 > llm_load_print_meta: rope_finetuned = unknown > llm_load_print_meta: ssm_d_conv = 0 > llm_load_print_meta: ssm_d_inner = 0 > llm_load_print_meta: ssm_d_state = 0 > llm_load_print_meta: ssm_dt_rank = 0 > llm_load_print_meta: ssm_dt_b_c_rms = 0 > llm_load_print_meta: model type = 70B > llm_load_print_meta: model ftype = Q4_0 > 
llm_load_print_meta: model params = 70.55 B > llm_load_print_meta: model size = 37.22 GiB (4.53 BPW) > llm_load_print_meta: general.name = Meta Llama 3.1 70B Instruct > llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' > llm_load_print_meta: EOS token = 128009 '<|eot_id|>' > llm_load_print_meta: LF token = 128 'Ä' > llm_load_print_meta: EOT token = 128009 '<|eot_id|>' > llm_load_print_meta: EOM token = 128008 '<|eom_id|>' > llm_load_print_meta: EOG token = 128008 '<|eom_id|>' > llm_load_print_meta: EOG token = 128009 '<|eot_id|>' > llm_load_print_meta: max token length = 256 > ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no > ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no > ggml_cuda_init: found 2 CUDA devices: > Device 0: Tesla M40 24GB, compute capability 5.2, VMM: yes > Device 1: Tesla M40 24GB, compute capability 5.2, VMM: yes > llm_load_tensors: ggml ctx size = 1.02 MiB > llm_load_tensors: offloading 80 repeating layers to GPU > llm_load_tensors: offloading non-repeating layers to GPU > llm_load_tensors: offloaded 81/81 layers to GPU > llm_load_tensors: CPU buffer size = 563.62 MiB > llm_load_tensors: CUDA0 buffer size = 18821.57 MiB > llm_load_tensors: CUDA1 buffer size = 18725.43 MiB > llama_new_context_with_model: n_ctx = 8192 > llama_new_context_with_model: n_batch = 2048 > llama_new_context_with_model: n_ubatch = 512 > llama_new_context_with_model: flash_attn = 0 > llama_new_context_with_model: freq_base = 500000.0 > llama_new_context_with_model: freq_scale = 1 > llama_kv_cache_init: CUDA0 KV buffer size = 1312.00 MiB > llama_kv_cache_init: CUDA1 KV buffer size = 1248.00 MiB > llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 > MiB, V (f16): 1280.00 MiB > llama_new_context_with_model: CUDA_Host output buffer size = 2.08 MiB > llama_new_context_with_model: pipeline parallelism enabled (n_copies=4) > llama_new_context_with_model: CUDA0 compute buffer size = 1216.01 MiB > llama_new_context_with_model: CUDA1 compute buffer size = 
1216.02 MiB > llama_new_context_with_model: CUDA_Host compute buffer size = 80.02 MiB > llama_new_context_with_model: graph nodes = 2566 > llama_new_context_with_model: graph splits = 3 > time=2024-11-21T18:09:36.721-08:00 level=INFO source=server.go:601 > msg="llama runner started in 11.80 seconds" > [GIN] 2024/11/21 - 18:09:36 | 200 | 12.287203955s | 127.0.0.1 | POST > "/api/generate" > CUDA error: out of memory > current device: 1, in function alloc at ggml-cuda.cu:406 > cuMemCreate(&handle, reserve_size, &prop, 0) > ggml-cuda.cu:132: CUDA error > Could not attach to process. If your uid matches the uid of the target > process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try > again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf > ptrace: Operation not permitted. > No stack. > The program is not being run. > SIGABRT: abort > PC=0x7ec7edc9eb1c m=5 sigcode=18446744073709551610 > signal arrived during cgo execution > > goroutine 7 gp=0xc0000f0380 m=5 mp=0xc000100008 [syscall]: > runtime.cgocall(0x5982c5435b70, 0xc0000b6b20) > runtime/cgocall.go:157 +0x4b fp=0xc0000b6af8 sp=0xc0000b6ac0 > pc=0x5982c51b73cb > github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7ec798006470, {0xf, > 0x7ec79828bed0, 0x0, 0x0, 0x7ec798009bd0, 0x7ec7980070c0, 0x7ec798012260, > 0x7ec798024950, 0x0, ...}) > _cgo_gotypes.go:543 +0x52 fp=0xc0000b6b20 sp=0xc0000b6af8 pc=0x5982c52b4952 > github.com/ollama/ollama/llama.(*Context).Decode.func1(0x5982c543186b?, > 0x7ec798006470?) > github.com/ollama/ollama/llama/llama.go:169 +0xd8 fp=0xc0000b6c40 > sp=0xc0000b6b20 pc=0x5982c52b6f18 > github.com/ollama/ollama/llama.(*Context).Decode(0xc0000b6d28?, 0x0?) 
> github.com/ollama/ollama/llama/llama.go:169 +0x17 fp=0xc0000b6c88 > sp=0xc0000b6c40 pc=0x5982c52b6d77 > main.(*Server).processBatch(0xc000188120, 0xc0001861c0, 0xc0000b6f10) > github.com/ollama/ollama/llama/runner/runner.go:427 +0x38d > fp=0xc0000b6ed0 sp=0xc0000b6c88 pc=0x5982c543080d > main.(*Server).run(0xc000188120, {0x5982c5774ea0, 0xc0000dc0f0}) > github.com/ollama/ollama/llama/runner/runner.go:327 +0x1a5 > fp=0xc0000b6fb8 sp=0xc0000b6ed0 pc=0x5982c5430105 > main.main.gowrap2() > github.com/ollama/ollama/llama/runner/runner.go:922 +0x28 fp=0xc0000b6fe0 > sp=0xc0000b6fb8 pc=0x5982c5434ba8 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc0000b6fe8 sp=0xc0000b6fe0 > pc=0x5982c521fde1 > created by main.main in goroutine 1 > github.com/ollama/ollama/llama/runner/runner.go:922 +0xc52 > > goroutine 1 gp=0xc000006380 m=nil [IO wait]: > runtime.gopark(0xc000032008?, 0x0?, 0x80?, 0x63?, 0xc00002d8b8?) > runtime/proc.go:402 +0xce fp=0xc00002d880 sp=0xc00002d860 pc=0x5982c51ee00e > runtime.netpollblock(0xc00002d918?, 0xc51b6b26?, 0x82?) > runtime/netpoll.go:573 +0xf7 fp=0xc00002d8b8 sp=0xc00002d880 > pc=0x5982c51e6257 > internal/poll.runtime_pollWait(0x7ec806b89820, 0x72) > runtime/netpoll.go:345 +0x85 fp=0xc00002d8d8 sp=0xc00002d8b8 > pc=0x5982c521aaa5 > internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0) > internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00002d900 sp=0xc00002d8d8 > pc=0x5982c526a9c7 > internal/poll.(*pollDesc).waitRead(...) 
> internal/poll/fd_poll_runtime.go:89 > internal/poll.(*FD).Accept(0xc0001be100) > internal/poll/fd_unix.go:611 +0x2ac fp=0xc00002d9a8 sp=0xc00002d900 > pc=0x5982c526be8c > net.(*netFD).accept(0xc0001be100) > net/fd_unix.go:172 +0x29 fp=0xc00002da60 sp=0xc00002d9a8 pc=0x5982c52da949 > net.(*TCPListener).accept(0xc0000c6200) > net/tcpsock_posix.go:159 +0x1e fp=0xc00002da88 sp=0xc00002da60 > pc=0x5982c52eb67e > net.(*TCPListener).Accept(0xc0000c6200) > net/tcpsock.go:327 +0x30 fp=0xc00002dab8 sp=0xc00002da88 pc=0x5982c52ea9d0 > net/http.(*onceCloseListener).Accept(0xc0001881b0?) > :1 +0x24 fp=0xc00002dad0 sp=0xc00002dab8 pc=0x5982c5411be4 > net/http.(*Server).Serve(0xc0001ce0f0, {0x5982c5774860, 0xc0000c6200}) > net/http/server.go:3260 +0x33e fp=0xc00002dc00 sp=0xc00002dad0 > pc=0x5982c54089fe > main.main() > github.com/ollama/ollama/llama/runner/runner.go:942 +0xfec > fp=0xc00002df50 sp=0xc00002dc00 pc=0x5982c543492c > runtime.main() > runtime/proc.go:271 +0x29d fp=0xc00002dfe0 sp=0xc00002df50 > pc=0x5982c51edbdd > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc00002dfe8 sp=0xc00002dfe0 > pc=0x5982c521fde1 > > goroutine 2 gp=0xc000006e00 m=nil [force gc (idle)]: > runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) > runtime/proc.go:402 +0xce fp=0xc0000a6fa8 sp=0xc0000a6f88 pc=0x5982c51ee00e > runtime.goparkunlock(...) > runtime/proc.go:408 > runtime.forcegchelper() > runtime/proc.go:326 +0xb8 fp=0xc0000a6fe0 sp=0xc0000a6fa8 pc=0x5982c51ede98 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a6fe8 sp=0xc0000a6fe0 > pc=0x5982c521fde1 > created by runtime.init.6 in goroutine 1 > runtime/proc.go:314 +0x1a > > goroutine 3 gp=0xc000007340 m=nil [GC sweep wait]: > runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) > runtime/proc.go:402 +0xce fp=0xc0000a7780 sp=0xc0000a7760 pc=0x5982c51ee00e > runtime.goparkunlock(...) 
> runtime/proc.go:408 > runtime.bgsweep(0xc0000220e0) > runtime/mgcsweep.go:278 +0x94 fp=0xc0000a77c8 sp=0xc0000a7780 > pc=0x5982c51d8b54 > runtime.gcenable.gowrap1() > runtime/mgc.go:203 +0x25 fp=0xc0000a77e0 sp=0xc0000a77c8 pc=0x5982c51cd685 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a77e8 sp=0xc0000a77e0 > pc=0x5982c521fde1 > created by runtime.gcenable in goroutine 1 > runtime/mgc.go:203 +0x66 > > goroutine 4 gp=0xc000007500 m=nil [GC scavenge wait]: > runtime.gopark(0xc0000220e0?, 0x5982c5674260?, 0x1?, 0x0?, 0xc000007500?) > runtime/proc.go:402 +0xce fp=0xc0000a7f78 sp=0xc0000a7f58 pc=0x5982c51ee00e > runtime.goparkunlock(...) > runtime/proc.go:408 > runtime.(*scavengerState).park(0x5982c5943520) > runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a7fa8 sp=0xc0000a7f78 > pc=0x5982c51d6549 > runtime.bgscavenge(0xc0000220e0) > runtime/mgcscavenge.go:653 +0x3c fp=0xc0000a7fc8 sp=0xc0000a7fa8 > pc=0x5982c51d6adc > runtime.gcenable.gowrap2() > runtime/mgc.go:204 +0x25 fp=0xc0000a7fe0 sp=0xc0000a7fc8 pc=0x5982c51cd625 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a7fe8 sp=0xc0000a7fe0 > pc=0x5982c521fde1 > created by runtime.gcenable in goroutine 1 > runtime/mgc.go:204 +0xa5 > > goroutine 5 gp=0xc0000f0000 m=nil [finalizer wait]: > runtime.gopark(0xc0000a6648?, 0x5982c51c0f85?, 0xa8?, 0x1?, 0xc000006380?) > runtime/proc.go:402 +0xce fp=0xc0000a6620 sp=0xc0000a6600 pc=0x5982c51ee00e > runtime.runfinq() > runtime/mfinal.go:194 +0x107 fp=0xc0000a67e0 sp=0xc0000a6620 > pc=0x5982c51cc6c7 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc0000a67e8 sp=0xc0000a67e0 > pc=0x5982c521fde1 > created by runtime.createfing in goroutine 1 > runtime/mfinal.go:164 +0x3d > > goroutine 8 gp=0xc0000f0540 m=nil [select]: > runtime.gopark(0xc000253a48?, 0x2?, 0x50?, 0x81?, 0xc0002537ec?) 
> runtime/proc.go:402 +0xce fp=0xc000253658 sp=0xc000253638 pc=0x5982c51ee00e > runtime.selectgo(0xc000253a48, 0xc0002537e8, 0xf?, 0x0, 0x1?, 0x1) > runtime/select.go:327 +0x725 fp=0xc000253778 sp=0xc000253658 > pc=0x5982c51ff3e5 > main.(*Server).completion(0xc000188120, {0x5982c5774a10, 0xc000217180}, > 0xc000207680) > github.com/ollama/ollama/llama/runner/runner.go:667 +0xa45 > fp=0xc000253ab8 sp=0xc000253778 pc=0x5982c5432345 > main.(*Server).completion-fm({0x5982c5774a10?, 0xc000217180?}, > 0x5982c540cd2d?) > :1 +0x36 fp=0xc000253ae8 sp=0xc000253ab8 pc=0x5982c5435396 > net/http.HandlerFunc.ServeHTTP(0xc0000eac30?, {0x5982c5774a10?, > 0xc000217180?}, 0x10?) > net/http/server.go:2171 +0x29 fp=0xc000253b10 sp=0xc000253ae8 > pc=0x5982c54057c9 > net/http.(*ServeMux).ServeHTTP(0x5982c51c0f85?, {0x5982c5774a10, > 0xc000217180}, 0xc000207680) > net/http/server.go:2688 +0x1ad fp=0xc000253b60 sp=0xc000253b10 > pc=0x5982c540764d > net/http.serverHandler.ServeHTTP({0x5982c5773d60?}, {0x5982c5774a10?, > 0xc000217180?}, 0x6?) > net/http/server.go:3142 +0x8e fp=0xc000253b90 sp=0xc000253b60 > pc=0x5982c540866e > net/http.(*conn).serve(0xc0001881b0, {0x5982c5774e68, 0xc0000e8e10}) > net/http/server.go:2044 +0x5e8 fp=0xc000253fb8 sp=0xc000253b90 > pc=0x5982c5404408 > net/http.(*Server).Serve.gowrap3() > net/http/server.go:3290 +0x28 fp=0xc000253fe0 sp=0xc000253fb8 > pc=0x5982c5408de8 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc000253fe8 sp=0xc000253fe0 > pc=0x5982c521fde1 > created by net/http.(*Server).Serve in goroutine 1 > net/http/server.go:3290 +0x4b4 > > goroutine 70 gp=0xc00021efc0 m=nil [IO wait]: > runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) > runtime/proc.go:402 +0xce fp=0xc000220da8 sp=0xc000220d88 pc=0x5982c51ee00e > runtime.netpollblock(0x5982c5254558?, 0xc51b6b26?, 0x82?) 
> runtime/netpoll.go:573 +0xf7 fp=0xc000220de0 sp=0xc000220da8 > pc=0x5982c51e6257 > internal/poll.runtime_pollWait(0x7ec806b89728, 0x72) > runtime/netpoll.go:345 +0x85 fp=0xc000220e00 sp=0xc000220de0 > pc=0x5982c521aaa5 > internal/poll.(*pollDesc).wait(0xc0001be180?, 0xc0000e8f41?, 0x0) > internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000220e28 sp=0xc000220e00 > pc=0x5982c526a9c7 > internal/poll.(*pollDesc).waitRead(...) > internal/poll/fd_poll_runtime.go:89 > internal/poll.(*FD).Read(0xc0001be180, {0xc0000e8f41, 0x1, 0x1}) > internal/poll/fd_unix.go:164 +0x27a fp=0xc000220ec0 sp=0xc000220e28 > pc=0x5982c526b51a > net.(*netFD).Read(0xc0001be180, {0xc0000e8f41?, 0x0?, 0x0?}) > net/fd_posix.go:55 +0x25 fp=0xc000220f08 sp=0xc000220ec0 pc=0x5982c52d9845 > net.(*conn).Read(0xc0000aa0a0, {0xc0000e8f41?, 0x0?, 0x0?}) > net/net.go:185 +0x45 fp=0xc000220f50 sp=0xc000220f08 pc=0x5982c52e3b05 > net.(*TCPConn).Read(0x0?, {0xc0000e8f41?, 0x0?, 0x0?}) > :1 +0x25 fp=0xc000220f80 sp=0xc000220f50 pc=0x5982c52ef4e5 > net/http.(*connReader).backgroundRead(0xc0000e8f30) > net/http/server.go:681 +0x37 fp=0xc000220fc8 sp=0xc000220f80 > pc=0x5982c53fe377 > net/http.(*connReader).startBackgroundRead.gowrap2() > net/http/server.go:677 +0x25 fp=0xc000220fe0 sp=0xc000220fc8 > pc=0x5982c53fe2a5 > runtime.goexit({}) > runtime/asm_amd64.s:1695 +0x1 fp=0xc000220fe8 sp=0xc000220fe0 > pc=0x5982c521fde1 > created by net/http.(*connReader).startBackgroundRead in goroutine 8 > net/http/server.go:677 +0xba > > rax 0x0 > rbx 0xe031 > rcx 0x7ec7edc9eb1c > rdx 0x6 > rdi 0xe02d > rsi 0xe031 > rbp 0x7ec7a53f6410 > rsp 0x7ec7a53f63d0 > r8 0x0 > r9 0x0 > r10 0x8 > r11 0x246 > r12 0x6 > r13 0x84 > r14 0x16 > r15 0x4c311a4000 > rip 0x7ec7edc9eb1c > rflags 0x246 > cs 0x33 > fs 0x0 > gs 0x0 > [GIN] 2024/11/21 - 18:09:52 | 200 | 5.978648843s | 127.0.0.1 | POST > "/api/chat" > > — > Reply to this email directly, view it on GitHub > <https://github.com/ollama/ollama/issues/7640#issuecomment-2492730115>, > or 
unsubscribe > <https://github.com/notifications/unsubscribe-auth/ABDD3ZNUDWOY4SUYZSI364D2B2HCBAVCNFSM6AAAAABRVM4QT2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOJSG4ZTAMJRGU> > . > You are receiving this because you authored the thread.Message ID: > ***@***.***> >
Author
Owner

@jessegross commented on GitHub (Nov 22, 2024):

I agree, there are multiple issues here; I think most of them should be resolved with 0.4.3. @phalexo, are you still seeing this with the current version?

<!-- gh-comment-id:2494649666 --> @jessegross commented on GitHub (Nov 22, 2024): I agree, there are multiple issues here, I think most of them should be resolved with 0.4.3. @phalexo are you still seeing this with the current version?
Author
Owner

@phalexo commented on GitHub (Nov 22, 2024):

I have not yet upgraded to the latest version; I'm still using 0.3.11 at the moment.


<!-- gh-comment-id:2494718375 --> @phalexo commented on GitHub (Nov 22, 2024): I have not yet upgraded to the latest version, still using 0.3.11 at moment. On Fri, Nov 22, 2024, 2:37 PM Jesse Gross ***@***.***> wrote: > I agree, there are multiple issues here, I think most of them should be > resolved with 0.4.3. @phalexo <https://github.com/phalexo> are you still > seeing this with the current version? > > — > Reply to this email directly, view it on GitHub > <https://github.com/ollama/ollama/issues/7640#issuecomment-2494649666>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/ABDD3ZMBLHTSXGALHVAL3AD2B6BX5AVCNFSM6AAAAABRVM4QT2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOJUGY2DSNRWGY> > . > You are receiving this because you were mentioned.Message ID: > ***@***.***> >
Author
Owner

@huskeyw commented on GitHub (Nov 22, 2024):

I apologize; while it's the same output, my issue was really related to https://github.com/ollama/ollama/issues/6382

<!-- gh-comment-id:2494748083 --> @huskeyw commented on GitHub (Nov 22, 2024): I apologies, while its the same output my issues was really related to https://github.com/ollama/ollama/issues/6382
Author
Owner

@ibamibrhm commented on GitHub (Nov 24, 2024):

@jessegross I still received the error with version 0.4.4
Error: POST predict: Post "http://127.0.0.1:60337/completion": EOF.

<!-- gh-comment-id:2496068705 --> @ibamibrhm commented on GitHub (Nov 24, 2024): @jessegross I still received the error with version 0.4.4 `Error: POST predict: Post "http://127.0.0.1:60337/completion": EOF`.
Author
Owner

@jessegross commented on GitHub (Nov 25, 2024):

@ibamibrhm Can you please collect the server logs and file them in a new bug?

There are too many different things mixed together in this bug and, as I said, I think most have been fixed so I'm going to go ahead and close this one and we can start fresh on any additional issues.

<!-- gh-comment-id:2498885385 --> @jessegross commented on GitHub (Nov 25, 2024): @ibamibrhm Can you please collect the [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) and file them in a new bug? There are too many different things mixed together in this bug and, as I said, I think most have been fixed so I'm going to go ahead and close this one and we can start fresh on any additional issues.
Author
Owner

@Glitchfix commented on GitHub (Apr 18, 2025):

I did some investigation into this. In my case it occurred when I was running multiple models in parallel. I changed the env variables to support that and doubled the queue size to prevent crashes from multiple parallel incoming requests. It hasn't crashed since:

Environment="OLLAMA_MAX_LOADED_MODELS=3"
Environment="OLLAMA_MAX_QUEUE=1024"
Environment="OLLAMA_NUM_PARALLEL=3"
Environment="OLLAMA_FLASH_ATTENTION=1"
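
On a systemd-based Linux install, settings like these normally go in a drop-in override for the ollama service (created with `sudo systemctl edit ollama`). The sketch below writes the same fragment to a temp directory instead of `/etc/systemd/system/ollama.service.d/`, so it can be run without root; the paths and follow-up commands are the standard systemd ones, not anything specific to this thread:

```shell
# Sketch: the commenter's settings as a systemd drop-in. The real file lives
# at /etc/systemd/system/ollama.service.d/override.conf; we write to a temp
# dir here so the example runs without root.
override_dir="$(mktemp -d)"
cat > "$override_dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=3"
Environment="OLLAMA_MAX_QUEUE=1024"
Environment="OLLAMA_NUM_PARALLEL=3"
Environment="OLLAMA_FLASH_ATTENTION=1"
EOF
cat "$override_dir/override.conf"
# For the real service: sudo systemctl daemon-reload && sudo systemctl restart ollama
```

After reloading and restarting, `systemctl show ollama -p Environment` should list the new values.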
<!-- gh-comment-id:2815280659 --> @Glitchfix commented on GitHub (Apr 18, 2025): I did some investigation about this In my case it occurs when I was running multiple models in parallel I changed the env variables to support it and doubled up the queue size to prevent any crashes for multiple parallel incoming requests It hasn't crashed after that ```sh Environment="OLLAMA_MAX_LOADED_MODELS=3" Environment="OLLAMA_MAX_QUEUE=1024" Environment="OLLAMA_NUM_PARALLEL=3" Environment="OLLAMA_FLASH_ATTENTION=1" ```
Author
Owner

@Android-PowerUser commented on GitHub (Apr 27, 2025):

I still have it on 0.6.6:

>>> hello
⠹ time=2025-04-27T22:00:48.808Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
⠋ ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed
(the same assertion repeated 8 times in total)
⠼ [GIN] 2025/04/27 - 22:01:00 | 200 | 11.424188221s | 127.0.0.1 | POST "/api/chat"
Error: POST predict: Post "http://127.0.0.1:43067/completion": EOF

<!-- gh-comment-id:2833665642 --> @Android-PowerUser commented on GitHub (Apr 27, 2025): I still have it 0.6.6 `>>> hello ⠹ time=2025-04-27T22:00:48.808Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 ⠋ ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ops.cpp:4717: void ggml_compute_forward_soft_max_f32(const ggml_compute_params *, ggml_tensor *): assertion "!isnan(wp[i])" failed ⠼ [GIN] 2025/04/27 - 22:01:00 | 200 | 11.424188221s | 127.0.0.1 | POST "/api/chat" Error: POST predict: Post "http://127.0.0.1:43067/completion": EOF`
Author
Owner

@officalsaints commented on GitHub (Apr 29, 2025):

+1, same issue here.

<!-- gh-comment-id:2837818654 --> @officalsaints commented on GitHub (Apr 29, 2025): +1 all issue is same to me
Author
Owner

@sadhikariSteep commented on GitHub (Apr 29, 2025):

I tried both Ollama 0.6.5 and 0.6.6 and got the same error for gemma and deepseek: ResponseError: POST predict: Post "http://127.0.0.1:44119/completion": EOF (status code: 500)
The call response = await engine.aquery(query) produced the error above, but without async it works.
It works for llama3.2:3b.
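
Since the crashed runner gets respawned, a client-side retry sometimes papers over the transient 500 while the underlying bug is being chased. A minimal sketch of that pattern; the endpoint and model name in the usage comment are illustrative, not from this thread:

```shell
# Retry a flaky command a few times before giving up; useful while the
# runner intermittently dies and respawns.
retry() {
  for n in 1 2 3; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n failed" >&2
    sleep 1
  done
  return 1
}

# Illustrative usage against a local server:
# retry curl -sf http://127.0.0.1:11434/api/generate \
#   -d '{"model":"llama3.2:3b","prompt":"hello","stream":false}'
```

This is a workaround, not a fix: if the runner crashes deterministically on a given prompt or model, every retry will fail the same way.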

<!-- gh-comment-id:2838810615 --> @sadhikariSteep commented on GitHub (Apr 29, 2025): i try both ollama 0.6.5 and 0.6.6 got same error for gemme and deepseek: `ResponseError: POST predict: Post "http://127.0.0.1:44119/completion": EOF (status code: 500)` response = await engine.aquery(query) got above error but without async it works it work for llama3.2:3b
Author
Owner

@ghost commented on GitHub (May 1, 2025):

Hello.

  • Same issue

192.168.68.59:11434 - ollama server.

╔╣ Request ║ POST
http://192.168.68.59:11434/api/generate
╚══════════════════════════════════════════════════════════════════════════════════════════╝

╔╣ Response ║ POST ║ Status: 500 Internal Server Error ║ Time: 367 ms
http://192.168.68.59:11434/api/generate
╚══════════════════════════════════════════════════════════════════════════════════════════╝
╔ Body

║ {
║ "error": "POST predict: Post "http://127.0.0.1:50262/completion": EOF"
║ }

╚══════════════════════════════════════════════

<!-- gh-comment-id:2845260771 --> @ghost commented on GitHub (May 1, 2025): Hello. + Same issue 192.168.68.59:11434 - ollama server. ╔╣ Request ║ POST ║ http://192.168.68.59:11434/api/generate ╚══════════════════════════════════════════════════════════════════════════════════════════╝ ╔╣ Response ║ POST ║ Status: 500 Internal Server Error ║ Time: 367 ms ║ http://192.168.68.59:11434/api/generate ╚══════════════════════════════════════════════════════════════════════════════════════════╝ ╔ Body ║ ║ { ║ "error": "POST predict: Post "http://127.0.0.1:50262/completion": EOF" ║ } ║ ╚══════════════════════════════════════════════
Author
Owner

@phalexo commented on GitHub (May 1, 2025):

I don't think this is the same problem for everyone. When a process dies for whatever reason (it could be an OOM), the main process tries to start a new one. Watch the port number: it changes from one failure to the next.
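
One way to confirm this from a captured server log is to pull the runner port out of each start line: a crash message between two start lines with different ports means the runner was respawned. The log lines below are illustrative stand-ins, not the exact wording of ollama's server log:

```shell
# Extract runner ports from (illustrative) server log lines; two distinct
# ports around a crash message indicate the runner was respawned.
log='msg="starting runner" port=42623
signal: killed
msg="starting runner" port=51337'
ports=$(printf '%s\n' "$log" | grep -o 'port=[0-9]*')
printf '%s\n' "$ports"
printf '%s\n' "$ports" | sort -u | wc -l  # two distinct ports means a respawn happened
```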


<!-- gh-comment-id:2845278695 --> @phalexo commented on GitHub (May 1, 2025): I don't think this is the same problem for everyone. When a process dies for whatever reason, could be an OOM, the main process tries to start a new one. Watch the port number, it changes from one failure to another. On Thu, May 1, 2025, 1:04 PM HeroeBew ***@***.***> wrote: > *HeroeBew* left a comment (ollama/ollama#7640) > <https://github.com/ollama/ollama/issues/7640#issuecomment-2845260771> > > Hello. > > - Same issue > > 192.168.68.59:11434 - ollama server. > > ╔╣ Request ║ POST > ║ http://192.168.68.59:11434/api/generate > > ╚══════════════════════════════════════════════════════════════════════════════════════════╝ > > ╔╣ Response ║ POST ║ Status: 500 Internal Server Error ║ Time: 367 ms > ║ http://192.168.68.59:11434/api/generate > > ╚══════════════════════════════════════════════════════════════════════════════════════════╝ > ╔ Body > ║ > ║ { > ║ "error": "POST predict: Post "http://127.0.0.1:50262/completion": EOF" > ║ } > ║ > ╚══════════════════════════════════════════════ > > — > Reply to this email directly, view it on GitHub > <https://github.com/ollama/ollama/issues/7640#issuecomment-2845260771>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/ABDD3ZK7LFTXKHPO6JPRGM324JH2HAVCNFSM6AAAAABRVM4QT2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQNBVGI3DANZXGE> > . > You are receiving this because you were mentioned.Message ID: > ***@***.***> >
Author
Owner

@jessegross commented on GitHub (May 1, 2025):

Yes, this is an old issue, so the new comments are likely not related. If you are still seeing this with 0.6.7, please file a new issue and attach logs.

<!-- gh-comment-id:2845691303 --> @jessegross commented on GitHub (May 1, 2025): Yes, this is an old issue, so the new comments are likely not related. If you are still seeing this with 0.6.7, please file a new issue and attach logs.
Author
Owner

@cattei commented on GitHub (May 1, 2025):

The same problem is happening on my side as well, on version 0.6.7, Windows.

<!-- gh-comment-id:2845933041 --> @cattei commented on GitHub (May 1, 2025): 我这边也发生同样的问题了,0.6.7版本,windows系统

Reference: github-starred/ollama#51386