[GH-ISSUE #3871] llama3_instruct_70b_q8 does not work properly in Ollama. #2399

Closed
opened 2026-04-12 12:42:44 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @17Reset on GitHub (Apr 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3871

Originally assigned to: @dhiltgen on GitHub.

I added the llama3_instruct_70b_q8 model in GGUF format to Ollama and used it for inference, and got the error shown below:

```
time=2024-04-24T15:08:27.886+08:00 level=WARN source=server.go:51 msg="requested context length is greater than model max context length" requested=4096 model=0
time=2024-04-24T15:08:27.886+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-24T15:08:27.886+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library libcudart.so*"
time=2024-04-24T15:08:27.895+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [/tmp/ollama2048460055/runners/cuda_v11/libcudart.so.11.0 /usr/local/cuda/lib64/libcudart.so.12.4.127]"
time=2024-04-24T15:08:27.896+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-04-24T15:08:27.897+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-24T15:08:28.284+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 8.9"
time=2024-04-24T15:08:28.346+08:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-24T15:08:28.346+08:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library libcudart.so*"
time=2024-04-24T15:08:28.347+08:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: [/tmp/ollama2048460055/runners/cuda_v11/libcudart.so.11.0 /usr/local/cuda/lib64/libcudart.so.12.4.127]"
time=2024-04-24T15:08:28.348+08:00 level=INFO source=gpu.go:126 msg="Nvidia GPU detected via cudart"
time=2024-04-24T15:08:28.348+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-24T15:08:28.457+08:00 level=INFO source=gpu.go:202 msg="[cudart] CUDART CUDA Compute Capability detected: 8.9"


2024/04/24 15:08:28 [Recovery] 2024/04/24 - 15:08:28 panic recovered:
runtime error: integer divide by zero
runtime/panic.go:240 (0x45a45d)
github.com/ollama/ollama/llm/server.go:71 (0x8c4198)
github.com/ollama/ollama/server/routes.go:101 (0xee875d)
github.com/ollama/ollama/server/routes.go:1295 (0xef47ea)
github.com/gin-gonic/gin@v1.9.1/context.go:174 (0xebb84a)
github.com/ollama/ollama/server/routes.go:1017 (0xef2f7c)
github.com/gin-gonic/gin@v1.9.1/context.go:174 (0xec8739)
github.com/gin-gonic/gin@v1.9.1/recovery.go:102 (0xec8727)
github.com/gin-gonic/gin@v1.9.1/context.go:174 (0xec787c)
github.com/gin-gonic/gin@v1.9.1/logger.go:240 (0xec7863)
github.com/gin-gonic/gin@v1.9.1/context.go:174 (0xec6d6d)
github.com/gin-gonic/gin@v1.9.1/gin.go:620 (0xec69fc)
github.com/gin-gonic/gin@v1.9.1/gin.go:576 (0xec6531)
net/http/server.go:3137 (0x721f4d)
net/http/server.go:2039 (0x71d307)
runtime/asm_amd64.s:1695 (0x491a20)

[GIN] 2024/04/24 - 15:08:28 | 500 |  639.135847ms |  192.168.18.174 | POST     "/api/chat"
```
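For context on how a panic like this can arise: Go integer division panics at runtime when the divisor is zero, and the `model=0` in the WARN line above suggests some piece of model metadata (the model's max context length) was read as zero before the division at `llm/server.go:71` was reached. The snippet below is only a minimal sketch with hypothetical names, not the actual Ollama code; it just illustrates that class of bug and the guard that avoids it.

```go
// Minimal sketch with hypothetical names; NOT the actual code at llm/server.go:71.
package main

import "fmt"

// layerSizeBytes is a hypothetical memory-prediction helper: average bytes per
// transformer layer, computed as total model size divided by the layer count.
// If the layer/block count read from the GGUF metadata comes back as zero, an
// unguarded division here panics with "integer divide by zero".
func layerSizeBytes(totalModelBytes, blockCount uint64) (uint64, error) {
	if blockCount == 0 {
		return 0, fmt.Errorf("invalid model metadata: block count is zero")
	}
	return totalModelBytes / blockCount, nil // safe: divisor checked above
}

func main() {
	// Simulates the failure mode: metadata that parsed as zero.
	if _, err := layerSizeBytes(70_000_000_000, 0); err != nil {
		fmt.Println("refusing to load model:", err)
	}
}
```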

@dhiltgen commented on GitHub (May 1, 2024):

What version are you running? I tried to map server.go:71 to a plausible divide by zero bug but couldn't find a match.

If possible, can you try the latest RC of 0.1.33 and see if this still repros? (I'm guessing this is memory prediction code, which has been reshuffled quite a bit recently)


@dhiltgen commented on GitHub (May 21, 2024):

Please make sure to update to the latest version, and if you're still having problems, share the updated server log and I'll reopen.


@UmutAlihan commented on GitHub (May 23, 2024):

I am also having this issue on llama3 70b fp16


@MihailCosmin commented on GitHub (May 30, 2024):

Same issue here:

I used these three files:

```
Meta-Llama-3-70B-Instruct-v2.Q8_0-00001-of-00003.gguf
Meta-Llama-3-70B-Instruct-v2.Q8_0-00002-of-00003.gguf
Meta-Llama-3-70B-Instruct-v2.Q8_0-00003-of-00003.gguf
```

Combined with:

```
cat Meta-Llama-3-70B-Instruct-v2.Q8_0-* > Meta-Llama-3-70B-Instruct-v2.Q8_0.gguf
```

Modelfile:

```
FROM ./Meta-Llama-3-70B-Instruct-v2.Q8_0.gguf
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"
PARAMETER num_keep 24
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
```

Create with:

`ollama create llama3_70b_instruct_Q8 -f Modelfile`

After the files were created, they showed a red X icon on them; the access rights were not correct, so I fixed them.

Then, when running `ollama run llama3_70b_instruct_Q8`, I was getting this: `Error: Post "http://127.0.0.1:11434/api/chat": EOF`

Prompting with this model in the WebUI results in this error message: `Uh-oh! There was an issue connecting to Ollama.`

In journalctl I can see this:

```
ollama[3181079]: time=2024-05-30T11:13:50.882+02:00 level=WARN source=routes.go:757 msg="bad manifest config filepath" name=registry.ollama.ai/library/llama3_70b_instruct:latest error="open /usr/share/ollama/.ollama/models/blobs/sha256-ed6c573d79b79cbe2fb6299c0623ad4757d4e0b314f9f08dd9280b2efbca0b44: no such file or directory"
```

Before running `ollama run llama3_70b_instruct_Q8` the files are definitely there; afterwards, they are deleted.
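A note on the merge step above: if those three shards were produced by llama.cpp's `gguf-split` tool (the `-00001-of-00003` naming suggests they may have been), each shard carries its own GGUF header, so concatenating them with `cat` does not yield a valid single-file GGUF; only pieces made by a plain byte-split of one GGUF can be joined back that way. As a rough diagnostic (a sketch, not part of Ollama), one can read the header of the merged file and check that the magic, version, and tensor count look sane, assuming the GGUF v2/v3 header layout (4-byte magic "GGUF", little-endian uint32 version, uint64 tensor count, uint64 metadata KV count):

```go
// Rough diagnostic sketch (not part of Ollama): print the GGUF header fields of
// the merged file. Assumes the GGUF v2/v3 layout: 4-byte magic, uint32 version,
// uint64 tensor count, uint64 metadata KV count, all little-endian.
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

func main() {
	// Path taken from the repro steps above.
	f, err := os.Open("Meta-Llama-3-70B-Instruct-v2.Q8_0.gguf")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var magic [4]byte
	if _, err := io.ReadFull(f, magic[:]); err != nil {
		panic(err)
	}
	fmt.Printf("magic: %q\n", string(magic[:])) // expected: "GGUF"

	var version uint32
	var tensorCount, kvCount uint64
	for _, field := range []interface{}{&version, &tensorCount, &kvCount} {
		if err := binary.Read(f, binary.LittleEndian, field); err != nil {
			panic(err)
		}
	}
	fmt.Printf("version: %d  tensors: %d  metadata kv pairs: %d\n", version, tensorCount, kvCount)
}
```

If the magic is not "GGUF" or the counts look wrong on the concatenated file, the `cat` merge is the likely culprit; in that case it may be better to re-merge the shards with the tool that produced them rather than concatenating bytes.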


@shenyimings commented on GitHub (Jun 15, 2024):

Hi, I am facing the exact same issue. Is there a solution available yet? I am on version 0.1.44.
It is worth noting that the original two .gguf files can be read and used for inference without problems in llama.cpp, so there may be an issue with how Ollama loads models split across multiple .gguf files. Thanks.


@jasonchanhku commented on GitHub (Jun 15, 2024):

Facing this issue as well, also on 0.1.44.

Reference: github-starred/ollama#2399