[GH-ISSUE #6125] minor bug: ggml/llama.cpp's new Q4_0_4_8 quantized files don't import into ollama #3826

Closed
opened 2026-04-12 14:39:28 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @AndreasKunar on GitHub (Aug 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6125

What is the issue?

I built ollama on Ubuntu 24.04, running in Windows 11's WSL2 on my Surface Pro 11, to try and test Ollama with llama.cpp's Q4_0_4_8 acceleration.

Ollama+llama.cpp builds, imports my local llama-2 Q4_0, and runs it.

But when I try to import a local llama-2 Q4_0_4_8 model (which runs with llama.cpp), it gives an "Error: invalid file magic", apparently from its ggml.go module (at line 311?), which does not seem to understand the new Q4_0_4_4 and Q4_0_4_8 formats.
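For context, "invalid file magic" typically means the importer rejected the file's 4-byte header before ever looking at tensor quantization types. A minimal sketch of what such a check looks like (this is illustrative Go, not ollama's actual ggml.go; the `checkMagic` helper and its error text are assumptions, though `0x46554747` is the real little-endian encoding of the ASCII bytes "GGUF"):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// checkMagic inspects the first 4 bytes of a file and reports whether it
// looks like a GGUF container. The bytes "GGUF", read as a little-endian
// uint32, are 0x46554747.
func checkMagic(data []byte) error {
	if len(data) < 4 {
		return fmt.Errorf("file too short to contain magic")
	}
	magic := binary.LittleEndian.Uint32(data[:4])
	if magic != 0x46554747 { // "GGUF"
		return fmt.Errorf("invalid file magic: %#x", magic)
	}
	return nil
}

func main() {
	fmt.Println(checkMagic([]byte("GGUF....")) == nil) // GGUF header accepted
	fmt.Println(checkMagic([]byte("GGML....")) == nil) // anything else rejected
}
```

Since Q4_0_4_8 files are still GGUF containers (the quantization type lives in the tensor metadata, not the magic), a magic failure at this stage suggests the importer bailed out earlier than the quantization check itself.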

llama.cpp recently introduced these formats to accelerate modern arm64 CPUs like the Snapdragon X. They also work on other newer ARM CPUs and bring an up to 2-3x speed improvement. For details see [llama.cpp PR#5780](https://github.com/ggerganov/llama.cpp/pull/5780), and there seems to be work underway for x64.

P.S.: I tried this on Linux (Windows' WSL2), since building llama.cpp for Windows on ARM / Snapdragon X requires special build instructions (using clang instead of MSVC; for details see the [llama.cpp build instructions](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)), and I'm not sure if ollama already follows these.

@SebastianGode independently also had this issue.

OS

Linux / Ubuntu 24.04 on WSL2, Windows on ARM

GPU

None

CPU

arm64 / Snapdragon X Plus

Ollama version

0.3.2, 3e61426

GiteaMirror added the bug label 2026-04-12 14:39:28 -05:00

@rick-github commented on GitHub (Aug 1, 2024):

ollama lags llama.cpp. give it a little time and the recent changes to llama.cpp will be incorporated into ollama.


@AndreasKunar commented on GitHub (Aug 1, 2024):

ollama lags llama.cpp. give it a little time and the recent changes to llama.cpp will be incorporated into ollama.

Thanks! I just want to raise the issue because it is not just a usual import/build thing, and probably requires a small code change in ggml.go, which might not be on the radar otherwise.


@AndreasKunar commented on GitHub (Aug 11, 2024):

Import issue has been fixed.


@emzaedu commented on GitHub (Oct 19, 2024):

The import issue is still here: "Error: invalid file magic" on all Q4_0_4_8 models.


@rick-github commented on GitHub (Oct 19, 2024):

Which model?


@emzaedu commented on GitHub (Oct 20, 2024):

Literally every model that I try to add. For example:

![Screenshot 2024-10-20 130707](https://github.com/user-attachments/assets/f5e1a433-6823-4ccb-846f-26d87f12c7c5)

They also work well in LM Studio.

UPD: Windows ARM64

Reference: github-starred/ollama#3826