[GH-ISSUE #5881] Is llama 3.1 already supported (on 2.8) or should we wait for another update? #65707

Closed
opened 2026-05-03 22:18:56 -05:00 by GiteaMirror · 20 comments

Originally created by @Qualzz on GitHub (Jul 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5881

What is the issue?

The model page already seems to exist on the Ollama website, but the model is clearly behaving erratically, which makes me wonder whether we should wait for an update before using Llama 3.1.

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

2.8

GiteaMirror added the bug label 2026-05-03 22:18:56 -05:00

@Qualzz commented on GitHub (Jul 23, 2024):

![image](https://github.com/user-attachments/assets/aa403a90-5622-4d40-a416-8fb53499bc14)


@rick-github commented on GitHub (Jul 23, 2024):

What model and UI? I tried the "sticky header" question with open-webui v0.3.8, ollama 0.2.8 and model llama3.1:8b-instruct-q4_K_M and got a perfectly reasonable answer.

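To reproduce this kind of test without any UI, one can call Ollama's `/api/generate` endpoint directly. A minimal sketch in Python, assuming a local server on the default port 11434; the prompt is a placeholder, since the exact "sticky header" question isn't quoted in the thread:

```python
# Minimal UI-free repro against a local Ollama server (default port 11434).
# The prompt here is a placeholder; the thread does not quote the exact question.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b-instruct-q4_K_M",
        "prompt": "How do I make a sticky header in CSS?",  # placeholder prompt
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```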

@Qualzz commented on GitHub (Jul 23, 2024):

Exactly the same config as you, except Q8.

But I have the same issue without any UI:
![image](https://github.com/user-attachments/assets/3738af1d-4eb7-4aca-9b13-480aaae70834)


@sam2332 commented on GitHub (Jul 23, 2024):

model just spams text until stopped


@taozhiyuai commented on GitHub (Jul 23, 2024):

works fine on m3 max 128GB. version=0.2.8


@debackerl commented on GitHub (Jul 23, 2024):

I don't think the Ollama config is finalized yet. For example, they explain that when Llama 3.1 uses a tool it will emit <|eom_id|> instead of <|eot_id|>; however, that new token is still missing from Ollama's config: https://ollama.com/library/llama3.1/blobs/56bb8bd477a5

I guess it's not related to your problem, but it shows that more improvements will be needed.

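Until the config is updated, a possible per-request workaround is to pass both end-of-turn tokens as stop sequences. A hedged sketch via the API's `options` field (the `stop` option is part of Ollama's documented runtime parameters; whether this fully papers over the missing `<|eom_id|>` is an assumption):

```python
# Sketch: supplying both Llama 3.1 end-of-turn tokens as stop sequences,
# as a per-request workaround while <|eom_id|> is absent from the model config.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "What is the capital of France?",
        "stream": False,
        "options": {
            # <|eot_id|> is already a stop token in the shipped config;
            # <|eom_id|> is the tool-use token mentioned above.
            "stop": ["<|eot_id|>", "<|eom_id|>"],
        },
    },
    timeout=300,
)
print(resp.json()["response"])
```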

@rick-github commented on GitHub (Jul 23, 2024):

Q8 also worked fine for me, Intel + Nvidia RTX 4070 with 12GB VRAM.


@sam2332 commented on GitHub (Jul 23, 2024):

The Q8 loads and does not constantly spam text for me either.


@rick-github commented on GitHub (Jul 23, 2024):

Tried Q4_0, Q4_1, Q4_K_S, Q4_K_M, Q8, FP16, all returned expected results.


@Qualzz commented on GitHub (Jul 23, 2024):

> Tried Q4_0, Q4_1, Q4_K_S, Q4_K_M, Q8, FP16, all returned expected results.

Did you try anything with longer context windows? For example 32k? The longer the context, the less coherent the model becomes (by that I mean random outputs).

At smaller context sizes, like 2048, it seems OK at first glance, but still kind of weird.

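For anyone trying to reproduce the long-context behavior: Ollama's default context window was 2048 tokens at the time, so a 32k test needs `num_ctx` set explicitly or the input is silently truncated. A sketch, with a hypothetical input file:

```python
# Sketch of a 32k-context test; "long_document.txt" is a hypothetical input.
# Without options.num_ctx, Ollama truncates the prompt to the default window.
import requests

long_prompt = open("long_document.txt").read()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b-instruct-q8_0",
        "prompt": long_prompt + "\n\nSummarize the text above.",
        "stream": False,
        "options": {"num_ctx": 32768},  # raise the context window for this request
    },
    timeout=600,
)
print(resp.json()["response"])
```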

@rick-github commented on GitHub (Jul 23, 2024):

I did a needle-in-a-haystack test for various window lengths up to 32k. Performance went down as the context window went up, but it never went off the rails - the answers were coherent, if not correct.

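A rough sketch of this kind of needle-in-a-haystack test (not the commenter's actual harness; the needle, filler, and model tag are invented for illustration):

```python
# Needle-in-a-haystack sketch: bury one fact in filler text and check retrieval.
# The needle and filler are made up for illustration.
import requests

NEEDLE = "The secret passphrase is 'blue-elephant-42'."
FILLER = "The quick brown fox jumps over the lazy dog. " * 2000  # ~90k characters

mid = len(FILLER) // 2
prompt = FILLER[:mid] + NEEDLE + " " + FILLER[mid:] + "\n\nWhat is the secret passphrase?"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b-instruct-q4_K_M",
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": 32768},
    },
    timeout=600,
)
answer = resp.json()["response"]
print("retrieved" if "blue-elephant-42" in answer else "missed", "->", answer[:200])
```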

@Qualzz commented on GitHub (Jul 23, 2024):

Seems like an issue was raised in llama.cpp with other people having the same kind of experience. It's weird that everything works fine for you :o

[ggerganov/llama.cpp#8650](https://github.com/ggerganov/llama.cpp/issues/8650)


@mirek190 commented on GitHub (Jul 23, 2024):

Ollama and llama.cpp both work badly with Llama 3.1, so don't bother for now, because it's currently dumber than Llama 3.


@Baughn commented on GitHub (Jul 23, 2024):

It's also configured with an 8k context size. Though I thought the model was trained for 128k?

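The context size the published model actually configures can be checked against the server. A sketch using Ollama's `/api/show` endpoint (the `"name"` request field follows the API as documented around this time; newer versions accept `"model"`):

```python
# Sketch: inspect the shipped model configuration via /api/show.
# The "name" field matches the 2024-era API; newer servers accept "model".
import requests

resp = requests.post(
    "http://localhost:11434/api/show",
    json={"name": "llama3.1"},
    timeout=60,
)
info = resp.json()
print(info.get("parameters", ""))  # runtime parameters, e.g. stop tokens
print(info.get("modelfile", ""))   # full Modelfile, including any num_ctx override
```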

@ouening commented on GitHub (Jul 24, 2024):

I got a bad response using llama3.1 (Q4_0):
![image](https://github.com/user-attachments/assets/7d637895-00de-4646-9014-11b4f7c30392)


@kiradzS commented on GitHub (Jul 24, 2024):

> I got a bad response using llama3.1 (Q4_0): ![image](https://github.com/user-attachments/assets/7d637895-00de-4646-9014-11b4f7c30392)

It doesn't support Chinese; only a few languages are listed on the model page, so you have to translate into English first to get better results.
But I feel it's not much better than 3. I tried Q8 and tested it on various projects, and it didn't work as well as glm4.


@MeinDeutschkurs commented on GitHub (Jul 24, 2024):

Up to 4096 tokens, everything seems to be fine. I tried to get a summary of a 70,000-token input (expected output about 4,000 tokens) and got gibberish.

Something like:

`I ...., was .... !+ //a ... like I go 1 547wtr ... never...`

In comparison to the "gradient" version, no meaning at all. (The gradient version did produce the summary, but was not able to end it; it kept hallucinating until the context window was completely used.)

(M2 Ultra, 192GB)

After updating the model, I got this gibberish:

```
His-

that" it (that," he the thing you! those, that 2 that this. It had the 
minute this [the...the whole - the next." The other.


The—

the 'This was,

' THE the way the good as an absolute, you said this, the second the best 
the very good the last. That!" He with that when his

That. All of" he! "that' and a bit a big now!" that he. You, that: You!"

Sav the -he:

The, the minute. His wonderful.

His,

There!

the moment -

That. The secret, Harry...-

the Secret this -
the moment to that the instant. He was he the moment." that the minute.

You! that.

He's! the second.
You!"

that," it (see this

he -you"the, but the great."

a big. That!

The rest you, he. This: THAT you that time that that! That day," that the 
secret, He was a

That!"

Dy (you to that:

He went with the last of them..."

-Those F. that.

This one way his"!" Weh!
the whole." he!

S.

The next moment this the rat."

His 27, you this 'You! You looked. The whole -this.

you.

the

(he...the" the secret" He had that we. This

' the
That was -He was. THAT. The best then. Your own to get the (he a "the"the"

the minute." that, as you. This:

You and he. The next. You'd his time. You're.

D.

' this!"

That this"this!"

you."

That the moment

He!

the instant with that the night the last - that the next," the" that the 
great. It had this the (his"

He said in his

this...that." the great 'him and

any way this to you, he the best!" it. He had that the moment of the last: 
The good at the most now"

The The and the and the the and The The and the the The and The  The and 
The the and the The The  The
You the  The The  The the the  The

You the  The  the  The The the

The  the the  of the the  the  the  the the the  the  the the  the the  
The the  the  the the the  the  The  the  The  the  the  the  The and the  
the the the the  the  The the  the the  The the The  the and the  the  The 
the  The and the the the  The  the  The the the and the The The and the  
of the and the the the  The and the the the  of the the  the  The the  the 
theThe  the of the and the the the the the the the of the  the the the the 
 The the the the  the the  the the  The the  the the the of the the  the 
The  the the the of the the the  the of the the the  The the  of the the  
the  the  The  Of the the the of the theThe and the of the the of the the 
of the  the  of the the  The  the of the  the of the the  the The  the The 
 the of the of the the  the  of the  the of the the the of the the of the 
the  The of the  the  of the the  Of the the of the and of theThe of the 
the  the  Of the of the  the the  The  of the  the the of the the of the  
Of the the of the the  the the  the of the  the the of the  the of the the 
of the  of the  the  The  of the  The  the the  The of the  the of the  
the of  The  The of the  The the of the the of the  the Of the the the of 
the  Of The  The of the  Of the  The  Of the of the the of The the of 
theThe  Of the of the  the  The  the the of the the of the  The The of the 
The  the  of the of the  the of the  The of theThe The  Of the  Of the  of 
the of the  of the  Of the the of the  Of theThe  Of the  Of the Of the  
Of the  Of the  Of the  Of the  the Of the of the  Of the the  Of the  Of 
the of the  Of the of the  Of the of the Of The of the of the of the  Of 
the the the  Of the  Of the  The of the  Of the of the the of the  Of the 
the of the the Of the the  Of the  the  Of the The of the  the The Of the  
The  The of the of the of the  of the the of the  To the Of the  of the  
Of the the  The  of the  The Of the the Of the the the Of the the Of the 
the The Of the the  Of the the Of the The of The Of the the  The  The  Of 
the the  The  the the Of The  Of the the  the The  the of the the the  Of 
the the the  of the  The of the Of the the the  Of the the of the  the Of 
the  Of the the Of the the the of the the Of the the Of the of^C
```
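One plausible factor in the summary case: at the default window a 70,000-token input is silently truncated, and pushing far beyond the configured 8k invites this kind of degradation. A crude sketch of sizing `num_ctx` to the input first (the file name and the 4-characters-per-token estimate are illustrative assumptions, not the model's real tokenizer):

```python
# Crude length check before a long summarization request.
# "book_chapter.txt" is hypothetical; ~4 chars/token is only a rule of thumb.
import requests

text = open("book_chapter.txt").read()
approx_tokens = len(text) // 4
num_ctx = 131072 if approx_tokens > 32768 else 32768  # Llama 3.1 is trained to 128k

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",  # placeholder tag; the commenter's exact model isn't stated
        "prompt": text + "\n\nWrite a detailed summary of the text above.",
        "stream": False,
        "options": {"num_ctx": num_ctx},
    },
    timeout=1800,
)
print(resp.json()["response"])
```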

@zakcali commented on GitHub (Jul 24, 2024):

llama3.1:8b-instruct-fp16 cannot answer this question correctly:

> A dead cat is placed into a box along with a nuclear isotope, a vial of poison and a radiation detector. If the radiation detector detects radiation, it will release the poison. The box is opened one day later. What is the probability of the cat being alive?

If I ask the same question again, the model answers correctly.

The situation is the same on groq.com with the Llama-3.1-8b-Instant model: just paste the question twice.

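The ask-twice behavior is easy to script against the chat endpoint, keeping the first (wrong) answer in the conversation history. A sketch, assuming the same model tag as above:

```python
# Sketch of the repeat-the-question trick via /api/chat: ask once, keep the
# answer in history, then ask the identical question again.
import requests

QUESTION = (
    "A dead cat is placed into a box along with a nuclear isotope, a vial of "
    "poison and a radiation detector. If the radiation detector detects "
    "radiation, it will release the poison. The box is opened one day later. "
    "What is the probability of the cat being alive?"
)

messages = []
for attempt in (1, 2):
    messages.append({"role": "user", "content": QUESTION})
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama3.1:8b-instruct-fp16", "messages": messages, "stream": False},
        timeout=300,
    )
    answer = resp.json()["message"]["content"]
    messages.append({"role": "assistant", "content": answer})
    print(f"--- attempt {attempt} ---\n{answer}\n")
```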

@nonetrix commented on GitHub (Jul 24, 2024):

Llama 3.1 8B, even at Q8 or Q4, seems much worse than on Groq. On Groq it is in the same ballpark as a model like Gemma 27B, if not better, and it even gets right the questions I use to trip up smaller models. But on Ollama it is one of the worst 8B models I have tried. 70B seems better, but I think it's still slightly better on Groq. It's fully coherent for me, just not good.


@pdevine commented on GitHub (Sep 2, 2024):

I'm going to go ahead and close this since llama3.1 is working correctly.
