[GH-ISSUE #5400] Gemma2 works incorrectly with parallel requests #3380

Closed
opened 2026-04-12 14:00:50 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @dooezgo on GitHub (Jul 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5400

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I'm testing how many concurrent requests my system can handle.
With Gemma2, a single request produces a perfect response.
But with multiple parallel requests, the responses come back as gibberish.
![image](https://github.com/ollama/ollama/assets/33556384/3bdbbf5a-264e-4ba8-8f00-b6f3ea31f780)
I tried the same multi-request test with llama3 and it worked perfectly.
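For reference, a parallel-request test like the one described can be sketched as below. This is a minimal, hypothetical reproduction script (the function names and prompts are assumptions, not from the issue); it assumes a local Ollama server on the default port and fires several `/api/generate` requests concurrently so garbled responses are easy to spot.

```python
# Hypothetical reproduction sketch: send N concurrent non-streaming
# generate requests to a local Ollama server and print each response.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> bytes:
    # Non-streaming request body for Ollama's /api/generate API.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    # One blocking request; returns the model's full response text.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    prompts = [f"Count from 1 to {n}." for n in range(3, 7)]
    # Note: the server only processes requests in parallel when
    # OLLAMA_NUM_PARALLEL is set > 1; otherwise they are queued.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for out in pool.map(lambda p: generate("gemma2", p), prompts):
            print(out)
```

Running this against gemma2 versus llama3 on the same build would show whether the garbling is model-specific, as the reporter observed.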

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

v0.1.48

GiteaMirror added the "needs more info" and "bug" labels 2026-04-12 14:00:50 -05:00

@dhiltgen commented on GitHub (Jul 24, 2024):

I'm not able to reproduce. On a GPU where the model fully loads into VRAM, sending parallel requests doesn't result in gibberish responses.

Can you share more information about your setup? Is the model split between GPU/CPU? How much VRAM do you have?


Reference: github-starred/ollama#3380