[GH-ISSUE #3562] I sometimes see [INST0] in the output stream #27958

Closed
opened 2026-04-22 05:37:25 -05:00 by GiteaMirror · 3 comments

Originally created by @ioquatix on GitHub (Apr 9, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3562

Originally assigned to: @pdevine on GitHub.

What is the issue?

Sometimes I see "[INST0]" in the output stream when using llama2.

Mrowr! *cocks head to side* Oh, hello there little human! *blinks* You're so... *pounces on the ground* excited! *bats at air* I'm just here for the... *sniffs the air* food. *giggles* Yes, I love food! *purrs* Do you have any? *hovers nearby*[INST0]

What did you expect to see?

I expect to see the output without [INST0].

Steps to reproduce

This script, when run on Linux, seems to reproduce the problem sometimes: https://github.com/socketry/async-ollama/blob/main/examples/conversation.rb
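
For reference, a rough Python equivalent of that reproduction loop is sketched below. It is only a sketch: it assumes a local Ollama server on the default port 11434 and the `requests` package, and it calls /api/generate directly rather than going through async-ollama.

```python
# Hedged sketch of a reproduction loop: stream llama2 completions from a
# local Ollama server and flag any "[INST" fragment that leaks into the text.
# Assumptions: Ollama listening on http://localhost:11434, `requests` installed.
import json
import requests

URL = "http://localhost:11434/api/generate"

def stream_once(prompt: str) -> str:
    payload = {"model": "llama2", "prompt": prompt, "stream": True}
    text = ""
    with requests.post(URL, json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            text += chunk.get("response", "")
            if chunk.get("done"):
                break
    return text

for i in range(20):
    out = stream_once("Pretend you are a cat and greet me.")
    if "[INST" in out:
        print(f"run {i}: template marker leaked ->", out[-120:])
```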

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

x86

Platform

No response

Ollama version

0.1.30

GPU

Nvidia

GPU info

Wed Apr 10 07:38:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   42C    P8             41W /  450W |    1853MiB /  24564MiB |     29%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1734      G   /usr/bin/gnome-shell                          553MiB |
|    0   N/A  N/A      2327      G   /usr/bin/Xwayland                             684MiB |
|    0   N/A  N/A      3005      G   ...yOnDemand --variations-seed-version        144MiB |
|    0   N/A  N/A     48481      C   /usr/bin/ollama                               390MiB |
+-----------------------------------------------------------------------------------------+

CPU

AMD

Other software

No response

GiteaMirror added the bug label 2026-04-22 05:37:25 -05:00

@ioquatix commented on GitHub (Apr 10, 2024):

Another example:

> ollama run llama2:13b
>>> When I access the ollama HTTP RPC, the response includes a list of numbers, referred to as context. What is this context and how is it used by LLMs?

<<SYS>]( Ellow) Hello there! As a Large Language Model (LLM), I'm here to help answer your questions about the Ollama HTTP RPC and the mysterious "context" that it returns.
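
Both `[INST0]` and `<<SYS>` look like fragments of llama2's chat prompt format leaking into the generated text. Roughly (the exact template Ollama applies to llama2 may differ in detail), each turn is wrapped like this:

```python
# Rough sketch of the llama2 chat prompt format; the exact template Ollama
# uses may differ slightly, but these are the special markers involved.
def llama2_prompt(system: str, user: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

print(llama2_prompt("You are a helpful assistant.", "Say hello."))
```

When the model emits those marker tokens itself and they are not stripped from the stream, fragments such as `[INST0]` or `<<SYS>` show up in the response.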

@pdevine commented on GitHub (Apr 12, 2024):

ok, I tried it and it worked but it totally hallucinated the answer:

>>> When I access the ollama HTTP RPC, the response includes a list of numbers, referred to as context. What is this context and how is it used by LLMs?

The "context" you're referring to is an array of numbers that is included in the response of the OLLAMA HTTP RPC. This context is a set of metadata that provides additional information about the response, such as the provenance of the data and any constraints or limitations that apply to its use.

...

There's more but it gets progressively more off track :-D
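
For what it's worth, the real answer is much less exotic: in Ollama's /api/generate response, `context` is an opaque array of token ids returned with the final streamed object, and passing it back in the next request lets the model continue the same conversation. A minimal sketch, assuming a local server on the default port and the `requests` package:

```python
# Hedged sketch: round-trip the `context` token-id array from /api/generate
# so that the second request continues the first conversation.
# Assumptions: Ollama listening on http://localhost:11434, `requests` installed.
import requests

URL = "http://localhost:11434/api/generate"

first = requests.post(URL, json={
    "model": "llama2",
    "prompt": "My name is Samuel. Please remember that.",
    "stream": False,
}).json()

second = requests.post(URL, json={
    "model": "llama2",
    "prompt": "What is my name?",
    "context": first.get("context", []),  # token ids from the previous turn
    "stream": False,
}).json()

print(second["response"])
```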

@ioquatix can you upgrade Ollama to `0.1.31` and also run `ollama pull llama2` and `ollama pull llama2:13b`? I'm wondering if you're on the latest version.


@ioquatix commented on GitHub (Apr 13, 2024):

> ollama pull llama2:13b
pulling manifest 
pulling 2609048d349e... 100% ▕███████████████████████████████████████████████████████████████████████████████████▏ 7.4 GB                         
pulling 8c17c2ebb0ea... 100% ▕███████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB                         
pulling 7c23fb36d801... 100% ▕███████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB                         
pulling 2e0493f67d0c... 100% ▕███████████████████████████████████████████████████████████████████████████████████▏   59 B                         
pulling fa304d675061... 100% ▕███████████████████████████████████████████████████████████████████████████████████▏   91 B                         
pulling be61bcdf308e... 100% ▕███████████████████████████████████████████████████████████████████████████████████▏  558 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 

I just updated to 0.1.31 and I don't seem to be able to reproduce the issue.
