[GH-ISSUE #11966] gpt-oss:20b and gpt-oss:120b input char "sanitization" on tilde (~) #54459

Closed
opened 2026-04-29 06:00:46 -05:00 by GiteaMirror · 2 comments

Originally created by @JKremsner on GitHub (Aug 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11966

Originally assigned to: @drifkin, @mxyng on GitHub.

What is the issue?

It seems that the tilde (~) is completely sanitized from the input to the model. My rather complex workflow depends on this character being passed through unchanged, and all my tests failed.
As an example:
input:
'please repeat exactly: "this is a ~ test"'
output:
'please repeat exactly: "this is a test"'
see also screenshot

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.11.4

GiteaMirror added the app and bug labels 2026-04-29 06:00:47 -05:00

@rick-github commented on GitHub (Aug 19, 2025):

Doesn't seem to be an app issue.

$ ollama run gpt-oss:20b 'what is the name of the following character: "~"'
Thinking...
The user says: "what is the name of the following character: " " ". There's a space between the quotes: `" "`? The character is presumably a space character. The user might want the name of the character represented by a space. The name: "space", "space character", "whitespace", "blank". The character in the string is a single space. So answer: It's a space (Unicode U+0020, called "Space"). The name could be "SPACE" or "Space". Maybe mention "space character" or "blank". Also could mention "Unicode whitespace". So answer: the name is "Space" or "Whitespace". Provide explanation. Also note it's the space bar key. So answer accordingly.
...done thinking.

It’s just the **space character** – the blank that the space‑bar types. In Unicode it’s U+0020 and is officially called “Space.” It’s also a member of the whitespace family of characters.

Does the same for a modified version of the model:

$ ollama run gpt-oss:20b-abliterated-q4_K_M 'what is the name of the following character: "~"'
Thinking...
The user asks: "what is the name of the following character: " ". This is a bit ambiguous. They want the name of the following character: " ". Maybe they want to identify the character represented by that symbol? They have provided only the quotation marks? Maybe it's an opening double quotation mark? The character is the double quotation mark? But the question: "what is the name of the following character: " " Perhaps the character is the double quotation mark, or a typographic quote symbol. The name might be "quotation mark" or "double quotation mark" or "double quote" or "quotation mark" or "double quote" or "quotation mark" also known as "quotation mark" or "double quotation mark" also known as "quote".

But we need to respond with the name of the character. Possibly "quotation mark". Or "the double quotation mark" or "double quote". Or "the double quotation mark" or "the quote character" or "the double quote".

But the question includes a colon and a space: " what is the name of the following character: " ". The character they show is " (double quote). So the answer: The character is called the "double quotation mark" or "quotation mark". Let's provide a thorough explanation: The symbol is the Unicode character U+0022 (Quotation mark, also known as double quote). In many contexts, it's called the double quotation mark.

So I'll answer: "The character is called a quotation mark (U+0022), also known as a double quotation mark or a double quote."

But maybe they want the name "Quotation mark". Provide the Unicode name and codepoint. Provide the official Unicode name: QUOTATION MARK. Also known as "Double Quote" or "Double Quotation Mark".

The best answer: "The character is called the quotation mark (U+0022), also referred to as a double quotation mark or a double quote."

Also maybe they want "Guillemets"? But the symbol " is the double quotation mark. There is also a “ typographic opening quotation mark (U+201C) and ” closing (U+201D). But the simple " is U+0022.

So answer: "Quotation mark" or "double quotation mark".

Ok. Let's produce final answer: The character is called the "quotation mark" (Unicode U+0022). It is also known as a "double quote".

Also mention that in some fonts it's "quotation mark". Provide a small snippet: "The character shown is the quotation mark (U+0022), commonly referred to as a double quotation mark."

I'll deliver final answer accordingly.
...done thinking.

The symbol you posted is the **quotation mark** – Unicode U+0022 – which is often called a **double quotation mark** or simply a **double quote**.

But the Turbo version gets it right:

$ OLLAMA_HOST=ollama.com ollama run gpt-oss:20b 'what is the name of the following character: "~"'
Thinking...
The user asks: "what is the name of the following character: "~"". They likely want the name of the tilde character. The name is "tilde". The user may want the character name: "tilde" is the name. Alternatively "modifier letter tilde". It's called "Tilde". The question: "what is the name of the following character: "~"". So answer: "tilded" maybe. Should provide name: "tilde". Could also mention ASCII code, Unicode U+007E. Provide some context. The user just wants name. So answer succinctly: It's called a tilde. Provide maybe alternate names: "inverted circumflex". Provide code point. But must respond carefully. There's no policy conflict. It's safe.
...done thinking.

The character **`~`** is called the **tilde**.  
- In ASCII it is code 126 (0x7E).  
- In Unicode it’s U+007E, also known as *Small Tilde* or *Modifier Letter Tilde*.  
- It’s a diacritic in some languages and also used as a back‑tick or “approximate” symbol in many contexts.

Other new-engine models are also correct:

$ ollama run gemma3n 'what is the name of the following character: "~"'
The character "~" is called a **tilde**. 

@drifkin commented on GitHub (Aug 20, 2025):

@mxyng: I investigated this a bit and I think it's a tokenizer issue.

With tracing on and a user message containing solely ~, I verified that the prompt does render the tilde, and that on the way into BytePairEncoding.Encode() the rune's code is 126, the standard ASCII tilde. After Encode(), it gets mapped to token 220, which is a space character. I've also noticed that when it's part of a larger user message, it is still treated as a space, but it might end up merged with an adjacent real space and become token 256 (space-space).

Example log:

time=2025-08-19T17:03:35.489-07:00 level=TRACE source=bytepairencoding.go:214 msg=encoded string="<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-08-19\n\nReasoning: medium\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>~<|end|><|start|>assistant" ids="[200006 17360 200008 3575 553 17554 162016 11 261 4410 6439 2359 22203 656 7788 17527 558 87447 100594 25 220 1323 19 12 3218 198 6576 3521 25 220 1323 20 12 3062 12 858 279 30377 289 25 14093 279 2 13888 18403 25 8450 11 49159 11 1721 13 21030 2804 413 7360 395 1753 3176 13 200007 200006 1428 200008 220 200007 200006 173781]"
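The thread does not pin down the root cause, but the observed collision (rune 126 ending up as the space token 220) can be illustrated with the GPT-2 style byte-to-unicode table that byte-level BPE tokenizers use. The following Go sketch is an assumption, not Ollama's code: bytesToUnicode follows the published GPT-2 construction, while shiftRune is a hypothetical closed-form equivalent whose delStart parameter shows how starting the shifted 0x7F to 0xA0 range one byte early would send ~ (0x7E + 0xA2 = 0x120) to the same rune as a space (0x20 + 0x100 = 0x120).

```go
package main

import "fmt"

// bytesToUnicode builds the GPT-2 style byte-to-rune table used by
// byte-level BPE tokenizers. Printable bytes ('!'..'~', 0xA1..0xAC,
// 0xAE..0xFF) map to themselves; every other byte is assigned, in order,
// a rune starting at U+0100, so no byte is represented by whitespace or a
// control character.
func bytesToUnicode() map[byte]rune {
	printable := func(b int) bool {
		return (b >= '!' && b <= '~') ||
			(b >= 0xA1 && b <= 0xAC) ||
			(b >= 0xAE && b <= 0xFF)
	}
	table := make(map[byte]rune, 256)
	next := rune(0x100)
	for b := 0; b < 256; b++ {
		if printable(b) {
			table[byte(b)] = rune(b)
		} else {
			table[byte(b)] = next
			next++
		}
	}
	return table
}

// shiftRune is a hypothetical closed-form version of the same table.
// delStart is the first byte of the shifted DEL/C1-control range: 0x7F is
// correct, while 0x7E reproduces an off-by-one that maps '~' onto U+0120,
// the rune that represents a space.
func shiftRune(r rune, delStart rune) rune {
	switch {
	case r <= 0x20: // NUL..space -> U+0100..U+0120
		return r + 0x100
	case r >= delStart && r <= 0xA0: // DEL..NBSP -> U+0121..U+0142 (when delStart is 0x7F)
		return r + 0xA2
	case r == 0xAD: // soft hyphen -> U+0143
		return 0x143
	default: // printable bytes are left unchanged
		return r
	}
}

func main() {
	table := bytesToUnicode()
	fmt.Printf("space -> U+%04X\n", table[' '])                                 // space -> U+0120
	fmt.Printf("tilde -> U+%04X\n", table['~'])                                 // tilde -> U+007E
	fmt.Printf("tilde, correct closed form -> U+%04X\n", shiftRune('~', 0x7F))  // U+007E
	fmt.Printf("tilde, off-by-one closed form -> U+%04X\n", shiftRune('~', 0x7E)) // U+0120
}
```

If the tilde and the space really do share a rune before the vocabulary lookup, the tokenizer has no way to tell them apart, which would be consistent with the trace showing token 220 where the tilde should be. The table-driven form is the safer design, since an injectivity check over all 256 bytes catches this class of bug immediately.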
Reference: github-starred/ollama#54459