[GH-ISSUE #2128] CLI not properly handles some unicode characters #47728

Closed
opened 2026-04-28 05:06:28 -05:00 by GiteaMirror · 3 comments

Originally created by @hyjwei on GitHub (Jan 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2128

If I enter a prompt containing some Unicode characters at the ollama run command line and then move the cursor back and forth, insert new characters, or delete some of them with the Delete or Backspace key, the input line becomes malformed.

In addition, when ollama outputs Unicode characters, the text occasionally repeats itself. The --nowordwrap option appears to solve the problem, so I guess the issue occurs when ollama tries to wrap words onto the next line. The side effect of this option is that English words are broken across lines.
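To illustrate why word wrapping could go wrong with such text, here is a minimal Python sketch. It is not ollama's code: `display_width` and `wrap_by_columns` are hypothetical helpers, and the assumption is that the terminal draws East Asian Wide and Fullwidth characters in two columns. A wrapper that counts characters instead of columns would let a line overflow the terminal width.

```python
import unicodedata

def display_width(text):
    # Assumption: Wide ("W") and Fullwidth ("F") characters take 2 columns,
    # everything else takes 1. Combining marks are not handled.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def wrap_by_columns(text, max_cols, measure=display_width):
    # Greedy word wrap that budgets each line with `measure`.
    lines, current, used = [], "", 0
    for word in text.split(" "):
        w = measure(word)
        extra = w if not current else w + 1  # +1 for the joining space
        if current and used + extra > max_cols:
            lines.append(current)
            current, used = word, w
        else:
            current = word if not current else current + " " + word
            used += extra
    if current:
        lines.append(current)
    return lines

text = "请翻译 以下文字 hello"
by_columns = wrap_by_columns(text, 10)            # width-aware
by_chars = wrap_by_columns(text, 10, measure=len)  # counts characters only
print(by_columns)
print(by_chars, display_width(by_chars[0]))
```

With a 10-column limit, the character-counting variant keeps "请翻译 以下文字" on one line even though it needs 15 columns, which is the kind of mismatch that could confuse a terminal redraw.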

I use PuTTY with Unicode support. The issue can be reproduced with characters such as "请翻译以下文字". Copy and paste them into the CLI, then move the cursor around or insert and delete characters. The operation seems to be executed correctly on the string itself, but the string is printed incorrectly. In this example, if you use Backspace to delete the Unicode characters, the CLI should delete 1 character and move back 2 bytes each time, and after 7 presses it should have deleted all of them and show only ">>>". In fact, each Backspace moves back only 1 byte and corrupts the display.
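The numbers in this description can be checked with a short Python sketch. `display_width` is a hypothetical helper, and it assumes the terminal draws East Asian Wide and Fullwidth characters in two columns. Note that the "2 bytes" per character mentioned above appears to match terminal columns; in UTF-8 each of these characters is 3 bytes.

```python
import unicodedata

def display_width(text):
    # Assumption: Wide ("W") and Fullwidth ("F") characters take 2 columns,
    # everything else takes 1. Combining marks are not handled.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

sample = "请翻译以下文字"
chars = len(sample)                      # 7 characters
utf8_bytes = len(sample.encode("utf-8"))  # 21 bytes in UTF-8
columns = display_width(sample)           # 14 terminal columns
print(chars, utf8_bytes, columns)
```

So a Backspace that removes one character from the buffer would need to move the cursor back by that character's column width, not by 1.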

This is what I get after entering this string and then using Backspace to delete all of it:

>>> 请翻译以下文字
Use Ctrl + d or /bye to exit.
>>> 请翻译 Send a message (/? for help)

You can see that the last line is not cleared (3 characters remain), but the CLI shows "Send a message", which indicates that the input buffer is internally empty. There is also a space before the "S". The reason is that these 7 characters occupy 14 bytes, and after 7 deletions only the last 7 bytes are wiped from the screen, so the first 7 bytes (3 characters plus a space) remain.
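The leftover "请翻译 " can be reproduced with a small simulation. This is a sketch under assumptions, not ollama's implementation: the screen is modeled as a row of cells, each wide character fills two cells, and a wide character that has lost one half is shown as a blank (as in the output above; other terminals may behave differently). All function names are hypothetical.

```python
import unicodedata

def char_width(ch):
    # Assumption: Wide/Fullwidth characters occupy 2 terminal cells.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def to_cells(text):
    # One list entry per terminal cell; None marks the right half
    # of a wide character.
    cells = []
    for ch in text:
        cells.append(ch)
        if char_width(ch) == 2:
            cells.append(None)
    return cells

def render(cells):
    # Draw the cells; a wide character missing one half becomes a blank.
    out, i = [], 0
    while i < len(cells):
        ch = cells[i]
        if (ch is not None and char_width(ch) == 2
                and i + 1 < len(cells) and cells[i + 1] is None):
            out.append(ch)
            i += 2
        elif ch is None or char_width(ch) == 2:
            out.append(" ")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def remnant_after_backspaces(text, width_aware):
    # Delete every character with Backspace and return what is still
    # drawn to the left of the cursor afterwards.
    cells = to_cells(text)
    cursor = len(cells)
    for ch in reversed(text):
        cursor -= char_width(ch) if width_aware else 1
    return render(cells[:cursor])

sample = "请翻译以下文字"
buggy = remnant_after_backspaces(sample, width_aware=False)
fixed = remnant_after_backspaces(sample, width_aware=True)
print(repr(buggy))  # '请翻译 '
print(repr(fixed))  # ''
```

Moving back one cell per deleted character leaves the cursor at cell 7 of 14, so three full characters and half of the fourth stay on screen. Moving back by each character's width returns the cursor to the prompt.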

Regards,


@mengzhuo commented on GitHub (Jan 22, 2024):

It seems to be a duplicate of #1275.


@hyjwei commented on GitHub (Jan 22, 2024):

> It seems to be a duplicate of #1275.

The output part looks like a duplicate of it, and --nowordwrap seems to solve it (at least for Unicode characters).


@pdevine commented on GitHub (Jan 22, 2024):

Yep, it's a dupe. Let's track it in the other one.
