[GH-ISSUE #5539] can't embedding PDF file in Korean #3459

Closed
opened 2026-04-12 14:08:24 -05:00 by GiteaMirror · 3 comments

Originally created by @codeMonkey-shin on GitHub (Jul 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5539

What is the issue?

I'm trying to use RAG by embedding a Korean-language PDF file, but the encoding seems to be broken. When the text is saved to the vector database, broken strings are stored.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.48

GiteaMirror added the bug label 2026-04-12 14:08:24 -05:00

@mchiang0610 commented on GitHub (Jul 9, 2024):

@codeMonkey-shin Sorry about this. May I ask which embedding model you are using? Is there a publicly available sample Korean PDF you could share so we can reproduce the issue?

Thank you!


@codeMonkey-shin commented on GitHub (Jul 9, 2024):

> @codeMonkey-shin sorry about this. May I ask which embedding model you are using? Is there a sample Korean PDF in the public you could share for us to reproduce?
>
> Thank you!

I used the chatfire/bge-m3:q8_0 model.

This model also supports Korean.

[test.pdf](https://github.com/user-attachments/files/16138640/test.pdf)

You can use the short Korean PDF file here.

@codeMonkey-shin commented on GitHub (Jul 9, 2024):

> @codeMonkey-shin sorry about this. May I ask which embedding model you are using? Is there a sample Korean PDF in the public you could share for us to reproduce?
>
> Thank you!

Don't worry about it, it was my mistake. There seems to be an issue with the front-end.

Reference: github-starred/ollama#3459