enh: Better CJK language support when downloading pdf #1721

Closed
opened 2025-11-11 14:50:51 -06:00 by GiteaMirror · 3 comments

Originally created by @y0umu on GitHub (Aug 8, 2024).

Is your feature request related to a problem? Please describe.
When I try to download a conversation as a PDF, Chinese characters (traditional or simplified) are missing glyphs. Examples below:

[Screenshot: traditional_chinese]
Got WARNI [fpdf.output] Font MPDFAA+NotoSans is missing the following glyphs: 啟, 鸠 when downloading the pdf.

[Screenshot: simplified_chinese]

Got WARNI [fpdf.output] Font MPDFAA+NotoSans is missing the following glyphs: 谁, 这, 话, 时, 议, 颁, 夺, 财, 产, 为, 发, 许, 宪, 现 when downloading the pdf.

Describe the solution you'd like
I worked around this by changing the code near https://github.com/open-webui/open-webui/blob/99d10d1189452ad49fcace219e9c90ae65906cd1/backend/apps/webui/routers/utils.py#L83 to something like:

    pdf.add_font("NotoSans", "", f"{FONTS_DIR}/NotoSans-Regular.ttf")
    pdf.add_font("NotoSans", "b", f"{FONTS_DIR}/NotoSans-Bold.ttf")
    pdf.add_font("NotoSans", "i", f"{FONTS_DIR}/NotoSans-Italic.ttf")
    pdf.add_font("NotoSansSC", "", f"{FONTS_DIR}/NotoSansSC-Regular.ttf")
    pdf.add_font("NotoSansKR", "", f"{FONTS_DIR}/NotoSansKR-Regular.ttf")
    pdf.add_font("NotoSansJP", "", f"{FONTS_DIR}/NotoSansJP-Regular.ttf")

    pdf.set_font("NotoSans", size=12)
    # TODO select the best fallback sequence by determining the user's locale
    pdf.set_fallback_fonts(["NotoSansSC", "NotoSansKR", "NotoSansJP"])

I also put NotoSansSC-Regular.ttf from https://fonts.google.com/noto into backend/static/fonts.

This might not be acceptable as a general solution, however, because it probably breaks the display of some Japanese characters (门、入、内 for example).

Describe alternatives you've considered
Determine the user's locale before executing pdf.set_fallback_fonts(["NotoSansSC", "NotoSansKR", "NotoSansJP"]) to make sure the fallback sequence is correct.
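
As a rough illustration of this alternative, the sketch below picks a fallback order from a locale tag. The helper name `fallback_fonts_for_locale`, the mapping, and the default order are assumptions made for illustration and are not code from open-webui; only the font family names come from the snippet above.

```python
from typing import List, Optional

# Hypothetical mapping from a primary language subtag to a fallback order.
# The font family names match the add_font() calls in the workaround.
FALLBACKS_BY_LANG = {
    "zh": ["NotoSansSC", "NotoSansJP", "NotoSansKR"],
    "ja": ["NotoSansJP", "NotoSansSC", "NotoSansKR"],
    "ko": ["NotoSansKR", "NotoSansJP", "NotoSansSC"],
}

# Assumed default: the order used in the workaround.
DEFAULT_FALLBACKS = ["NotoSansSC", "NotoSansKR", "NotoSansJP"]


def fallback_fonts_for_locale(locale_tag: Optional[str]) -> List[str]:
    """Return a fallback font order for tags such as 'ja-JP' or 'zh_CN.UTF-8'."""
    if not locale_tag:
        return list(DEFAULT_FALLBACKS)
    # Accept both 'zh_CN' and 'zh-CN' and keep only the language part.
    lang = locale_tag.replace("_", "-").split("-")[0].lower()
    return list(FALLBACKS_BY_LANG.get(lang, DEFAULT_FALLBACKS))
```

The result could then be passed to `pdf.set_fallback_fonts(...)` in place of the hard-coded list. Where the locale tag comes from is a separate question.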

Additional context
No additional context


@tjbck commented on GitHub (Aug 8, 2024):

PR welcome!


@y0umu commented on GitHub (Aug 8, 2024):

I didn't propose a pull request in the first place because I could not decide on the most suitable way to determine the locale.

  • The locale of the backend is easy to get, but it does not make sense if it is a public server with a plain en_US or C locale.
  • Inferring the locale from the text fed to the FPDF module might work, but the inference itself sounds tedious.
  • Inferring from the WebUI language setting is a good idea, but I haven't found related code at this point.
  • Inferring from the request header is also a good idea, but download_chat_as_pdf does not seem to have access to request headers at all.
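
Two of the options above can be sketched with the standard library alone: reading the preferred language from an `Accept-Language` header value, and guessing the language from the scripts present in the text. Both function names are hypothetical, and how the header value would reach `download_chat_as_pdf` is left open. The script check is only a heuristic: Japanese text written entirely in Han characters would be guessed as Chinese.

```python
from typing import Optional


def preferred_language(accept_language: str) -> Optional[str]:
    """Return the language tag with the highest q-value, or None if there is none."""
    best_tag, best_q = None, 0.0
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag or tag == "*":
            continue
        q = 1.0  # a tag without an explicit q parameter has weight 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Strictly greater, so earlier tags win ties and q=0 is never chosen.
        if q > best_q:
            best_tag, best_q = tag, q
    return best_tag


def guess_cjk_language(text: str) -> Optional[str]:
    """Guess 'ja', 'ko', or 'zh' from the Unicode blocks used in the text."""
    kana = hangul = han = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:    # Hiragana and Katakana
            kana += 1
        elif 0xAC00 <= cp <= 0xD7A3:  # Hangul syllables
            hangul += 1
        elif 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            han += 1
    if kana or hangul:
        return "ja" if kana >= hangul else "ko"
    if han:
        return "zh"
    return None
```

For example, `preferred_language("ja,en-US;q=0.9,en;q=0.8")` yields `"ja"`, which could then be used to choose the fallback font order.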

@tjbck commented on GitHub (Aug 14, 2024):

What would be the issue with setting it to ["NotoSansKR", "NotoSansJP", "NotoSansSC"], in this order?


Reference: github-starred/open-webui#1721