mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-11 00:04:08 -05:00
Mathematical formulas cannot be displayed normally. #1738
Originally created by @MangoFF on GitHub (Aug 9, 2024).
Originally assigned to: @tjbck on GitHub.
@Yanyutin753 commented on GitHub (Aug 9, 2024):
Yep, it seems that in the latest dev branch some special mathematical formulas are not supported, such as the following example.
@Yanyutin753 commented on GitHub (Aug 10, 2024):
@tjbck
In the latest version of dev
Issues
Improvements
Genuine
I hope the new version keeps its new features while the problems mentioned get fixed. Thanks again to the author for your selfless dedication 🥳
@Yanyutin753 commented on GitHub (Aug 12, 2024):
I don't know how to fix this real-time rendering; I hope someone can fix this problem ⭐
@tjbck commented on GitHub (Aug 14, 2024):
Should be fixed on dev, testing wanted!
@MangoFF commented on GitHub (Aug 14, 2024):
I will test it tomorrow!
@maxkonrad commented on GitHub (Aug 14, 2024):
Would like to test dev but can't run it on my Pi 4B; see #4331
@MangoFF commented on GitHub (Aug 15, 2024):
@MangoFF commented on GitHub (Aug 15, 2024):
Sometimes there are still some problems, and some words become red.
@MangoFF commented on GitHub (Aug 15, 2024):
@tjbck
@jukofyork commented on GitHub (Sep 5, 2024):
Yeah, I've got this and think I know what is causing it:
It seems that `\))` isn't closing the inline markdown block, so whenever the LLM writes a markdown expression inside some brackets it misses this, waits until it sees the next closing `\)`, and makes everything between the two red, e.g.:
Is treating:
as the expression, instead of two expressions:
and the red likely means it has come across invalid LaTeX syntax.
@jukofyork commented on GitHub (Sep 5, 2024):
It seems the `\\\(.*?\\\)` part of the regex in:
is the cause of the problem, but I've no idea why it isn't acting lazily, nor what difference the extra bracket after it could make... :/
I've tried Claude Sonnet 3.5, GPT 4-Turbo and GPT 4o to see if they can see the problem and solve it and all just start going round in circles and contradict themselves when asked about this:
I suspect the problem is deeper and relates to the theory that "regular expressions are fundamentally not capable of parsing context-free grammars".
That theory relates to nested structures though, and the problem above doesn't seem to have any nesting, but for some reason the extra `)` after the `\)` seems to cause the `.*?` to not behave "lazily" and I've no idea why...
@jukofyork commented on GitHub (Sep 5, 2024):
Absolutely, substituting `\(` and `\)` with unique placeholders can be a practical workaround to sidestep issues associated with distinguishing escaped and non-escaped parentheses. This method essentially simplifies the regex patterns by ensuring that the placeholders do not have syntactic ambiguity within the text. Here's how you might approach this:
Steps Involved:
Choose Unique Placeholders: Select placeholders that are unlikely to appear in the regular text. It could be a combination of non-standard ASCII characters or a rarely-used Unicode character.
Substitution: Replace all instances of `\(` with a unique opening placeholder, and `\)` with a corresponding closing placeholder.
Regex Application: Run your regex operations using these placeholders instead of the original bracket characters.
Reverse Substitution: Convert the placeholders back to the original LaTeX brackets after processing.
Example Implementation:
Here's a simple JavaScript example that demonstrates this approach:
Advantages:
Considerations:
This approach works well for circumventing the complications due to syntax mixture in languages like LaTeX embedded in texts and gives more controlled management over how expressions are parsed and handled.
No idea if this will work or why it would make a difference though.
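The placeholder steps listed in that answer could be sketched roughly as follows. This is only an illustration: `convertInlineMath`, the two private-use placeholder characters, and the `$...$` output form are all assumptions made for the sketch, not code from open-webui or from the lost snippet.

```javascript
// Hypothetical placeholders: private-use Unicode characters that are
// assumed never to appear in model output.
const OPEN = '\uE000';
const CLOSE = '\uE001';

// Hypothetical helper (not an open-webui function): converts \( ... \)
// into $ ... $ by way of single-character placeholders.
function convertInlineMath(text) {
  // 1. Substitution: swap the two-character delimiters for placeholders.
  const swapped = text.replaceAll('\\(', OPEN).replaceAll('\\)', CLOSE);

  // 2. Regex application: a negated character class cannot run past the
  //    first closing placeholder, so no lazy quantifier is needed.
  const pattern = new RegExp(`${OPEN}([^${CLOSE}]*)${CLOSE}`, 'g');
  const converted = swapped.replace(pattern, (_match, body) => `$${body}$`);

  // 3. Reverse substitution: restore any delimiters that were left unpaired.
  return converted.replaceAll(OPEN, '\\(').replaceAll(CLOSE, '\\)');
}

console.log(convertInlineMath('(\\(Q^T Q = I\\)), \\(I + Q\\)'));
// prints: ($Q^T Q = I$), $I + Q$
```

The sketch does not protect code spans and does not handle an escaped backslash directly before a parenthesis, so it would need more work before being used on real chat output.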
@jukofyork commented on GitHub (Sep 5, 2024):
https://onecompiler.com/javascript/42r9qmmny
Maybe just have 3 like this?
@jukofyork commented on GitHub (Sep 5, 2024):
https://onecompiler.com/javascript/42r9rd8h3
Something like this but tidied up. It still won't handle nesting but at least it will perform "lazy" matches where the `.*?` seems to be failing...
@jukofyork commented on GitHub (Sep 5, 2024):
I'm absolutely clueless when it comes to Docker so no idea how to test any of this in an actual running version :/ Where does this file even live in Docker, how can it be edited and so on are total mysteries - around 50% of the time I think I'm updating the files inside of Docker they don't update, and I couldn't even delete the OpenWebUI Docker data without a complete uninstall of everything to do with Docker, its Linux user and so on, so I seriously doubt I can fix this even if I knew how... :(
I might be able to patch the JS in the browser using the "inspector" though, so will try that later if I get the chance.
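For anyone wanting to try the "separate patterns" idea from the earlier comments without touching Docker, a rough sketch like the one below can be pasted into a browser console or run with Node. Everything in it is an assumption for illustration: `convertDelimiters` is a hypothetical helper, the `$...$` / `$$...$$` output form is assumed, and the private-use character `\uE000` is assumed not to occur in the input.

```javascript
// Hypothetical helper: converts LaTeX delimiters in separate passes
// instead of one combined alternation.
function convertDelimiters(text) {
  const stash = [];

  // Pass 1: set code blocks and inline code aside so later passes
  // cannot rewrite anything inside them.
  let out = text.replace(/```[\s\S]*?```|`[^`\n]*`/g, (code) => {
    stash.push(code);
    return `\uE000${stash.length - 1}\uE000`;
  });

  // Pass 2: display math, \[ ... \] becomes $$ ... $$.
  out = out.replace(/\\\[([\s\S]*?)\\\]/g, (_match, body) => `$$${body}$$`);

  // Pass 3: inline math, \( ... \) becomes $ ... $ (lazy, single line).
  out = out.replace(/\\\((.*?)\\\)/g, (_match, body) => `$${body}$`);

  // Finally, put the stashed code back unchanged.
  return out.replace(/\uE000(\d+)\uE000/g, (_match, i) => stash[Number(i)]);
}

console.log(convertDelimiters('`\\(x\\)` and (\\(Q^T Q = I\\)), \\(I + Q\\) and \\[a\\]'));
// prints: `\(x\)` and ($Q^T Q = I$), $I + Q$ and $$a$$
```

Like the approach it sketches, this does not handle nested delimiters.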
@jukofyork commented on GitHub (Sep 16, 2024):
OpenAI: o1-preview
USER
Can you look at this code:
It seems that `\))` isn't closing the inline markdown block, so whenever the LLM writes a markdown expression inside some brackets it misses this, waits until it sees the next closing `\)`, and makes everything between the two red, e.g.:
Is treating:
as the expression, instead of two expressions:
It seems the `\\\(.*?\\\)` part of the reg-ex is the cause of the problem, and for some reason the extra `)` after the `\)` seems to cause the `.*?` to not behave "lazily".
Can you explain exactly why this reg-ex is failing and suggest a fix to handle these specific `\))` cases properly? It seems to be a simple problem, but as yet no other LLM has been able to offer a valid solution that wouldn't break other LaTeX syntax parsing for backslashes inside the inline expressions, etc.
ASSISTANT
The issue arises due to the way regular expressions handle backtracking, especially with nested patterns and quantifiers like `.*?`. In your regex, the `\\\((.*?)\\\)` pattern is designed to match LaTeX expressions enclosed in `\(` and `\)`. However, when the LaTeX expression is immediately followed by a closing parenthesis `)`, like in your example, the regex captures that extra `)` as part of the LaTeX expression.
Here's why this happens:
`.*?` is a non-greedy quantifier, so it matches as few characters as possible before the next part of the pattern (`\\\)`).
Because of the alternation (`|`), the regex engine may backtrack and expand the `.*?` to include more characters, inadvertently including the extra `)`.
In your example, the regex matches from the first `\(` to the last `\)`, including the extra `)`:
Solution:
To prevent the regex from capturing the extra closing parenthesis `)`, we can use a negative lookahead `(?!\))` to ensure that the closing `\)` is not immediately followed by another `)`. Here's the corrected regex:
By adding `(?!\))` right after `\\\)`, we tell the regex engine to only match `\)` if it is not followed by a `)`. This prevents the extra `)` from being included in the match.
Explanation:
`\\\((.*?)\\\)` matches LaTeX expressions enclosed in `\(` and `\)`.
`(?!\))` is a negative lookahead that asserts that the pattern `\)` is not followed by `)`.
This ensures that the match ends at the closing `\)` and doesn't include any subsequent `)` characters.
Updated Function:
Testing the Fix:
Using the updated regex on your example:
Output:
Now, the LaTeX expressions are correctly converted, and the extra `)` is not included in the LaTeX expression, preserving the intended structure.
USER
Just to be clear: "we tell the regex engine to only match `\)` if it is not followed by a `)`" means this will still work for the case where there is a closing bracket after the expression closing? The way you worded it makes it sound like it will not treat this as a closing?
ASSISTANT
You're absolutely right, and I apologize for any confusion. My previous explanation wasn't accurate, and the proposed solution doesn't handle the case correctly. Let me clarify the issue and provide a better solution.
Understanding the Issue:
The problem arises because the regular expression is matching more than it should due to backtracking caused by the alternation (`|`) in the pattern. Specifically, when the LaTeX expression `\(Q^T Q = I\))` is followed by `, \(I + Q\)`, the regex incorrectly includes the extra `)`, causing it to consume beyond the intended closing `\)`.
Here's what's happening step by step:
`\\\((.*?)\\\)` is supposed to match content between `\(` and `\)`.
`.*?` is a non-greedy quantifier, so it should match as little as possible.
The regex engine may backtrack and expand `.*?` to include more characters if the overall match fails, even consuming extra `)` characters.
As a result, the regex matches from the first `\(` to the last `\)`, including any intervening `)` characters.
Why the Previous Solution Doesn't Work:
Adding `(?!\))` after `\\\)` in the regex (`\\\((.*?)\\\)(?!\))`) means that the closing `\)` will only be matched if it's not followed by another `)`. In your case, since the `\)` is followed by a `)`, the regex fails to match the closing delimiter, causing the entire expression to be incorrectly matched.
Correct Solution:
To fix this issue, we need to adjust the regex pattern to ensure that:
Here's how we can modify the regex:
Change the Order of Alternation: Place the patterns in the regex in an order that minimizes backtracking. Specifically, match the LaTeX expressions before the code blocks.
Use Character Classes to Restrict Matching: Instead of `.*?`, use a character class that matches any character except the closing delimiter.
Avoid Matching Extra Characters: Ensure that the pattern for matching `\(` and `\)` pairs does not include any closing parentheses `)` that are not part of the LaTeX expression.
Updated Regex Pattern:
Here's the corrected regex pattern:
Explanation:
`\\\(([\s\S]*?)\\\)`: Matches LaTeX expressions enclosed in `\(` and `\)`.
`([\s\S]*?)`: Matches any character (including newlines) non-greedily.
Stops at the first closing `\)` without including extra characters.
`\\\[([\s\S]*?)\\\]`: Matches LaTeX expressions enclosed in `\[` and `\]`.
``(`[^`]*`)``: Matches inline code enclosed in backticks.
Uses ``[^`]*`` instead of `.*?` to prevent matching backticks inside code blocks.
```` (```[\s\S]*?```) ````: Matches code blocks enclosed in triple backticks.
Updated Function:
Testing the Updated Function:
Let's test it with your example:
Expected Output:
Explanation of the Changes:
Using `[\s\S]*?` directly inside the LaTeX patterns ensures that we match any character, but the non-greedy quantifier stops at the first `\\\)`.
Why This Works:
The match ends at the first closing `\)` delimiter.
The non-greedy `*?` ensures that the match stops at the first occurrence of `\\\)`.
Answer to Your Question:
You're correct, and I apologize for any confusion. My previous wording was misleading, and the proposed negative lookahead `(?!\))` would indeed prevent the `\)` from being matched if it's followed by a `)`, which is not what we want.
In the updated solution, we've adjusted the regex to ensure that the closing `\)` is always matched, regardless of what follows it. This ensures that LaTeX expressions like `\(Q^T Q = I\)` are correctly identified and transformed, even if they are immediately followed by additional parentheses or other characters.
Key Takeaways:
I hope this clarifies the issue and provides a working solution for your problem. Let me know if you have any more questions or need further assistance!
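The two sub-patterns discussed in this exchange can be compared directly on the example from the question. The sketch below is only a check of those sub-patterns in isolation; `bodies` is a hypothetical helper, and the full pattern used by the project (with its other alternatives) is not reproduced here.

```javascript
// The example from the question: an inline expression followed by an
// extra ")" and then a second inline expression.
const sample = '(\\(Q^T Q = I\\)), \\(I + Q\\)';

// Hypothetical helper: collect the captured body of every match.
function bodies(pattern, text) {
  return [...text.matchAll(pattern)].map((match) => match[1]);
}

// Plain lazy sub-pattern, as quoted in the thread.
const lazy = bodies(/\\\((.*?)\\\)/g, sample);

// The same sub-pattern with the proposed negative lookahead appended.
const withLookahead = bodies(/\\\((.*?)\\\)(?!\))/g, sample);

console.log(lazy);          // two bodies: "Q^T Q = I" and "I + Q"
console.log(withLookahead); // one merged body spanning both expressions
```

On this input the lazy sub-pattern on its own splits the text into the two intended expressions, while the lookahead variant skips the first `\)` and merges them, which matches the objection raised in the follow-up question. Whether the same holds inside the full pattern or later rendering stages is not shown by this check.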
@jukofyork commented on GitHub (Sep 16, 2024):
`OpenAI: o1-mini` doesn't get it at all and just gives nonsensical solutions.
Reading through the `OpenAI: o1-preview` solution makes me think that just breaking into 3 separate patterns and running the replace 3 times would quickly prove if it is actually the ordering causing the lazy operation to fail?
@jukofyork commented on GitHub (Sep 16, 2024):
@tjbck can we reopen this? The LaTeX parsing is sooo close to working now and this just seems like the last hurdle and it will be working 100%! :)
@a15355447898a commented on GitHub (Jan 20, 2025):
To date, I still frequently encounter issues where mathematical formulas don't render correctly, such as the following examples.
Software like Typora can render these formats perfectly.

Is there any plan to address this issue in open-webui? I'm really looking forward to it!