mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #15360] feat: Enhance A/B Testing #56207
Originally created by @Azzeo on GitHub (Jun 27, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15360
Check Existing Issues
Problem Description
There are currently a few ways to evaluate models:
All of these require explicit action from the user and can introduce selection bias, except when the only model available is the arena model. However, technical users do want to know which model they are using, so this is not a proper solution.
Desired Solution you'd like
True A/B testing, where the user sees two responses (one from the chosen model and one from another model). The user does not know which other model is shown, or whether the left or right response comes from the selected model. This is similar to how A/B testing is performed at large LLM providers. The user is asked which answer they prefer. The results are properly saved and counted towards evaluation if a preference is selected. There should be a skip button in case the user is not interested in reading two answers (skipping defaults to the selected model's response).
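As a rough sketch of the flow described above (not Open WebUI's actual API; `BlindPair`, `make_blind_pair`, and `resolve_choice` are hypothetical names used only for illustration), the blind pairing and the skip behaviour could look something like this:

```python
import random
from dataclasses import dataclass


@dataclass
class BlindPair:
    """Two responses shown side by side without revealing model names."""
    selected_model: str    # the model the user picked in the selector
    challenger_model: str  # the hidden model being compared against
    left_model: str        # model whose response is rendered on the left
    right_model: str       # model whose response is rendered on the right


def make_blind_pair(selected_model, challenger_model, rng=None):
    """Randomly assign the two models to the left and right slots."""
    rng = rng or random
    models = [selected_model, challenger_model]
    rng.shuffle(models)
    return BlindPair(selected_model, challenger_model, models[0], models[1])


def resolve_choice(pair, choice):
    """Map a UI choice ('left', 'right', 'skip') to (winner, loser, counted).

    'skip' keeps the selected model's response and records no vote.
    """
    if choice == "skip":
        return pair.selected_model, None, False
    if choice == "left":
        return pair.left_model, pair.right_model, True
    if choice == "right":
        return pair.right_model, pair.left_model, True
    raise ValueError(f"unknown choice: {choice!r}")
```

Only results with `counted` set to `True` would be saved as an evaluation; a skipped comparison simply continues the chat with the selected model's answer.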
There are a few ways A/B testing can be triggered:
It should be a specific option per model where it makes sense. E.g. Let's say you have two models in production:
Let's say there is a new Gemma model that you want to A/B test. The A/B test should only be triggered when using Gemma.
To implement this, we could possibly reuse the current double model mode in some way, but hide the tested model from the model selector and randomise which side (left/right) each model's response appears on.
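A minimal sketch of the per-model trigger and the hidden challenger, assuming a simple mapping from a selectable model to the model under test. The mapping `AB_TESTS`, the helper names, and the model ids (`gemma`, `gemma-new`, `other-model`) are all hypothetical and do not come from the Open WebUI codebase:

```python
# Hypothetical config: selectable model id -> hidden model to A/B test against.
AB_TESTS = {"gemma": "gemma-new"}


def visible_models(all_models, ab_tests):
    """Hide models under test from the model selector."""
    hidden = set(ab_tests.values())
    return [m for m in all_models if m not in hidden]


def models_to_query(selected_model, ab_tests):
    """Return the models to run for one prompt.

    The A/B test is triggered only when the selected model has an entry
    in ab_tests; otherwise the chat behaves as a normal single-model chat.
    """
    challenger = ab_tests.get(selected_model)
    if challenger is None:
        return [selected_model]
    return [selected_model, challenger]
```

With this mapping, choosing `gemma` would produce two responses (one of them from the hidden `gemma-new`), while choosing any other model would produce a single response as usual.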
Alternatives Considered
No response
Additional Context
No response