[GH-ISSUE #15360] feat: Enhance A/B Testing #17541

Closed
opened 2026-04-19 23:19:18 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Azzeo on GitHub (Jun 27, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/15360

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

There are currently a few ways to evaluate models:

  1. Using the Arena feature
  2. Using thumbs up/down on answers
  3. Adding two models explicitly and using thumbs up/down

All of these require explicit action from the user and possibly create selection bias, except when the only model available is the arena model. However, technical users do want to know which model they are using, so that is not a proper solution either.

Desired Solution you'd like

True A/B testing, where the user gets to see two responses (one from the chosen model, and one from another model). The user does not know which other model is shown, nor whether the left or right response came from the selected model. This is similar to how A/B testing is performed at large LLM providers. The user is asked which answer they prefer, and if they choose, the result is saved and counted towards evaluation. There should be a skip button in case the user is not interested in reading two answers (defaulting to the selected model).
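As a rough illustration of the "properly saved" part, a blind preference record might look like the following. This is a minimal sketch; the function and field names (`record_preference`, `winner`, `skipped`, etc.) are assumptions for illustration, not an existing Open WebUI schema:

```python
def record_preference(chat_id: str, winner: str, loser: str,
                      skipped: bool = False) -> dict:
    """Save the user's blind choice so it counts towards model evaluation.

    When the user skips, no winner/loser is recorded, matching the
    'skip button' behaviour described above.
    """
    return {
        "chat_id": chat_id,
        "winner": None if skipped else winner,
        "loser": None if skipped else loser,
        "skipped": skipped,
    }
```

The key point is that the stored record identifies the models only server-side; the user never sees which model produced which answer.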

There are a few ways A/B testing can be triggered:

  1. Random chance, with a configurable probability. Let's say a 1% chance of this occurring per prompt, limited to once per chat.
  2. and/or feature flags, where a (random/selected) group of users is placed in the sampling pool.
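The two trigger mechanisms above could be combined as follows. This is a hypothetical sketch; the constant names, the flag pool, and the `should_trigger_ab_test` helper are assumptions, not part of any existing Open WebUI API:

```python
import random

AB_TEST_PROBABILITY = 0.01            # 1% chance per prompt (configurable)
FEATURE_FLAG_POOL = {"user_123"}      # users opted into the sampling pool

def should_trigger_ab_test(user_id: str, chat_ab_count: int) -> bool:
    """Return True if this prompt should become a blind A/B comparison."""
    if chat_ab_count >= 1:                 # limit of one A/B test per chat
        return False
    if user_id not in FEATURE_FLAG_POOL:   # feature-flag gate
        return False
    return random.random() < AB_TEST_PROBABILITY  # random 1% trigger
```

Both gates (flag pool and per-chat limit) are checked before the random roll, so users outside the pool are never sampled.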

It should be a per-model option, enabled where it makes sense. E.g., let's say you have two models in production:

  1. Mistral w/ reasoning
  2. Gemma-3 27b

Let's say there is a new Gemma model that you want to A/B test. The A/B test should only be triggered when using Gemma.

To implement this, we could possibly utilise the current double-model mode in some way, but hide the tested model from the model selector and randomise which side (left/right) the selected model's response appears on.
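The pairing and randomisation described above could be sketched like this. All names here (`AB_CHALLENGERS`, `build_ab_pair`, the model identifiers) are hypothetical and only illustrate the idea of a hidden challenger mapped per production model:

```python
import random
from dataclasses import dataclass

# Per-model mapping: production model -> hidden challenger under test.
# The challenger never appears in the model selector.
AB_CHALLENGERS = {"gemma-3-27b": "gemma-new-preview"}

@dataclass
class AbPair:
    left: str
    right: str
    selected_side: str  # "left" or "right"; stored server-side, never shown

def build_ab_pair(selected_model: str):
    """Pair the selected model with its challenger, randomising the sides."""
    challenger = AB_CHALLENGERS.get(selected_model)
    if challenger is None:
        return None  # no A/B test configured for this model
    if random.random() < 0.5:
        return AbPair(left=selected_model, right=challenger,
                      selected_side="left")
    return AbPair(left=challenger, right=selected_model,
                  selected_side="right")
```

Only models with a configured challenger ever trigger a test, which matches the Gemma-only example above; the Mistral model would always return `None`.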

Alternatives Considered

No response

Additional Context

No response


Reference: github-starred/open-webui#17541