feat: direct llama.cpp integration #620

Closed
opened 2025-11-11 14:27:37 -06:00 by GiteaMirror · 8 comments

Originally created by @tjbck on GitHub (Apr 10, 2024).

Originally assigned to: @tjbck on GitHub.


@jukofyork commented on GitHub (Apr 10, 2024):

Just a quick follow-up to say it seems to work fine:

  • I had to change the llama.cpp server port to 8081 so it wouldn't clash with OpenWebUI (e.g. ./server --port 8081 ...).
  • Then set the OpenAI API base URL to http://127.0.0.1:8081/v1 and the API Key to something non-blank (e.g. none) in OpenWebUI settings.

With that, it seems to be calling the OpenAI-compatible API (https://github.com/openai/openai-openapi) endpoint on the llama.cpp server fine. It wasn't obvious that I needed to add /v1 to the URL and set a non-blank API Key, though (I had to find that out by trial and error).
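For anyone else wiring this up, a minimal sanity check (my own sketch, not part of Open WebUI) is to hit the llama.cpp server directly with the official openai Python client, using the same base URL and dummy API key as in the UI settings. The model name here is a placeholder; llama.cpp typically serves whichever model it was started with:

```python
# Sketch: query the llama.cpp server's OpenAI-compatible endpoint directly.
# Assumes the server is running on port 8081 as described above.
from openai import OpenAI

# Any non-empty api_key works; llama.cpp doesn't validate it by default.
client = OpenAI(base_url="http://127.0.0.1:8081/v1", api_key="none")

response = client.chat.completions.create(
    model="default",  # placeholder; llama.cpp serves the loaded model regardless
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

If this works but the UI doesn't, the missing /v1 suffix or a blank API Key is the usual culprit.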

The only difference I can see is that there's no little "information" icon like there was with Ollama models, but it does seem to be calling the OpenAI-compatible endpoint to get these stats:

{
  "tid": "140627543928832",
  "timestamp": 1712766280,
  "level": "INFO",
  "function": "print_timings",
  "line": 313,
  "msg": "prompt eval time     =     129.89 ms /    55 tokens (    2.36 ms per token,   423.43 tokens per second)",
  "id_slot": 0,
  "id_task": 13,
  "t_prompt_processing": 129.892,
  "n_prompt_tokens_processed": 55,
  "t_token": 2.3616727272727274,
  "n_tokens_second": 423.42869460782805
}
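
(For reference, n_tokens_second is just n_prompt_tokens_processed / t_prompt_processing × 1000: 55 tokens / 129.892 ms ≈ 423.4 tokens/s, matching the msg line above.)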

I'll report back if I can see any other major differences, but otherwise 👍


@jukofyork commented on GitHub (Apr 12, 2024):

I've used this quite a bit with the llama.cpp server now, and the only problem I've come across is that pressing the stop button doesn't actually disconnect/stop the generation. This was a problem with the Ollama server too and was fixed, AFAIK:

https://github.com/open-webui/open-webui/issues/1166
https://github.com/open-webui/open-webui/issues/1170

It would be helpful if this could be added to the OpenAI API code too, as otherwise the only way to stop a runaway LLM at the moment is to Ctrl-C the running server and restart it.
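A hedged sketch of what that could look like (my own illustration, not Open WebUI's actual code; it assumes a FastAPI-style backend with aiohttp, and that llama.cpp aborts generation when the streaming connection closes):

```python
# Sketch: forward a streamed completion and close the upstream connection
# when the browser client disconnects, so llama.cpp can stop generating.
import aiohttp
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/proxy/chat/completions")
async def proxy(request: Request):
    payload = await request.json()

    async def stream():
        async with aiohttp.ClientSession() as session:
            async with session.post(
                "http://127.0.0.1:8081/v1/chat/completions", json=payload
            ) as upstream:
                async for chunk in upstream.content.iter_any():
                    if await request.is_disconnected():
                        # Breaking out of the generator closes the upstream
                        # socket; the server sees the disconnect and stops.
                        break
                    yield chunk

    return StreamingResponse(stream(), media_type="text/event-stream")
```

The key point is checking for client disconnect inside the forwarding loop rather than only at request start; without that, the proxy keeps the upstream connection alive and generation runs to completion.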


@jukofyork commented on GitHub (Apr 12, 2024):

Another thing that might be helpful would be an option to hide the "Modelfiles" and "Prompts" menu entries on the left, as these can't be used with the OpenAI API and just add clutter.


@tjbck commented on GitHub (Apr 14, 2024):

@jukofyork I'll start working on this feature after #665; we should strive to keep all the core features.


@DenisSergeevitch commented on GitHub (Apr 26, 2024):

Small update: the stop generation button is still an issue.


@justinh-rahb commented on GitHub (Apr 26, 2024):

@DenisSergeevitch that is unrelated to the issue being discussed here. Let's keep discussion of the stop generation function here:

  • #1568


@tjbck commented on GitHub (Jun 13, 2024):

Related: https://github.com/open-webui/open-webui/issues/1166

> I'm sorry, looks like it was my mistake, or something with my setup (reverse proxies?) caused a problem. Can confirm that everything works as expected with the current Open WebUI and Ollama Docker. :) Thanks for the great software.

#1568

> I've completely ditched Ollama and just moved over to the llama.cpp server and just want to say thanks, as it's working really smoothly with Open-WebUI! 👍

@jukofyork @DenisSergeevitch @SN4K3D @0x7CFE

Correct me if I'm wrong, but the stop generation button not actually stopping is only an issue when running LLMs with Ollama on CPU only, and the vast majority of us face zero issues terminating the response using the stop button. Could anyone confirm this with the latest? I appreciate it!


@SN4K3D commented on GitHub (Jul 24, 2024):

I can confirm the issue occurs with Ollama (llama.cpp) running LLMs on CPU only. I tried the latest version today and the stop button works; it stops all the threads Ollama launches.
Thanks everyone for your work, it's appreciated.

Reference: github-starred/open-webui#620