bug: Tools aren't working in version 0.4.3 #2784

Closed
opened 2025-11-11 15:14:19 -06:00 by GiteaMirror · 18 comments
Owner

Originally created by @Simi5599 on GitHub (Nov 22, 2024).

Bug Report

Installation Method

Docker

Environment

  • Open WebUI Version: 0.4.3

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

When asking the model something that requires a tool, the tool should activate.

Actual Behavior:

The tool never activates, even when enabled for the chat. I downgraded to 0.4.2 and tools are working again.

Description

Reproduction Details

Steps to Reproduce:
  0. Make sure you have a tool that worked in version 0.4.2 (for example, "Run Code" by EtiennePerrot).

  1. Ask the LLM something.

  2. The LLM won't activate the tool.

Additional Information

I can definitely confirm that this was introduced in version 0.4.3.

@tjbck commented on GitHub (Nov 22, 2024):

![image](https://github.com/user-attachments/assets/fbafb21d-719c-4a34-ae6b-ba8593a97b74)

Definitely works on my end; we might need more details here.

@Simi5599 commented on GitHub (Nov 22, 2024):

This is strange. I'll update with a comment and attach logs as soon as I'm in front of my laptop.

@Simi5599 commented on GitHub (Nov 22, 2024):

So I have an update: the "Run Code" tool works fine; it was just a coincidence.
But the web search tool (not the one included by default in Open WebUI) does not work in any case. (In fact, I noticed this issue while using web search.)
Reading the Docker logs, I can see the error is related to the event_emitter.
Maybe this is a deprecation in Open WebUI?

Browser Console Logs:
Nothing relevant

Docker Container Logs:
I got this error:

INFO  [open_webui.apps.openai.main] get_all_models()
/usr/local/lib/python3.11/site-packages/pydantic/main.py:1552: RuntimeWarning: fields may not start with an underscore, ignoring "__event_emitter__"
  warnings.warn(f'fields may not start with an underscore, ignoring "{f_name}"', RuntimeWarning)
ERROR [open_webui.main] Fields must not use names with leading underscores; e.g., use 'event_emitter__' instead of '__event_emitter__'.
Traceback (most recent call last):
  File "/app/backend/open_webui/main.py", line 679, in dispatch
    body, flags = await chat_completion_tools_handler(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend/open_webui/main.py", line 391, in chat_completion_tools_handler
    tools = get_tools(
            ^^^^^^^^^^
  File "/app/backend/open_webui/utils/tools.py", line 77, in get_tools
    "pydantic_model": function_to_pydantic_model(callable),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend/open_webui/utils/tools.py", line 148, in function_to_pydantic_model
    return create_model(func.__name__, **field_defs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 1600, in create_model
    return meta(
           ^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 115, in __new__
    private_attributes = inspect_namespace(
                         ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 403, in inspect_namespace
    raise NameError(
NameError: Fields must not use names with leading underscores; e.g., use 'event_emitter__' instead of '__event_emitter__'.
generate_queries
gpt-4o-mini-2024-07-18
INFO:     151.57.200.225:0 - "POST /ws/socket.io/?EIO=4&transport=polling&t=PDLDqGJ&sid=7LZ6KPgpH6wotc46AAAC HTTP/1.1" 200 OK
INFO:     151.57.200.225:0 - "GET /ws/socket.io/?EIO=4&transport=polling&t=PDLDna3&sid=7LZ6KPgpH6wotc46AAAC HTTP/1.1" 200 OK

@tjbck commented on GitHub (Nov 22, 2024):

Logs above aren't related. Perhaps @michaelpoluektov could chime in.

@Simi5599 commented on GitHub (Nov 22, 2024):

Could we do something like this?

for name, param in parameters.items():
    # Rename parameters that start with __
    if name.startswith('__'):
        name = 'param_' + name[2:]  # Strip the __ prefix and add "param_"

This looks like a Pydantic restriction on field names.

@tjbck commented on GitHub (Nov 22, 2024):

Reserved params should NOT be added to the function spec; dropping reserved params is intended here.

@Simi5599 commented on GitHub (Nov 22, 2024):

So basically we could do something like this

if name.startswith('__'):
    continue

And the Pydantic model should be created (?)
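To illustrate the restriction being discussed, here is a standalone sketch: the `parameters` map below is a made-up stand-in for whatever `function_to_pydantic_model` derives from a tool signature, not the actual Open WebUI code.

```python
from typing import Any
from pydantic import create_model

# Hypothetical field map for a tool like search_web; the reserved
# __event_emitter__ entry mirrors the parameter named in the traceback.
parameters = {
    "query": (str, ...),
    "__event_emitter__": (Any, None),
}

# Pydantic rejects field names with leading underscores, so reserved
# (dunder-prefixed) names must be dropped before calling create_model:
field_defs = {k: v for k, v in parameters.items() if not k.startswith("__")}
ToolModel = create_model("search_web", **field_defs)

print(sorted(ToolModel.model_fields))  # only the model-facing "query" field remains
```

With the dunder entry filtered out, `create_model` builds a model exposing only the parameters the LLM should see.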

@Simi5599 commented on GitHub (Nov 22, 2024):

Can confirm this works on my side and does not break tools.

![Screenshot 2024-11-22 205111](https://github.com/user-attachments/assets/8eb0252b-e968-4fd9-bb7a-0ffb72dbfd00)

If a simple PR is welcome, I can do this without any issues :)

@michaelpoluektov commented on GitHub (Nov 22, 2024):

I'm pretty sure everything works as intended here (apart from maybe the fact that it's not failing fast enough, but that's a separate issue)

You should not be using reserved params (params starting with "__") in your tool definitions, unless they're part of OWUI (like __event_emitter__, __event_call__ etc.)

If this isn't what's happening, feel free to attach your toolkit (or a dummy with the same type spec) and I'll look into it.

@Simi5599 commented on GitHub (Nov 22, 2024):

I am not; the only part of my code where I use this kind of parameter is the following:

async def search_web(self, query: str, __event_emitter__: Callable[[dict], Any] = None)

Anyway, I think the simple if check I wrote should do the trick.
Maybe we can refine it, but on my end the problem was solved.

@michaelpoluektov commented on GitHub (Nov 22, 2024):

It works for me, could you include your tool? Here's my type signature:

    async def get_documents_by_id(
        self, ids: list[int], __event_emitter__: Callable[[dict], Awaitable] = None
    ) -> str:
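For completeness, a minimal self-contained sketch of how a tool method with this shape of signature consumes the injected emitter; the `Tools` class, `ping` method, and the local `collect` test double below are invented for illustration, while the reserved `__event_emitter__` name and the status payload shape follow the thread.

```python
import asyncio
from typing import Any, Callable

class Tools:
    async def ping(
        self, message: str, __event_emitter__: Callable[[dict], Any] = None
    ) -> str:
        # Open WebUI injects __event_emitter__ at call time; it is a
        # reserved parameter and never appears in the model-facing spec.
        if __event_emitter__:
            await __event_emitter__(
                {
                    "type": "status",
                    "data": {"description": f"pong: {message}", "done": True},
                }
            )
        return f"pong: {message}"

# Local test double: collect emitted events instead of updating a UI.
events: list[dict] = []

async def collect(event: dict) -> None:
    events.append(event)

result = asyncio.run(Tools().ping("hello", __event_emitter__=collect))
print(result, events[0]["type"])
```

Note that `__event_emitter__` is not name-mangled (it ends with two underscores), so it can be passed as a keyword argument exactly as written.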

@Simi5599 commented on GitHub (Nov 22, 2024):

I deleted my previous comment; I was able to get the code from my cloud resource using my phone (sorry if the code is not formatted very well).

Importing libs

import os
import requests
from datetime import datetime
import json
from requests import get
from bs4 import BeautifulSoup
import concurrent.futures
from html.parser import HTMLParser
from urllib.parse import urlparse, urljoin
import re
import unicodedata
from pydantic import BaseModel, Field
import asyncio
from typing import Callable, Any

class HelpFunctions:
    """Support functions for the search system."""

    def __init__(self):
        """HelpFunctions class constructor."""
        pass

    def get_base_url(self, url: str) -> str:
        """Extract the base URL from a full URL.

        :param url: The full URL.
        :return: base_url: The base URL (scheme + domain).
        """
        parsed_url = urlparse(url)
        base_url = f"{parsed_url.scheme}://{parsed_url.netloc}"
        return base_url

    def generate_excerpt(self, content: str, max_length: int = 200) -> str:
        """Generate a text excerpt capped at a given length.

        :param content: The original content.
        :param max_length: The maximum excerpt length.
        :return: The text excerpt (string).
        """
        return content[:max_length] + "..." if len(content) > max_length else content

    def format_text(self, original_text: str) -> str:
        """Format the original text by stripping HTML and normalizing it.

        :param original_text: Original HTML text.
        :return: Formatted, normalized text.
        """
        soup = BeautifulSoup(original_text, "html.parser")
        formatted_text = soup.get_text(separator=" ", strip=True)
        formatted_text = unicodedata.normalize("NFKC", formatted_text)
        formatted_text = re.sub(r"\s+", " ", formatted_text)
        formatted_text = formatted_text.strip()
        formatted_text = self.remove_emojis(formatted_text)
        return formatted_text

    def remove_emojis(self, text: str) -> str:
        """Remove emojis from a text.

        :param text: The original text.
        :return: The text without emojis.
        """
        return "".join(c for c in text if not unicodedata.category(c).startswith("So"))

    def process_search_result(self, result: dict, valves: "Tools.Valves") -> dict:
        """Process a search result by extracting and filtering its data.

        :param result: The search result containing 'title', 'url' and 'content'.
        :param valves: Object holding settings such as IGNORED_WEBSITES and PAGE_CONTENT_WORDS_LIMIT.
        :return: A dictionary with the processed information, or None if the site is ignored or an error occurs.
        """
        title_site = self.remove_emojis(result["title"])
        url_site = result["url"]

        # File extensions to exclude from processing
        excluded_extensions = [
            '.pdf', '.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx',
            '.zip', '.rar', '.tar', '.gz', '.7z', '.json', '.xml',
            '.txt', '.csv', '.exe', '.jpg', '.jpeg', '.png', '.gif',
            '.bmp', '.ico', '.svg',  # Images and graphic files
            '.mp3', '.wav', '.mp4', '.avi', '.mov',  # Audio and video
        ]

        # Check whether the URL ends with one of the excluded extensions
        if any(url_site.endswith(ext) for ext in excluded_extensions):
            return None  # Discard links to unsupported files

        snippet = result.get("content", "")
        if valves.IGNORED_WEBSITES:
            base_url = self.get_base_url(url_site)
            if any(
                ignored_site.strip() in base_url
                for ignored_site in valves.IGNORED_WEBSITES.split(",")
            ):
                return None
        try:
            response_site = requests.get(url_site, timeout=20)
            response_site.raise_for_status()
            html_content = response_site.text
            soup = BeautifulSoup(html_content, "html.parser")
            content_site = self.format_text(soup.get_text(separator=" ", strip=True))
            truncated_content = self.truncate_to_n_words(
                content_site, valves.PAGE_CONTENT_WORDS_LIMIT
            )
            return {
                "title": title_site,
                "url": url_site,
                "content": truncated_content,
                "snippet": self.remove_emojis(snippet),
            }
        except requests.exceptions.RequestException:
            return None

    def truncate_to_n_words(self, text: str, token_limit: int) -> str:
        """Truncate the text to a given number of words.

        :param text: The text to truncate.
        :param token_limit: The word limit.
        :return: The truncated text.
        """
        tokens = text.split()
        truncated_tokens = tokens[:token_limit]
        return " ".join(truncated_tokens)

class EventEmitter:
    """Communication interface with the front-end."""

    def __init__(self, event_emitter: Callable[[dict], Any] = None):
        """EventEmitter class constructor.

        :param event_emitter: Callback used to emit events.
        """
        self.event_emitter = event_emitter

    async def emit(
        self,
        description: str = "Unknown State",
        status: str = "in_progress",
        done: bool = False,
    ):
        """Emit a status event.

        :param description: Description of the event state.
        :param status: Event status.
        :param done: Whether the operation is complete.
        """
        if self.event_emitter:
            await self.event_emitter(
                {
                    "type": "status",
                    "data": {
                        "status": status,
                        "description": description,
                        "done": done,
                    },
                }
            )

class Tools:
    class Valves(BaseModel):
        """Search system settings."""

        SEARXNG_ENGINE_API_BASE_URL: str = Field(
            default="https://example.com/search",
            description="The base URL for Search Engine",
        )
        IGNORED_WEBSITES: str = Field(
            default="",
            description="Comma-separated list of websites to ignore",
        )
        RETURNED_SCRAPPED_PAGES_NO: int = Field(
            default=3,
            description="The number of Search Engine Results to Parse",
        )
        SCRAPPED_PAGES_NO: int = Field(
            default=5,
            description="Total pages scrapped. Ideally greater than one of the returned pages",
        )
        PAGE_CONTENT_WORDS_LIMIT: int = Field(
            default=5000,
            description="Limit words content for each page.",
        )
        CITATION_LINKS: bool = Field(
            default=False,
            description="If True, send custom citations with links",
        )

    def __init__(self):
        """Tools class constructor."""
        self.valves = self.Valves()
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
        }

    async def search_web(
        self, query: str, __event_emitter__: Callable[[dict], Any] = None
    ) -> str:
        """
        Search the web and return the content of the relevant pages.

        This function is designed to run continuous, in-depth searches across a wide range of topics,
        including but not limited to: unknown information, news, public contact information, weather,
        work activities, recent updates, software versions, specific information, data accuracy,
        market trends, reviews, regulations, technological innovations, historical events, statistics,
        culture and entertainment, health and wellness, sustainability, finance, travel, education,
        sports, techniques, politics, the global economy, social media, privacy, start-ups, art, food,
        communication, lifestyles, activism, financial literacy, recreational activities, psychology,
        family planning, and up-to-date guides on technical procedures.
        It should always actively look for relevant, up-to-date information; if not enough results
        are found, a fallback search is triggered.

        :param query: The query to use in the search engine.
        :param __event_emitter__: Callback used to emit events (optional).
        :return: The page contents in JSON format.
        """
        functions = (
            HelpFunctions()
        )  # Instance of the helper class for auxiliary functions
        emitter = EventEmitter(__event_emitter__)  # Initialize the event emitter
        await emitter.emit(f"Initiating web search for: {query}")
        search_engine_url = self.valves.SEARXNG_ENGINE_API_BASE_URL
        if self.valves.RETURNED_SCRAPPED_PAGES_NO > self.valves.SCRAPPED_PAGES_NO:
            self.valves.RETURNED_SCRAPPED_PAGES_NO = self.valves.SCRAPPED_PAGES_NO
        params = {
            "q": query,
            "format": "json",
            "number_of_results": self.valves.RETURNED_SCRAPPED_PAGES_NO,
        }
        try:
            await emitter.emit("Sending request to search engine")
            resp = requests.get(
                search_engine_url, params=params, headers=self.headers, timeout=120
            )
            resp.raise_for_status()
            data = resp.json()
            results = data.get("results", [])

            # Check whether there are any results...
            if not results:
                await emitter.emit(
                    "Not enough results found, attempting a fallback search"
                )
                # ...if there are none, run the fallback search
                fallback_query = query + " additional keywords"  # Tweak the query
                return await self.fallback_search(fallback_query, __event_emitter__)

            limited_results = results[: self.valves.SCRAPPED_PAGES_NO]
            await emitter.emit(f"Retrieved {len(limited_results)} search results")

        except requests.exceptions.RequestException as e:
            await emitter.emit(
                status="error",
                description=f"Error during search: {str(e)}",
                done=True,
            )
            return json.dumps({"error": str(e)})

        results_json = []
        if limited_results:
            await emitter.emit("Processing search results")
            with concurrent.futures.ThreadPoolExecutor() as executor:
                futures = [
                    executor.submit(
                        functions.process_search_result, result, self.valves
                    )
                    for result in limited_results
                ]
                for future in concurrent.futures.as_completed(futures):
                    result_json = future.result()
                    if result_json:
                        try:
                            json.dumps(result_json)
                            results_json.append(result_json)
                        except (TypeError, ValueError):
                            continue
                    if len(results_json) >= self.valves.RETURNED_SCRAPPED_PAGES_NO:
                        break
            results_json = results_json[: self.valves.RETURNED_SCRAPPED_PAGES_NO]
            if self.valves.CITATION_LINKS and __event_emitter__:
                for result in results_json:
                    await __event_emitter__(
                        {
                            "type": "citation",
                            "data": {
                                "document": [result["content"]],
                                "metadata": [{"source": result["url"]}],
                                "source": {"name": result["title"]},
                            },
                        }
                    )
        await emitter.emit(
            status="complete",
            description=f"Web search completed. Retrieved content from {len(results_json)} pages",
            done=True,
        )
        return json.dumps(results_json, ensure_ascii=False)

    async def fallback_search(
        self, query: str, __event_emitter__: Callable[[dict], Any] = None
    ) -> str:
        """Run a fallback search when results are insufficient.

        This function is invoked when the main search does not return enough results.
        It tweaks the search query and repeats the request to look for alternative results.

        :param query: The fallback query to use in the search engine.
        :param __event_emitter__: Callback used to emit events (optional).
        :return: The page contents in JSON format, produced by the fallback search.
        """
        functions = HelpFunctions()  # Instance of the helper class
        emitter = EventEmitter(__event_emitter__)  # Initialize the event emitter
        await emitter.emit(f"Initiating fallback search for: {query}")

        search_engine_url = self.valves.SEARXNG_ENGINE_API_BASE_URL
        params = {
            "q": query,
            "format": "json",
            "number_of_results": self.valves.RETURNED_SCRAPPED_PAGES_NO,
        }

        try:
            await emitter.emit("Sending fallback request to search engine")
            resp = requests.get(
                search_engine_url, params=params, headers=self.headers, timeout=120
            )
            resp.raise_for_status()
            data = resp.json()
            results = data.get("results", [])

            if not results:
                await emitter.emit("No results found in fallback search.")
                return json.dumps({"message": "No results found after fallback."})

            limited_results = results[: self.valves.SCRAPPED_PAGES_NO]
            await emitter.emit(
                f"Retrieved {len(limited_results)} fallback search results"
            )

        except requests.exceptions.RequestException as e:
            await emitter.emit(
                status="error",
                description=f"Error during fallback search: {str(e)}",
                done=True,
            )
            return json.dumps({"error": str(e)})

        # Process the fallback results...
        return await self.process_results(limited_results)

    async def get_website(
        self, url: str, __event_emitter__: Callable[[dict], Any] = None
    ) -> str:
        """
        Scrape the given website and retrieve its content.
        This function is essential to let the model open and analyze websites when it receives a URL from the user.

        :param url: The URL of the website to scrape.
        :param __event_emitter__: Callback used to emit events (optional).
        :return: The website content in JSON format.
        """
        functions = HelpFunctions()  # Instance of the helper class
        emitter = EventEmitter(__event_emitter__)  # Initialize the event emitter
        await emitter.emit(f"Fetching content from URL: {url}")
        results_json = []
        try:
            response_site = requests.get(url, headers=self.headers, timeout=120)
            response_site.raise_for_status()
            html_content = response_site.text
            await emitter.emit("Parsing website content")
            soup = BeautifulSoup(html_content, "html.parser")
            page_title = soup.title.string if soup.title else "No title found"
            page_title = unicodedata.normalize("NFKC", page_title.strip())
            page_title = functions.remove_emojis(page_title)
            title_site = page_title
            url_site = url
            content_site = functions.format_text(
                soup.get_text(separator=" ", strip=True)
            )
            truncated_content = functions.truncate_to_n_words(
                content_site, self.valves.PAGE_CONTENT_WORDS_LIMIT
            )
            result_site = {
                "title": title_site,
                "url": url_site,
                "content": truncated_content,
                "excerpt": functions.generate_excerpt(content_site),
            }
            results_json.append(result_site)
            if self.valves.CITATION_LINKS and __event_emitter__:
                await __event_emitter__(
                    {
                        "type": "citation",
                        "data": {
                            "document": [truncated_content],
                            "metadata": [{"source": url_site}],
                            "source": {"name": title_site},
                        },
                    }
                )
            await emitter.emit(
                status="complete",
                description="Website content retrieved and processed successfully",
                done=True,
            )
        except requests.exceptions.RequestException as e:
            results_json.append(
                {
                    "url": url,
                    "content": f"Failed to retrieve the page. Error: {str(e)}",
                }
            )
            await emitter.emit(
                status="error",
                description=f"Error fetching website content: {str(e)}",
                done=True,
            )
        return json.dumps(results_json, ensure_ascii=False)

    async def process_results(self, results: list) -> str:
        """Process the retrieved results.

        This function processes the list of results and returns the content in JSON format.

        :param results: List of results to process.
        :return: The processed results in JSON format.
        """
        results_json = []
        for result in results:
            # Implement result-processing logic here...
            results_json.append(result)  # Append the processed result to the list
        return json.dumps(results_json, ensure_ascii=False)
""" functions = ( HelpFunctions() ) # Instanza della classe di supporto per le funzioni ausiliarie emitter = EventEmitter(__event_emitter__) # Inizializza l'emettitore di eventi await emitter.emit(f"Initiating web search for: {query}") search_engine_url = self.valves.SEARXNG_ENGINE_API_BASE_URL if self.valves.RETURNED_SCRAPPED_PAGES_NO > self.valves.SCRAPPED_PAGES_NO: self.valves.RETURNED_SCRAPPED_PAGES_NO = self.valves.SCRAPPED_PAGES_NO params = { "q": query, "format": "json", "number_of_results": self.valves.RETURNED_SCRAPPED_PAGES_NO, } try: await emitter.emit("Sending request to search engine") resp = requests.get( search_engine_url, params=params, headers=self.headers, timeout=120 ) resp.raise_for_status() data = resp.json() results = data.get("results", []) # Controllo se ho risultati... if not results: await emitter.emit( "Not enough results found, attempting a fallback search" ) # ... se non ho nessun risultato, eseguo la ricerca di fallback fallback_query = query + " additional keywords" # Modifica la query return await self.fallback_search(fallback_query, __event_emitter__) limited_results = results[: self.valves.SCRAPPED_PAGES_NO] await emitter.emit(f"Retrieved {len(limited_results)} search results") except requests.exceptions.RequestException as e: await emitter.emit( status="error", description=f"Error during search: {str(e)}", done=True, ) return json.dumps({"error": str(e)}) results_json = [] if limited_results: await emitter.emit("Processing search results") with concurrent.futures.ThreadPoolExecutor() as executor: futures = [ executor.submit( functions.process_search_result, result, self.valves ) for result in limited_results ] for future in concurrent.futures.as_completed(futures): result_json = future.result() if result_json: try: json.dumps(result_json) results_json.append(result_json) except (TypeError, ValueError): continue if len(results_json) >= self.valves.RETURNED_SCRAPPED_PAGES_NO: break results_json = results_json[: 
self.valves.RETURNED_SCRAPPED_PAGES_NO] if self.valves.CITATION_LINKS and __event_emitter__: for result in results_json: await __event_emitter__( { "type": "citation", "data": { "document": [result["content"]], "metadata": [{"source": result["url"]}], "source": {"name": result["title"]}, }, } ) await emitter.emit( status="complete", description=f"Web search completed. Retrieved content from {len(results_json)} pages", done=True, ) return json.dumps(results_json, ensure_ascii=False) async def fallback_search( self, query: str, __event_emitter__: Callable[[dict], Any] = None ) -> str: """Effettua una ricerca di fallback in caso di risultati insufficienti. Questa funzione viene invocata quando la ricerca principale non restituisce un numero sufficiente di risultati. Essa modifica la query di ricerca e ripete la richiesta per cercare risultati alternativi. :param query: La query di fallback da utilizzare nel motore di ricerca. :param __event_emitter__: Funzione di callback per emettere eventi (opzionale). :return: Il contenuto delle pagine in formato JSON, risultante dalla ricerca di fallback. 
""" functions = HelpFunctions() # Instanza della classe di supporto emitter = EventEmitter(__event_emitter__) # Inizializza l'emettitore di eventi await emitter.emit(f"Initiating fallback search for: {query}") search_engine_url = self.valves.SEARXNG_ENGINE_API_BASE_URL params = { "q": query, "format": "json", "number_of_results": self.valves.RETURNED_SCRAPPED_PAGES_NO, } try: await emitter.emit("Sending fallback request to search engine") resp = requests.get( search_engine_url, params=params, headers=self.headers, timeout=120 ) resp.raise_for_status() data = resp.json() results = data.get("results", []) if not results: await emitter.emit("No results found in fallback search.") return json.dumps({"message": "No results found after fallback."}) limited_results = results[: self.valves.SCRAPPED_PAGES_NO] await emitter.emit( f"Retrieved {len(limited_results)} fallback search results" ) except requests.exceptions.RequestException as e: await emitter.emit( status="error", description=f"Error during fallback search: {str(e)}", done=True, ) return json.dumps({"error": str(e)}) # Elabora i risultati di fallback... return await self.process_results(limited_results) async def get_website( self, url: str, __event_emitter__: Callable[[dict], Any] = None ) -> str: """ Effettua lo scraping del sito web fornito e ne recupera il contenuto. Questa funzione è essenziale per permettere al modello di aprire e analizzare siti web quando riceve un URL dall'utente. :param url: L'URL del sito web da scrappare. :param __event_emitter__: Funzione di callback per emettere eventi (opzionale). :return: Il contenuto del sito web in formato JSON. 
""" functions = HelpFunctions() # Instanza della classe di supporto emitter = EventEmitter(__event_emitter__) # Inizializza l'emettitore di eventi await emitter.emit(f"Fetching content from URL: {url}") results_json = [] try: response_site = requests.get(url, headers=self.headers, timeout=120) response_site.raise_for_status() html_content = response_site.text await emitter.emit("Parsing website content") soup = BeautifulSoup(html_content, "html.parser") page_title = soup.title.string if soup.title else "No title found" page_title = unicodedata.normalize("NFKC", page_title.strip()) page_title = functions.remove_emojis(page_title) title_site = page_title url_site = url content_site = functions.format_text( soup.get_text(separator=" ", strip=True) ) truncated_content = functions.truncate_to_n_words( content_site, self.valves.PAGE_CONTENT_WORDS_LIMIT ) result_site = { "title": title_site, "url": url_site, "content": truncated_content, "excerpt": functions.generate_excerpt(content_site), } results_json.append(result_site) if self.valves.CITATION_LINKS and __event_emitter__: await __event_emitter__( { "type": "citation", "data": { "document": [truncated_content], "metadata": [{"source": url_site}], "source": {"name": title_site}, }, } ) await emitter.emit( status="complete", description="Website content retrieved and processed successfully", done=True, ) except requests.exceptions.RequestException as e: results_json.append( { "url": url, "content": f"Failed to retrieve the page. Error: {str(e)}", } ) await emitter.emit( status="error", description=f"Error fetching website content: {str(e)}", done=True, ) return json.dumps(results_json, ensure_ascii=False) async def process_results(self, results: list) -> str: """Processa i risultati ottenuti. Questa funzione elabora la lista di risultati e restituisce il contenuto in formato JSON. :param results: Lista di risultati da elaborare. :return: I risultati elaborati in formato JSON. 
""" results_json = [] for result in results: # Implementa la logica di elaborazione dei risultati... results_json.append(result) # Aggiungi il risultato elaborato alla lista return json.dumps(results_json, ensure_ascii=False)
Author
Owner

@michaelpoluektov commented on GitHub (Nov 22, 2024):

Oh I think I understand now: you've got `__event_emitter__` in your docstring

@Simi5599 commented on GitHub (Nov 22, 2024):

Oh 😂 that's unexpected.

Couldn't we just filter out these kinds of parameters via a simple `if`?
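The suggested filter could look something like the sketch below: drop dunder-named entries from the parameter dict before it is turned into the model-facing tool spec (the function name and the spec shape are hypothetical, for illustration only):

```python
def filter_reserved_params(params: dict) -> dict:
    """Drop framework-injected dunder parameters (e.g. __event_emitter__)
    before building the model-facing tool spec."""
    return {
        name: spec
        for name, spec in params.items()
        if not (name.startswith("__") and name.endswith("__"))
    }


spec = {"query": {"type": "string"}, "__event_emitter__": {"type": "object"}}
print(filter_reserved_params(spec))  # {'query': {'type': 'string'}}
```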

@michaelpoluektov commented on GitHub (Nov 22, 2024):

So simple fix for now, just remove:

"""
:param __event_emitter__:
"""

from your docstring.

Otherwise, I put up a PR: can you check that it works?

https://github.com/open-webui/open-webui/pull/7263

@Simi5599 commented on GitHub (Nov 22, 2024):

Can't check this at the moment, but I saw your PR, and it should work.

@tjbck commented on GitHub (Nov 23, 2024):

0.4.4 will be released shortly!

@Simi5599 commented on GitHub (Nov 23, 2024):

Can confirm this was resolved with 0.4.4!

Reference: github-starred/open-webui#2784