mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #18596] refactor: replace requests with Firecrawl API in search_firecrawl function and update Firecrawl version in requirements #34176
Originally created by @wei840222 on GitHub (Oct 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18596
Check Existing Issues
Although not exactly the same issue, this refactoring can solve the problem mentioned in this discussion:
https://github.com/open-webui/open-webui/discussions/17814
Problem Description
The current implementation for interacting with the Firecrawl service in search_firecrawl and SafeFireCrawlLoader is not optimal. It uses the requests library to make API calls manually and processes URLs sequentially when scraping. This approach is less efficient and requires more boilerplate code than using the features available in the official firecrawl-py library.
Desired Solution you'd like
I propose to refactor the existing code to fully leverage the firecrawl-py [1] library. The desired changes are:
- Replace the manual requests.post call in search_firecrawl with the library's native firecrawl.search method.
- Update SafeFireCrawlLoader to use the firecrawl.batch_scrape [2] method for both synchronous and asynchronous loading. This will process multiple URLs in a single, more efficient batch operation.
- Update the firecrawl-py dependency in requirements.txt from version 1.12.0 to 4.5.0 to support these new features.
This will result in cleaner, more maintainable code and improved performance for web retrieval tasks.
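As a rough illustration of the first two changes, here is a minimal sketch and not the actual patch. It assumes the client object exposes search(query, limit=...) and batch_scrape(urls, formats=[...]), that search results are available as response.web with a url on each item, and that batch results are available as job.data with markdown and metadata.source_url on each document. Those shapes should be checked against the firecrawl-py documentation for the pinned version. The helper names search_urls and scrape_batch, the Page type, and StubClient are hypothetical and exist only for this example; the stub stands in for the real client so the sketch runs offline.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Iterable, List


@dataclass
class Page:
    """Hypothetical container for one scraped page."""
    url: str
    text: str


def search_urls(client: Any, query: str, count: int) -> List[str]:
    """Return result URLs using the client's search call instead of requests.post."""
    response = client.search(query, limit=count)
    # Assumption: web results live on `response.web`, each item having `.url`.
    web = getattr(response, "web", None) or []
    return [item.url for item in web if getattr(item, "url", None)]


def scrape_batch(client: Any, urls: Iterable[str]) -> List[Page]:
    """Scrape all URLs in one batch call rather than one request per URL."""
    url_list = list(urls)
    if not url_list:
        return []  # avoid sending an empty batch
    job = client.batch_scrape(url_list, formats=["markdown"])
    pages = []
    # Assumption: scraped documents live on `job.data`.
    for doc in getattr(job, "data", None) or []:
        metadata = getattr(doc, "metadata", None)
        source = getattr(metadata, "source_url", None) or ""
        pages.append(Page(url=source, text=getattr(doc, "markdown", None) or ""))
    return pages


class StubClient:
    """Offline stand-in for the real client; mimics only the assumed shapes."""

    def __init__(self) -> None:
        self.batch_calls = 0

    def search(self, query: str, limit: int = 5) -> SimpleNamespace:
        hits = [SimpleNamespace(url=f"https://example.com/{n}") for n in "abc"]
        return SimpleNamespace(web=hits[:limit])

    def batch_scrape(self, urls, formats=None) -> SimpleNamespace:
        self.batch_calls += 1
        docs = [
            SimpleNamespace(markdown=f"# {u}", metadata=SimpleNamespace(source_url=u))
            for u in urls
        ]
        return SimpleNamespace(data=docs)


if __name__ == "__main__":
    client = StubClient()
    found = search_urls(client, "open webui", 2)
    for page in scrape_batch(client, found):
        print(page.url, len(page.text))
```

Passing the client in as a parameter keeps the helpers testable without network access or an API key. With the real SDK, the stub would be swapped for a client instance created from firecrawl-py.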
Example implementation:
32c7673eee
[1] https://docs.firecrawl.dev/sdks/python
[2] https://docs.firecrawl.dev/sdks/python#batch-scrape
Alternatives Considered
The alternative is to maintain the current implementation. However, this would mean missing out on the performance benefits of batch scraping and continuing to maintain manual HTTP request logic, which is less robust and more complex than using the official library's abstractions.
Additional Context
This is a technical refactoring aimed at improving code quality and performance. The changes primarily affect firecrawl.py and utils.py. This refactoring aligns with best practices by using the official client library for interacting with the Firecrawl API.
@tjbck commented on GitHub (Oct 26, 2025):
PR welcome!