[GH-ISSUE #18596] refactor: replace requests with Firecrawl API in search_firecrawl function and update Firecrawl version in requirements #18647

Closed
opened 2026-04-20 00:51:28 -05:00 by GiteaMirror · 1 comment

Originally created by @wei840222 on GitHub (Oct 24, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18596

Check Existing Issues

  • I have searched all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Although it is not exactly the same issue, this refactoring would also solve the problem raised in this discussion:
https://github.com/open-webui/open-webui/discussions/17814

Problem Description

The current implementation for interacting with the Firecrawl service in search_firecrawl and SafeFireCrawlLoader is inefficient. It makes API calls by hand with the requests library and scrapes URLs one at a time. This is slower and needs more boilerplate code than the equivalent features in the official firecrawl-py library.

Desired Solution you'd like

I propose to refactor the existing code to fully leverage the firecrawl-py [1] library. The desired changes are:

  • Replace the manual requests.post call in search_firecrawl with the library's native firecrawl.search method.
  • Update SafeFireCrawlLoader to use the firecrawl.batch_scrape [2] method for both synchronous and asynchronous loading. This will process multiple URLs in a single, more efficient batch operation.
  • Update the firecrawl-py dependency in requirements.txt from version 1.12.0 to 4.5.0 to support these new features.

This will result in cleaner, more maintainable code and improved performance for web retrieval tasks.
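
A rough sketch of what the two refactored calls could look like. It is illustrative only and is not the code from the example commit. The client is passed in so the snippet runs without network access. The SearchResult dataclass and the scrape_urls helper are hypothetical names, and the response shapes (a `web` list on the search response, a `data` list of documents with a `markdown` attribute on the batch job) are assumptions about firecrawl-py 4.x that should be checked against the SDK docs [1].

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SearchResult:
    # Hypothetical result shape for this sketch, not Open WebUI's actual model.
    link: str
    title: str
    snippet: str


def search_firecrawl(client: Any, query: str, count: int) -> List[SearchResult]:
    """Search through the SDK client instead of a hand-written requests.post call."""
    # Assumption: `client` behaves like firecrawl.Firecrawl, i.e. search()
    # accepts query/limit and returns an object exposing a `web` list.
    response = client.search(query=query, limit=count)
    items = getattr(response, "web", None) or []
    return [
        SearchResult(
            link=getattr(item, "url", "") or "",
            title=getattr(item, "title", "") or "",
            snippet=getattr(item, "description", "") or "",
        )
        for item in items
    ]


def scrape_urls(client: Any, urls: List[str]) -> List[str]:
    """Scrape all URLs in one batch job rather than one request per URL."""
    # Assumption: batch_scrape() waits for the job to finish and returns an
    # object whose `data` list holds documents with a `markdown` attribute.
    job = client.batch_scrape(urls, formats=["markdown"])
    documents = getattr(job, "data", None) or []
    return [getattr(doc, "markdown", "") or "" for doc in documents]
```

With the real SDK, the client would presumably be built once (for example `Firecrawl(api_key=...)`, per the docs in [1]) and handed to both helpers. The point of the sketch is that each helper issues a single SDK call regardless of how many URLs are involved.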

Example implementation: https://github.com/wei840222/open-webui/commit/32c7673eeeb60b5b2b96703afad22122aebaaa19

[1] https://docs.firecrawl.dev/sdks/python
[2] https://docs.firecrawl.dev/sdks/python#batch-scrape

Alternatives Considered

The alternative is to maintain the current implementation. However, this would mean missing out on the performance benefits of batch scraping and continuing to maintain manual HTTP request logic, which is less robust and more complex than using the official library's abstractions.

Additional Context

This is a technical refactoring aimed at improving code quality and performance. The changes primarily affect firecrawl.py and utils.py. This refactoring aligns with best practices by using the official client library for interacting with the Firecrawl API.


@tjbck commented on GitHub (Oct 26, 2025):

PR welcome!


Reference: github-starred/open-webui#18647