mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #687] Uploading documents connects to external web services such as an AWS ELB? #27707
Originally created by @prologic on GitHub (Feb 9, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/687
Bug Report
Description
Bug Summary:
I tried to upload a document to my locally hosted instance of Ollama Web UI and, to my horror, I discovered that the Docker container (running Ollama Web UI) wanted to connect to an AWS ELB?! Naturally I blocked this connection (thanks to LittleSnitch). Then it wanted to connect to another external service, something about packages (I didn't capture it).
Steps to Reproduce:
Expected Behavior:
I don't know wtf this is trying to do, but I really DO NOT expect a locally hosted instance of anything to be connecting externally to some 3rd-party services (within reason of course). This is absurd.
At the very least, could someone please explain why this is happening and what this is even used for? Maybe it's legit and required for some part of the "Upload Document" user journey to work?
Actual Behavior:
I expect locally hosted software to NOT connect to external services. The whole point of using Ollama in the first place is to run local LLM models 😅
Environment
Not really relevant, but: Docker container on a Mac.
PS: Your issue template is too long. Please simplify it; I don't generally have the time and patience to fill out everything asked, especially as a vision-impaired person. It also takes some of the "human"(ity) out of helping to contribute to "better" open source software.
@prologic commented on GitHub (Feb 9, 2024):
FWIW, blocking the two connections didn't appear to affect the functionality of uploading a document. I was later able to select it and use it in context with `#`, so I'm really confused as to why those connections are even necessary at all 🤔
@tjbck commented on GitHub (Feb 9, 2024):
Hi, thanks for reporting this issue. Could you verify that the AWS ELB connection is 100% occurring from the webui side? Our backend code does not contain any code that explicitly makes a connection to AWS ELB, so my guess is that the request is made from one of our dependency libraries. If you could narrow down what part of the code is making the connection, that would be tremendously helpful. Thanks!
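One way to narrow this down from inside the container is a CPython audit hook that logs every DNS lookup and outbound connection together with the Python stack that triggered it. This is a minimal sketch assuming Python 3.8+, where `sys.addaudithook` and the standard `socket.getaddrinfo` / `socket.connect` audit events exist; the hook and function names are hypothetical, and the hook would have to be installed early in the backend process, before the suspected dependencies run.

```python
import socket
import sys
import traceback

# Every watched event seen so far: (event name, target, formatted call stack).
EVENTS: list[tuple[str, object, str]] = []


def _net_audit_hook(event: str, args: tuple) -> None:
    if event == "socket.getaddrinfo":
        # args = (host, port, family, type, protocol): shows the hostname being resolved.
        target = args[:2]
    elif event == "socket.connect":
        # args = (socket, address): address is usually an (ip, port) tuple.
        target = args[1]
    else:
        return
    # Drop the final frame (this hook) so the stack ends at the code that called into socket.
    stack = "".join(traceback.format_stack()[:-1])
    EVENTS.append((event, target, stack))
    print(f"[net-audit] {event} {target!r}\n{stack}", file=sys.stderr)


def install_net_audit() -> None:
    """Register the hook process-wide. CPython audit hooks cannot be removed."""
    sys.addaudithook(_net_audit_hook)
```

Reading each printed stack past the generic networking frames (socket, urllib3, requests and so on) should reveal which package initiated the request. Because audit hooks cannot be unregistered, this is meant for a debugging session rather than production.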
@prologic commented on GitHub (Feb 10, 2024):
Yup makes sense!
I'll try to narrow this down 👌 As you said, if you're not doing this explicitly in this codebase, then I consider it a sneaky supply-chain type of thing 🤣
@prologic commented on GitHub (Feb 10, 2024):
So here we go:
Text version(s):
Screenshots:


@prologic commented on GitHub (Feb 10, 2024):
Note that this is the container itself trying to do this, so it's something to do with the backend.
@prologic commented on GitHub (Feb 10, 2024):
Doing a search for the 2nd connection yields this:
d11c70cf83/unstructured/utils.py (L287-L319)
Are we using this in the backend anywhere? 🤔
@tjbck commented on GitHub (Feb 10, 2024):
Here's a list of our suspects:
@prologic commented on GitHub (Feb 10, 2024):
We are:
cb5520c519/backend/requirements.txt (L25)
Why does it need to connect to an external service? 🤔
@prologic commented on GitHub (Feb 10, 2024):
I can't figure out this random ELB though; I might need some help figuring that one out. But at least we have some culprits now... The question is, what do we do about it? Blocking both doesn't adversely affect Ollama Web UI in any way that I can tell, hmm.
@prologic commented on GitHub (Feb 10, 2024):
Oh wow!
If this library is sending analytics, that's disgusting 😱
@tjbck commented on GitHub (Feb 10, 2024):
`UnstructuredMarkdownLoader` seems to be the culprit, investigating more.
@prologic commented on GitHub (Feb 10, 2024):
I have half a mind to go yell at this company and ask them to please explain themselves 🤣 Shame on them!
@tjbck commented on GitHub (Feb 10, 2024):
Just reviewed the code; I reckon setting the `DO_NOT_TRACK` env var to `True` will stop the telemetry. Could you try testing it?
@prologic commented on GitHub (Feb 10, 2024):
Love it! Let's do it, happy to test the fix 👌
@prologic commented on GitHub (Feb 10, 2024):
And thank you for responding to this so quickly! When you're self hosting and insisting on doing things locally, you really don't expect your software to reach out to the internet without you knowing about it 😅
@prologic commented on GitHub (Feb 10, 2024):
Some kudos I posted for you 😅
@justinh-rahb commented on GitHub (Feb 10, 2024):
Good find, guys; yeah, that's definitely not nice of them. Is there any disclosure from the library anywhere?
@prologic commented on GitHub (Feb 10, 2024):
Are you suggesting we file a bug upstream too? It was a bit of a rude surprise to be honest 😅
@tjbck commented on GitHub (Feb 10, 2024):
@justinh-rahb None that I can find in their README :/
EDIT: they do mention, at the very bottom of their README, setting the environment variable `SCARF_NO_ANALYTICS=true`.
@tjbck commented on GitHub (Feb 10, 2024):
Added
with #694, it should disable the telemetry. Please try it out and let me know!
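For reference, a library that honours both opt-out variables mentioned in this thread would typically gate its telemetry call behind a check along the lines below. This is a hypothetical sketch (the function name and the set of accepted spellings are assumptions), not unstructured's actual code. One practical consequence: a library that compares against the exact lowercase string "true" would ignore "True", so lowercase `true` is the safer value to set.

```python
import os
from typing import Mapping

# Opt-out variables named in this thread.
OPT_OUT_VARS = ("DO_NOT_TRACK", "SCARF_NO_ANALYTICS")
# Assumption: which spellings count as "on" varies between libraries.
TRUTHY = {"1", "true", "yes"}


def telemetry_allowed(env: Mapping[str, str] = os.environ) -> bool:
    """Return False if any opt-out variable is set to a truthy value."""
    for name in OPT_OUT_VARS:
        if env.get(name, "").strip().lower() in TRUTHY:
            return False
    return True
```

Normalising case and whitespace here is a deliberate design choice: users tend to write `True`, `true` or `1` interchangeably, and an opt-out that silently fails on one of them defeats its purpose.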
@justinh-rahb commented on GitHub (Feb 10, 2024):
With RAG being as hot as it is right now, I guess we shouldn't be surprised that some library authors are cashing in on the user data flowing through their code.
Perhaps it would be prudent to think about dependency audits in the future. With Ollama now supporting a broad range of CPU-only configurations, it can be integrated into GitHub Actions, along with Ollama-WebUI, for thorough end-to-end testing. I'm going to give this a think over the weekend. I seem to recall a thread in Discussions about using the WebUI API directly that may come in handy here; time to do some research...
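A crude first pass at such a dependency audit could simply grep the installed packages for strings associated with telemetry. The sketch below uses only the standard library (`importlib.metadata`); the pattern list is an assumption (the opt-out variable names from this thread plus generic words), and a match is only a lead to investigate by hand, not proof that a package phones home.

```python
import re
from importlib import metadata

# Assumed pattern list: opt-out variables from this thread plus generic hints.
PATTERNS = re.compile(r"scarf|telemetry|analytics|DO_NOT_TRACK", re.IGNORECASE)


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line matching PATTERNS."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if PATTERNS.search(line)
    ]


def audit_installed_packages() -> dict[str, list[tuple[str, int, str]]]:
    """Scan the .py files of every installed distribution; map package -> hits."""
    report: dict[str, list[tuple[str, int, str]]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or "<unknown>"
        for file in dist.files or []:
            if file.suffix != ".py":
                continue
            try:
                text = dist.locate_file(file).read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # file listed in RECORD but missing or unreadable
            for lineno, line in scan_source(text):
                report.setdefault(name, []).append((str(file), lineno, line))
    return report
```

Run inside the backend image, `audit_installed_packages()` gives a per-package list of suspicious lines that a CI job could diff between dependency upgrades.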
@tjbck commented on GitHub (Feb 13, 2024):
@prologic has the issue been resolved with the latest release?
@prologic commented on GitHub (Feb 13, 2024):
I pulled the latest Docker image and restarted my local instance and so far so good 😊
@tjbck commented on GitHub (Feb 14, 2024):
I'll close this issue for now; feel free to open new issues if you encounter any spyware from the dependency supply chain, thanks!
@aswani-ms commented on GitHub (Jun 21, 2024):
Do you have example code showing how to upload a document programmatically through an API? Is it possible?
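As a possible starting point, here is a minimal standard-library sketch. It assumes a recent Open WebUI version that accepts uploads at `POST /api/v1/files/`, with the file in a multipart form field named `file` and an API key sent as a bearer token; endpoint paths have changed between releases, so check the API documentation for the version you run. The hypothetical `build_upload_request` function only builds the request, so it can be inspected before anything is sent.

```python
import mimetypes
import urllib.request
import uuid


def build_upload_request(base_url: str, api_key: str, filename: str,
                         content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data file-upload request.

    ASSUMPTION: the server accepts POST {base_url}/api/v1/files/ with the file
    in a form field named "file" and a bearer-token Authorization header.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # Single-part multipart body assembled by hand; filenames containing
    # double quotes or CR/LF are not escaped in this sketch.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {mime}\r\n\r\n".encode(),
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/files/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and parse the JSON response with `json.load`. The base URL (for example `http://localhost:3000`) and the API key are placeholders for your own deployment.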