Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-25 04:24:30 -05:00)
Memory leak / overflow when using Llama 3.1 models on Ollama backend #1784
Originally created by @TheSeraph on GitHub (Aug 15, 2024).
Bug Report
Installation Method
Home gaming PC running VirtualBox and an Ollama server.
Open-WebUI runs as a Docker container on a VM in VirtualBox, reverse-proxied by Nginx, alongside a set of Docker containers for n8n.
Ollama runs on the host machine, which holds all the compute resources (CPU, RAM, VRAM) affected by the memory leak.
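For reference, a setup like the one described could be reproduced with something along these lines. The container name, published port, and host IP placeholder are assumptions, not the reporter's actual command; OLLAMA_BASE_URL is Open WebUI's documented environment variable for pointing the UI at an external Ollama instance:

```shell
# Hypothetical sketch of the reported topology -- not the reporter's actual command.
# <host-ip> is the VirtualBox host's address as seen from inside the VM.
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://<host-ip>:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:v0.3.12
```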
Environment
Open WebUI: v0.3.12 in docker container, on Ubuntu 24.04 LTS VM
Ollama v0.4.3 on Windows 10 Pro (Host)
Operating System: Win 10 Pro + Ubuntu 24.04 LTS
CPU: AMD Ryzen 7 5800x
Memory: 64 GB DDR4
GPU: PowerColor Hellhound 7900 XTX (24 GB VRAM)
Browser (if applicable): Firefox 129.0.1
Confirmation:
Expected Behavior:
I expect smaller Llama 3.1 models (16 GB or less) to run entirely within my GPU's 24 GB of VRAM.
Actual Behavior:
When running llama3.1:latest (4.7 GB model) and llama3.1:8b-instruct-fp16 (16 GB model) through Open-WebUI, there appears to be a memory leak or runaway memory problem on the system hosting Ollama, causing models to overflow VRAM and consume other system resources.
Description
Bug Summary:
When running the models with Open-WebUI, they suddenly take up an amount of memory that is a multiple of their actual size. This has been tested with llama3.1:latest (4.7 GB model) and llama3.1:8b-instruct-fp16 (16 GB model), and both show this type of behaviour.
This is running llama3.1:latest through the Ollama CLI:

[Screenshot: memory usage via the Ollama CLI]

This is what happens running it through Open-WebUI. At first it is normal, but then it balloons to roughly 4.7 times its normal resource usage. The model itself is only 4.7 GB:

[Screenshot: memory usage via Open-WebUI]
Interestingly, other models don't necessarily show the same behaviour. For example, dolphin-llama3:8b (4.7 GB) does not seem to balloon when used through Open-WebUI.
For comparison, I used the Ollama CLI and n8n as benchmarks for memory and processor usage. The problem does not occur with those other interfaces to the LLM, only with Open-WebUI.
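To put a number on the blow-up described above, the inflation factor is just the reported memory divided by the model's on-disk size. The inflation_ratio function below is a throwaway helper written for this sketch, not part of Ollama; the 22 GB figure is roughly what a 4.7x inflation of the 4.7 GB llama3.1:latest would look like:

```shell
# Hypothetical helper: ratio of memory reported by "ollama ps" (or task
# manager) to the model's on-disk size, both in the same unit (GB here).
inflation_ratio() {
  awk -v reported="$1" -v disk="$2" 'BEGIN { printf "%.1f\n", reported / disk }'
}

# A 4.7 GB model ballooning to roughly 22 GB is a ~4.7x inflation:
inflation_ratio 22 4.7   # → 4.7
```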
Reproduction Details
Steps to Reproduce:
1. Run ollama run llama3.1:latest from the Ollama CLI, then exit with /bye
2. Monitor memory usage with ollama ps or any other tools (task manager, GPU tools)
3. Run the same model through Open-WebUI and monitor again with ollama ps or any other tools (task manager, GPU tools)
Logs and Screenshots
NOTICE!! For the purpose of logging I restarted the container to get a relatively "clean" slate of events to debug
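To capture how the reported memory usage evolves over time rather than as a single snapshot, a small polling loop could append timestamped "ollama ps" output to a log file alongside the container logs. The poll_status function is a hypothetical helper written for this sketch; it is plain POSIX sh with no Ollama-specific behaviour:

```shell
# poll_status CMD LOGFILE INTERVAL SAMPLES
# Runs CMD every INTERVAL seconds, SAMPLES times, appending a UTC
# timestamp header plus the command's output to LOGFILE.
poll_status() {
  cmd=$1; log=$2; interval=${3:-5}; samples=${4:-12}
  n=0
  while [ "$n" -lt "$samples" ]; do
    printf -- '--- %s ---\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$log"
    sh -c "$cmd" >> "$log" 2>&1
    n=$((n + 1))
    if [ "$n" -lt "$samples" ]; then sleep "$interval"; fi
  done
}

# Example (on the Ollama host): one minute of samples at 5 s intervals
#   poll_status "ollama ps" ollama_mem.log 5 12
```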
Browser Console Logs:
Docker Container Logs:
Screenshots/Screen Recordings (if applicable):
See above
Additional Information
N/A